TRANSCRIPT
ORIGINAL RESEARCH PAPER
Video watermark application for embedding recipient ID in real-time-encoding VoD server
Takaaki Yamada • Michiro Maeta •
Fuminori Mizushima
Received: 17 September 2012 / Accepted: 28 February 2013
© Springer-Verlag Berlin Heidelberg 2013
Abstract A previously developed system for embedding
watermarks in video content in real time has been
improved by incorporating real-time transcoding, which
enables embedding of watermarks specific to the recipient.
That is, watermarks are used as fingerprints. When a play
command is received from a customer, the system decodes
the requested video content into frame images in the server.
The frame images are then watermarked, encoded, and
streamed to the customer in real time. Prototype testing
demonstrated that use of this watermarking method is
feasible for video-on-demand service; that is, up to 20
individually watermarked videos can be concurrently
streamed to customers. Visibility testing showed that the
quality of the watermarked images was acceptable for
practical use. Robustness testing showed that the embedded
watermarks were practically robust against encoding. Use
of this system should help deter the illegal copying and
distribution of video content.
Keywords Video watermark · Real-time processing · Fingerprinting · Copyright protection · Video-on-demand
1 Introduction
Digital video content can be easily distributed over the
Internet due to the widespread use of broadband networks
and highly efficient computers. Recent developments in
network services have led to the introduction of various
digital content distribution services such as video-on-
demand (VoD). However, the growing availability of video
content online is exacerbating the problem of copyright
violation. For instance, a problem with the various video
sharing services that provide huge amounts of video data
worldwide is that they enable illegal copies to be distributed
anonymously [1], making it difficult to prevent illegal user behavior.
This illegal copying and distribution of video content
damage the sales of the original content through commer-
cial services. One way to limit the extent of the damage is
to find the illegally distributed copies and delete them so
that they are not redistributed. The ability to identify illegal
copiers would help deter users from making and
distributing illegal copies. Moreover, the victimized
copyright holders could make compensation claims against
illegal copiers. Identifying illegal copiers would thus help
reduce the damage. How to identify the illegal copiers is an
important issue for content distribution services, and vari-
ous approaches to this problem have been investigated,
including fingerprinting [2], broadcast encryption [3], and
traitor tracing [4] based on coding theory.
A well-known solution to the problem is to embed the
recipient’s ID information into the content in the form of a
digital watermark, that is, use watermarks as fingerprints
[5, 22]. This solution requires creating video watermarks
that are imperceptible to the human eye and robust against
image processing. Many algorithms have been developed
for general-purpose video watermarking in both the pixel
and frequency domains [6–10].
T. Yamada (✉)
Yokohama Research Laboratory, Hitachi, Ltd.,
Yokohama, Japan
e-mail: [email protected]
M. Maeta · F. Mizushima
Hitachi Government & Public Corporation System Engineering,
Co., Ltd., Tokyo, Japan
e-mail: [email protected]
F. Mizushima
e-mail: [email protected]
J Real-Time Image Proc
DOI 10.1007/s11554-013-0335-4
The input and output (I/O) operations and the complex
analysis of pixel values consume much computing power,
meaning that those processes would impose a heavy load
on a thin client such as a user PC. We thus focus on the
server-application side, although embedding of watermarks
specific to the recipient can theoretically be done on either
the client or server. If the server application incorporates a
watermarking method, the method should work with the
encoder. There are three classes of watermark embedding
from the viewpoint of system architecture [13, 23].
• Class I: Watermark embedding after video encoding.
• Class II: Watermark embedding incorporated within
encoding.
• Class III: Watermark embedding before video
encoding.
Class I methods use an architecture in which watermarks
are embedded in the compressed domain by altering the bit
stream. Early algorithms based on the modification of
discrete cosine transform (DCT) coefficients directly in the
MPEG bit stream can embed watermarks quickly [9, 10].
More recently developed ones provide more robustness
against severe image processing [24, 25]. These methods
depend on the file format or video codec.
Class II methods use an architecture in which the
watermark embedding operation is efficiently implemented
in the transform process used by the encoder. They control
the compression process in the encoder so that the water-
marks can survive the process. For instance, robust water-
marks can be selectively embedded in intra-frames [23].
These methods depend on the encoder implementation.
Class III methods use an architecture in which the
watermarked video output is essentially the encoder input,
meaning that the watermarking algorithm outputs uncom-
pressed video [11, 12]. Because the method is independent
of the encoder, both the watermark embedding operation
and the encoder can be switched easily. This gives a VoD
service provider more flexibility in maintaining its server.
We thus adopted the architecture used for class III
methods.
Real-time processing is an essential requirement for a
VoD service incorporating watermarking. Video water-
marking systems applicable to the architecture used for
class III methods have been developed for real-time pro-
cessing. They use dedicated hardware [11] or a dedicated
parallel-computing platform [12]. Some watermarking
systems for high definition TV (HDTV) use a specific set of
hardware, such as an LSI [15], a field programmable gate
array (FPGA) [16], a media processor, and a graphics
processing unit (GPU) [12]. They embed watermarks in
HDTV content in real time and are applicable to single-
stream watermark embedding. Such real-time watermark
embedding is well-suited to applications in which video
content must be distributed with high efficiency, such as
broadcasting. However, if such systems are used in server
applications, the dedicated hardware increases equipment
cost and may cause compatibility problems. For instance, if
the recipient’s ID is to be embedded in each distributed
video file, a server exclusively for watermark embedding
must be implemented for each concurrent video stream.
These systems were designed for use on signal processing
equipment, and uncompressed video frames are used for
input and output signals. If a watermark-embedding system
based on either approach was implemented on a server, it
would use most of the computing resources as well as
monopolize the video I/O processes. They are therefore not
suited for server application. A service provider must be
able to support concurrent accesses for video streaming,
and implementing many servers dedicated to watermark
embedding is expensive from the viewpoints of investment
and operation. Unfortunately, there have been few pub-
lished reports of lightweight software implementations
with video I/O, as far as we know.
Another way to deal with this problem is to use a
practical software implementation based on a general-
purpose central processing unit (CPU) with video I/O [13,
14]. Such an implementation is independent of the dedi-
cated parallel-computing platform and is well-suited to
server applications.
We have improved a previously reported server-based
watermark-embedding system [13] by combining a CPU-
based method for real-time watermarking [14] with a real-
time transcoding method [17]. This enables a content dis-
tribution service to embed watermarks specific to the
recipient. Our previous work [20] on this issue has been
fully revised to evaluate this system. We describe the
problems with conventional methods in Sect. 2; how the
previous system has been improved in Sect. 3, and present
the results of prototype evaluation of the improved system
in Sect. 4.
2 Conventional methods for video watermarking
2.1 Target application
Consider a VoD system that distributes video content to
users. The content is copyrighted, so it needs to be pro-
tected. When an authenticated user requests a particular
video, the video server locates the appropriate file and
streams it to the customer. While illegal digital copying can
be prevented by applying proven content protection tech-
niques such as encryption and authentication, client ter-
minals such as PCs may be insecure. Because a common
PC is generally not tamper-resistant, an illegal copy of a
video file potentially can be made by hardware access.
To identify the illegal copier, a digital watermarking
technique can be used for embedding the recipient’s ID
into the video content, as illustrated in Fig. 1. If an illegally
copied video is found by chance, the embedded ID can be
used to identify the person who illegally copied it. Quickly
identifying the copier is therefore important for ensuring
reliability of the video service.
However, embedding watermarks into video content
without degrading quality is difficult because embedding
the recipient’s ID into the video file in real time would
consume much of the computing power of the server sys-
tem. Moreover, video watermarks should survive the video
encoding process before the decoded video is viewed on
the recipient’s terminal. Maintaining the image quality of
images with embedded watermarks and providing water-
mark robustness against image processing are essential
requirements for video watermarking.
2.2 Conventional systems for watermark embedding
A central processing unit (CPU)-based software imple-
mentation with video I/O has been developed for embed-
ding watermarks in uncompressed standard-definition TV
(SDTV) video in real time [14]. However, from the
viewpoint of system architecture, it is based on a stand-
alone model for embedding watermarks, so handling the
uncompressed video I/O consumes much of the computing
resources.
The system illustrated in Fig. 2 was previously devel-
oped for embedding video watermarks based on the client–
server model. It is a CPU-based software implementation
suitable for content distribution [13]. It enables real-time
processing including watermark embedding, MPEG
encoding, and hard disk drive recording. It is extensible
and can deliver watermarked video files to clients via
multicast streaming. However, it supports only the QVGA
(320 × 240-pixel) format, which is converted from the
VGA (640 × 480-pixel) format of the incoming video
signal. Moreover, the embedded information is invariable,
which conflicts with our target application—distributing
videos with the recipient’s ID embedded.
2.3 Video players and VoD systems
2.3.1 Conventional video player
To view a packaged video using a video player, such as a
digital versatile disc (DVD) player, the viewer inserts the
media into the player and presses the play button. The
player with an attached monitor and speakers then iterates
four steps:
(A) read encoded video data,
(B) decode the data,
(C) transmit the decoded data to the monitor and
speakers, and
(D) display/broadcast the decoded images and sound on
the monitor and speakers.
The architecture of a conventional video player is shown
in Fig. 3. It is, of course, difficult to deliver to each reci-
pient a packaged video with the recipient’s ID as an
identifiable watermark, which conflicts with our target
application.
The improved system uses network communication for
delivering video content between steps C and D. Individ-
ually watermarked video is generated by embedding digital
watermarks in real time before delivering it to the reci-
pient’s terminal.
2.3.2 Conventional VoD system
A conventional VoD system stores encoded videos. When
a play command is received from a customer, the system
simply streams the requested video to the customer’s ter-
minal, as shown in Fig. 4.
Class III methods use an architecture in which the
watermarked video output is the encoder input, as
Fig. 1 Target application of digital watermarking
Fig. 2 Previously developed system for real-time watermark embedding [13]
described in Sect. 1. A conventional VoD system incor-
porating robust watermarking methods is not well suited
for distributing a stream of video content containing
embedded watermarks identifying the recipient, etc.
because a service provider must prepare the video files
before distribution. That is, the embedded information,
such as name of copyright holder, is invariable.
The improved system uses the fast transcoding before
video content delivery for controlling quality of service
(QoS). A video stream with identifiable watermarks can be
generated in real time by using robust digital watermarking
and delivered to the customer’s terminal.
3 Real-time video watermarking and encoding VoD
system
3.1 Description
Our improved video watermarking system for real-time-
encoding VoD service creates a video stream with identi-
fiable watermarks generated in real time by using robust
digital watermarking. A prototype of the improved system
created watermarked video streams with DVD resolution.
The quality of service (QoS) is maintained by using fast
transcoding before video content delivery. By combining a
CPU-based method for real-time watermarking [14] and a
QoS-aware method for real-time encoding [17] in the video
server, we have created an improved system that performs
fingerprinting by using video watermarking. The use of fast
transcoding enables it to generate a watermarked video
stream as a new VoD service.
Typical specifications for the service’s video streaming
are summarized in Table 1. The specifications for the
prototype development environment are summarized in
Table 2.
3.2 System architecture
The architecture of the improved system is shown in Fig. 5.
It has four iterative steps A′–D′, similar to those of
a physical video player. It uses Internet delivery for step
C′-3. That is, when a play command is received (the ini-
tialization process is executed in advance), the system
synchronously iterates steps A′–D′ in real time by syn-
chronizing the server and client processes.
Fig. 3 Architecture of conventional video player: (Step A) read encoded video data; (Step B) decode the data; (Step C) transmit the decoded data; (Step D) display the decoded images on a monitor
Fig. 4 Architecture of conventional VoD system
Table 1 Typical service specifications for video streaming with improved system
Codec: MPEG2
Size: 848 × 480 (DVD resolution)
Sound: two-channel
Download speed: constant bit rate (CBR), 5 Mbps (megabits per second)
Average frame rate: 30 fps (frames per second)
Average latency: no more than 33 ms
Number of concurrent accesses: 20
Note that these values are adjustable
Table 2 Prototype development environment (blade server used)
CPU type: Xeon(a), 2.26 GHz (quad core)
Number of CPUs: two
Main memory: 24 GB
Network: 1-Gbit Ethernet
Parallel processing framework: a proven multithreading technique is used with DirectX(b) for audio and video transcoding, which are done in parallel. The video watermarking is a single process done with a video encoder.
(a) Xeon is a registered trademark of Intel Corporation in the US and other countries
(b) DirectX is a registered trademark of Microsoft Corporation in the US and other countries
Fig. 5 Architecture of improved system. Recipient's ID can be embedded in real time
[Server process]
(A′) read encoded video data.
(B′) decode the data.
(C′-1) embed watermarks into the decoded data if the watermark option is selected.
(C′-2) re-encode the watermarked data into a video stream.
(C′-3) deliver the video stream to the network (using proven encryption and authentication methods).
[Client process]
(C′-4) receive the video stream at the terminal from which the command was issued.
(C′-5) decode the re-encoded video in the stream using player software on the user terminal.
(D′) display the video on the terminal's screen.
If a video frame is not ready to be output in step D′, the
previous frame is reused instead. That is, a frame is
dropped. To prevent frame dropping, the frame data should
be synchronously decoded, watermarked, encoded, and
streamed in real time. Moreover, they should be transmit-
ted via the Internet and re-decoded. The customer can then
play the video content as if a video player was actually in
the customer’s terminal. This QoS-aware service is
accomplished by changing the encoder setting in real time
based on network conditions.
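The reuse-previous-frame fallback described above can be sketched as a paced server loop. Note that the stage callables, their names, and the exact deadline rule are assumptions of this sketch, not the paper's implementation:

```python
import time

def stream_loop(decode_next, embed, encode, send, fps: int = 30):
    """Paced loop for steps A'-C'-3 (a sketch under stated assumptions).

    Each frame must finish within 1/fps seconds; on a deadline miss the
    previously encoded frame is sent again, i.e., the new frame is dropped.
    """
    interval = 1.0 / fps
    last_encoded = None
    deadline = time.monotonic() + interval
    while True:
        frame = decode_next()                 # steps A', B'
        if frame is None:                     # end of content
            break
        encoded = encode(embed(frame))        # steps C'-1, C'-2
        if time.monotonic() <= deadline or last_encoded is None:
            last_encoded = encoded            # frame made the deadline
        send(last_encoded)                    # step C'-3
        deadline += interval
```

In a real server the encoder settings would additionally be adjusted inside this loop based on network conditions, as the text describes.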
The service image for end users is shown in Fig. 6. Users
can select videos from among genres such as movie, TV
drama, and anime. Users can select from a menu by pushing
a button (1–10) on a TV remote control with Internet sup-
port. This figure demonstrates that our improved video
watermarking system for real-time-encoding VoD service
can be implemented as a practical application.
3.3 Video watermarking
Our previously developed system, which uses a method
for speeding up the video watermarking process,
demonstrated high performance for embedding water-
marks in VGA-size streams [14]. It was originally
implemented in a stand-alone server, which consumed
much computing power for inputting and outputting
uncompressed video signals. To make the speed-up
method applicable to a server process for real-time
watermark embedding, we
– divided the watermark-embedding process into an
initialization pre-process common to every video frame
and watermark-embedding post-processes specific to
each frame,
– redesigned the data flow so that the post-processes
could reuse the watermark pattern output by the pre-
process, and
– redesigned the frame flow so that the video frames are
input from a decoder program and output to an encoder
program in server memory.
To simplify the description, we describe the process
flow of the watermark embedding in a basic schema as
follows.
3.3.1 [Initialization]
The initialization process generates a watermark pattern
represented as a pseudo-random array of N elements, each
taking the value +1 or −1; that is, m = {m_i = ±1 | 1 ≤ i ≤ N}.
The generated watermark pattern is then stored in
server memory. It is generally possible to define a
watermark pattern so that the probability of false-positive
errors can be mathematically calculated. Embedding
watermarks by using such patterns, i.e., by changing pixel
values in accordance with them, keeps the probability of
false positives low.
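As an illustration, the initialization step can be sketched in Python with NumPy; the seed standing in for a secret key shared by embedder and detector is our assumption, not something the paper specifies:

```python
import numpy as np

def init_watermark_pattern(num_pixels: int, seed: int) -> np.ndarray:
    """Generate a pseudo-random pattern m = {m_i = ±1 | 1 <= i <= N}.

    The seed plays the role of a secret key (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    # Each element is independently +1 or -1 with probability 1/2.
    return rng.choice(np.array([-1, 1], dtype=np.int8), size=num_pixels)

# Pattern for one DVD-resolution luminance frame, stored for reuse.
pattern = init_watermark_pattern(848 * 480, seed=2013)
```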
3.3.2 [Watermark embedding]
In iterative steps A′–D′, as described in Sect. 3.2, the video
frames are continuously decoded into an uncompressed
format (such as YUV) one after another in server memory.
The frame data are generated in real time and kept in a
watermarking buffer. The process flow of the watermark
embedding comprises four steps, all within step C′-1.
The following steps are done over f = 1, 2, …, where f is
the frame number of the original frame.
Step 1 (input): Read the original pixel value set of the
f-th frame consisting of N pixels, y^(f) = {y_i^(f) | 1 ≤ i ≤ N},
as an input parameter. The watermarked pixel value set
y′^(f) is an output parameter.
Fig. 6 Service image for end users using the metaphor of a video rental shop
Step 2 (watermark strength calculation): Calculate
the set of watermark strengths s^(f) = {s_i^(f) > 0 | 1 ≤ i ≤ N}
from original frame y^(f). Each strength s_i^(f) represents
watermark imperceptibility at pixel i of the f-th frame. A
method for maintaining image quality is used to calculate
the watermark strength. For instance, s_i^(f) can be the
difference between the value of the i-th pixel y_i^(f) and the
average value of the surrounding pixels. Setting the
watermark strength in this way maintains the image quality
by avoiding the embedding of a strong watermark in a plain
region (where the spatial frequency of the pixel values is
low) in each frame.
Step 3 (watermarked frame generation): Generate
watermarked frame y′^(f) = {y′_i^(f) | 1 ≤ i ≤ N} by adding the
watermark pattern m_i multiplied by the watermark strength:

y′_i^(f) = y_i^(f) + s_i^(f) m_i   (1)

Watermarked frame y′^(f) is then stored in server
memory. Note that this process reads the watermark
pattern m_i calculated in the initialization process; i.e., the
data is reused.
Step 4 (output): Send watermarked frame y′^(f) to the
next process (the encoder) in real time.
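Steps 1–3 above can be sketched as follows; the 3 × 3 local-mean strength rule (one possible instance of the paper's "difference from the surrounding average") and the clipping to the 8-bit range are assumptions of this sketch:

```python
import numpy as np

def local_mean(frame: np.ndarray, radius: int = 1) -> np.ndarray:
    """Mean over the (2r+1) x (2r+1) neighbourhood, with edge padding."""
    h, w = frame.shape
    padded = np.pad(frame.astype(np.float64), radius, mode="edge")
    k = 2 * radius + 1
    acc = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / (k * k)

def embed_watermark(frame: np.ndarray, pattern: np.ndarray) -> np.ndarray:
    """Eq. (1): y'_i = y_i + s_i * m_i, with s_i > 0 from local contrast."""
    # Strength: |pixel - neighbourhood mean|, plus a small floor to keep
    # s_i > 0; plain (low spatial frequency) regions thus receive only a
    # weak, imperceptible watermark.
    s = np.abs(frame.astype(np.float64) - local_mean(frame)) + 0.5
    y = frame + s * pattern.reshape(frame.shape)
    return np.clip(y, 0, 255)  # keep valid 8-bit luminance values
```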
The process flow of the watermark detection corre-
sponding to the basic schema comprises four steps.
3.3.3 [Watermark detection]
The same watermarks are embedded in consecutive frames
although the frame contents may change (e.g., the frames
show a person moving). Therefore, if these frames are
accumulated to form an average image, their watermark
signals will accumulate while their content will almost
disappear. We thus accumulate n consecutive frames to
generate an average image from which the watermark is
detected. The pixel value set of the f′-th watermarked
frame consisting of N pixels is y′^(f′) = {y′_i^(f′) | 1 ≤ i ≤ N}
(f′ = 1, …, n). The input parameters for the watermark
detection process are the watermarked frames y′^(f′) and the
number of accumulated frames n. The number of pixels N
and pseudo-random array m have the same values as used
in the watermark-embedding process.
Step 1: Do the following steps over the n watermarked
frames y′^(f′) (f′ = 1, …, n).
Step 2: Accumulate the n frames in ỹ = {ỹ_i | 1 ≤ i ≤ N},
where ỹ_i = (1/n) Σ_{f′=1}^{n} y′_i^(f′).
Step 3: Calculate correlation value c by correlating
pseudo-random array m with accumulated frame ỹ.
That is,

c = (1/N) Σ_i m_i ỹ_i
  = (1/N) Σ_i m_i [(1/n) Σ_{f′=1}^{n} y′_i^(f′)]
  = (1/N) Σ_i m_i [(1/n) Σ_{f′=1}^{n} (y_i^(f′) + s_i^(f′) m_i)]
  = (1/(nN)) [Σ_{i,f′} m_i y_i^(f′) + Σ_{i,f′} s_i^(f′)]   (2)

using m_i² = 1. Note that the first term, Σ_{i,f′} m_i y_i^(f′),
should be near zero due to the randomness of m. Because
watermark strength s_i^(f′) is positive, the positive second
term, Σ_{i,f′} s_i^(f′), determines the correlation value.
Step 4: Determine the existence of a watermark by
comparing correlation value c with threshold value T (> 0):

result = detected if c ≥ T; not detected otherwise.   (3)
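The detection flow reduces to an accumulate-and-correlate routine; a minimal sketch (the threshold value T is application-specific and supplied by the caller):

```python
import numpy as np

def detect_watermark(frames, pattern, threshold: float):
    """Eqs. (2)-(3): average n frames, correlate with m, compare with T."""
    # Step 2: accumulate the n frames into the average image ~y.
    acc = np.mean([f.reshape(-1).astype(np.float64) for f in frames], axis=0)
    # Step 3: c = (1/N) * sum_i m_i * ~y_i.
    c = float(np.mean(pattern * acc))
    # Step 4: threshold test.
    return ("detected" if c >= threshold else "not detected"), c
```

Because the frame content decorrelates from m while the s_i m_i terms add coherently, c concentrates near the average embedded strength when a watermark is present and near zero otherwise.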
The basic schema can be extended to a 1-bit-watermark
schema by defining another watermark pattern m̃, indicating
information bit 1, in addition to the definition
of m, indicating information bit 0. In the multiple-bit
schema, each frame is divided into regions, and the 1-bit
process is applied to each region on the basis of the bit
value of the embedded information b. A watermark pattern
in the multiple-bit schema is the set comprising a pseudo-
random array of N elements, corresponding to the binary
values of the encoded embedded information. The
watermark pattern is repeatedly generated in a frame
image, increasing watermark robustness. The
embedded information b indicating the recipient ID is an input
parameter for the initialization process. The watermark
pattern is the output parameter for the initialization
process. It is stored and reused in the watermark-
embedding process.
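One way to realize the multiple-bit schema is to split the frame's pixel vector into equal regions and embed m or m̃ in each region according to the corresponding bit of b; the contiguous region layout and the maximum-correlation decoding rule below are assumptions of this sketch, not the paper's exact construction:

```python
import numpy as np

def _region_patterns(region_size: int, seed: int):
    rng = np.random.default_rng(seed)
    m0 = rng.choice([-1, 1], size=region_size)  # pattern m for bit 0
    m1 = rng.choice([-1, 1], size=region_size)  # pattern m~ for bit 1
    return m0, m1

def multibit_pattern(bits, region_size: int, seed: int) -> np.ndarray:
    """Concatenate, per region, the pattern for that region's bit."""
    m0, m1 = _region_patterns(region_size, seed)
    return np.concatenate([m1 if b else m0 for b in bits])

def decode_bits(accumulated: np.ndarray, n_bits: int, seed: int):
    """Decide each bit by which pattern correlates better with its region."""
    region_size = accumulated.size // n_bits
    m0, m1 = _region_patterns(region_size, seed)
    regions = accumulated.reshape(n_bits, region_size)
    return [int(np.mean(m1 * r) > np.mean(m0 * r)) for r in regions]
```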
The watermarked video frames are encoded into the
video stream, which is then streamed to the recipient’s
terminal. Use of the speed-up method enables watermarks
to be embedded quickly so that iterative steps A′–D′ are
stably processed in real time.
4 System evaluation
4.1 Comparison with previous implementation
Our previous system for embedding watermarks into video
data in real time was implemented on a general-purpose
CPU [13]. In that implementation, the system comprised a
series of processes: capturing input video, embedding
watermarks into the video images, encoding the water-
marked images, and recording the video data on the hard
disk in the PC. It functioned as a video recorder based on a
stand-alone model. While watermarking and encoding can
be combined as a server application, if a decoder and
streamer are used, as shown in Fig. 2, the output video has
low resolution, e.g., QVGA resolution at 1 Mbps. This low
resolution means that its application is limited to video
streaming for mobile phones, narrowband PCs, and so on.
Moreover, it is inadequate for distributing individually
watermarked videos because the embedded information is
invariable. Although the previous implementation can
theoretically embed watermarks specific to each recipient
by operating the same number of watermark embedding
servers as the number of concurrent accesses, the same
number of additional hardware devices for decoding and
streaming must also be used, meaning that the implemen-
tation is prohibitively expensive.
Our improved system incorporates a video watermark-
ing method that can handle larger images, up to DVD size
(848 × 480). It also incorporates processes for decoding
input video files and streaming output video streams. It can
simultaneously distribute individually watermarked video
streams to up to 20 customers. Table 3 compares the two
implementations.
4.2 Measured performance time
4.2.1 [Watermark embedding]
The sequential server processes of decoding, watermark
embedding, encoding, and delivering video content should
be processed in real time. The prototype server has five
iterative steps, as described in Sect. 3.
(A′) read encoded video data.
(B′) decode the data.
(C′-1) embed watermarks into the decoded data.
(C′-2) re-encode the watermarked data into a video stream.
(C′-3) deliver the video stream to the network.
The processing time for each step was measured while
the server was streaming video. The processing times for
steps A′ and B′ were measured together. Since the pro-
cessing time for each corresponding function was simply
measured by comparing the value returned by the clock to
the value returned by the initial call to the clock, each
processing time includes the synchronization time due to
parallel processing. The measured time was thus greater
than the elapsed real time. For instance, although audio
data is processed in parallel, the synchronization time
related to audio, such as de-multiplexing and multiplexing,
is included in the corresponding measured performance
time.
The processing time for video encoding varied greatly,
from 8 to 20 ms per frame, due to differences in frame type
and content (such as the motion property). In real-time
processing, each frame should be completely processed
within 33 ms on average because the video data is streamed
at 30 fps. As shown by the results in Table 4, the total
average time for the iterative steps ranged from 16 to 33 ms
per frame. That is, real-time processing was achieved.
Table 3 Comparison of previous and current implementations
Network model: previous [13]: client–server model; current: client–server model
Service: previous: originally recorded video that can be extended to video distribution; current: video on demand
Video-in of PC server: previous: uncompressed video signal output by decoder; current: previously encoded files (DVD data)
Video-out of PC server: previous: no better than QVGA (320 × 240), MPEG4 at 1 Mbps; current: DVD size (848 × 480), MPEG2 at 5 Mbps (adjustable)
Watermarked video streams: previous: not considered (additional hardware for streaming is required); current: considered
Concurrent access to service: previous: multicast transmission; current: 20 unicast transmissions (adjustable)
Embedded information: previous: invariable (such as name of copyright holder); current: variable (such as recipient's ID)
Table 4 Performance times of server processes
(A′) read encoded video data and (B′) decode the data: 4–6 ms/frame
(C′-1) embed watermarks into the decoded data: 2–3 ms/frame
(C′-2) re-encode the watermarked data into a video stream: 8–20 ms/frame
(C′-3) deliver the video stream to the network: 2–4 ms/frame
Total processing time: 16–33 ms/frame
If a process for embedding watermarks in a frame image
puts a heavy load on the server, the QoS-aware encoder
might drop a frame to maintain service. Although such
adjustments can be found in the system log, it is difficult to
estimate the effect on image quality at the client PC. If a
frame was dropped in the client PC due to real-time pro-
cessing problems in the server, the image quality of a
watermarked video might be degraded much more than that
of the original video. We therefore experimentally evalu-
ated the subjective image quality. As discussed in Sect. 4.4,
there was no significant image degradation.
4.2.2 [Watermark detection]
A watermarked encoded video file must be decoded into an
uncompressed format in advance to execute our prototype
application for watermark detection. After this decoding,
transferring the uncompressed image from disk into
memory is a lengthy process. Moreover, it is difficult to
accurately measure the transfer time independent of the
effects of disk buffering. Therefore, we measured only
the core time for watermark detection, which excludes the
transfer time.
The frame accumulation is the process of adding a frame
image to the accumulated image, as represented by Eq. (2).
This accumulation took 2 ms per frame on a desktop PC
(CPU: single core, 2.8 GHz). It took 66 ms to detect
watermarks in the accumulated image.
4.3 Measured performance
The image of the CPU usage monitor in Fig. 7a shows that
CPU usage remained stable at about 5 % for one access by
a user. Although the computational complexity of the
encoding process depends on the content, the peak load can
be controlled by using proven encoder techniques.
The image of the frame rate monitor in Fig. 7b shows
that the average frame rate remained stable at 30 fps. The
frame rate is the number of processed frames divided by
elapsed time. The periodic dips in the rate appeared only on
the displayed graph, meaning that display errors occurred.
They are attributed to the inadequacy of the measurement
method. The actual frame rate for the video stream
remained stable at 30 fps.
The image of the round-trip time (rtt) performance
monitor in Fig. 7c shows that rtt remained approximately
zero. The client rtt is the time it takes for a client to send a
request and the server to send a response over the Internet,
not including the time required for data transfer. Although
the server rtt varied once, the delay was no more than
33 ms. The variance is attributed to a frame drop caused by
a transient disruption in the Internet connection, not to
watermarking. In short, system operation was stably controlled.
The image of the frame image monitor in Fig. 7d shows
the current frame image. Although the video used was a
normal one, the image was intentionally obfuscated for
copyright reasons.
The CPU load with the prototype was not so high.
Moreover, quality of service was maintained from the
viewpoint of frame rate and round-trip time. It is thus
feasible for a VoD service to stream individually water-
marked videos concurrently.
4.4 Image quality in network environment
The algorithm used was previously shown to provide sufficient image quality for VGA-size MPEG2 content at 8 Mbps [14]. It should also provide sufficient image quality for DVD-size MPEG2 content at 5 Mbps, because the current compression ratio (5 M/(848 × 480 × 3 × 8 × 30) = 1/59) is less than the previous one (8 M/(640 × 480 × 3 × 8 × 30) = 1/28), for 3 colors, 8 bits for each color value, and 30 fps. The lower the compression ratio of a watermarked video, the more imperceptible the noise caused by watermarking.

Fig. 7 Screen-captured images of performance monitors

J Real-Time Image Proc
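The two compression ratios above can be checked directly; the raw bit rate is width × height × 3 colors × 8 bits × 30 fps:

```python
def compression_ratio(bitrate_bps, width, height, colors=3, bits=8, fps=30):
    """Encoded bit rate divided by the raw (uncompressed) bit rate."""
    raw_bps = width * height * colors * bits * fps
    return bitrate_bps / raw_bps

current = compression_ratio(5_000_000, 848, 480)   # DVD-size content at 5 Mbps
previous = compression_ratio(8_000_000, 640, 480)  # VGA-size content at 8 Mbps
print(round(1 / current))   # -> 59
print(round(1 / previous))  # -> 28
```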
A 15-s sample video (showing people walking around in
a room) was used to test image quality. The sample video was originally in encoded format; that is, the decoded images served as the original images for watermarking.
Watermarks were embedded in the original images when
the watermark option was selected. The original and
watermarked images were encoded into video streams at 5
Mbps that were then distributed through the Internet in
accordance with the specifications listed in Table 1. The
client PC was connected to the Internet using asymmetric
digital subscriber line (ADSL) technology. The commu-
nication speed was up to 10 Mbps for downloading on a
best effort basis. The streamed videos were evaluated at the
client PC.
The image qualities of the original and watermarked
videos in a network environment were evaluated subjec-
tively using a procedure based on Recommendation ITU-R
BT.500 [18]. The use of a network environment meant that
the image quality was comparable to that for an actual
implementation. The images were displayed on a monitor
in sequence and evaluated by ten participants who rated
watermark disturbance on a scale of 1–5, as shown in
Table 5. The average score for the watermarked video was
4.5.
The server maintains service quality by controlling the encoding parameters in real time. If communication conditions are temporarily degraded, the encoder may generate block noise, which would be evident in both the original and watermarked video content at the client PC. We did not observe such noise during the experiment because the Internet communication conditions were fortunately stable.
Subjectively evaluating the image quality of a video stream under the many possible combinations of network parameters, such as the bit error ratio, is difficult. Moreover, such tests should also cover image quality for each encoder used, and such testing is beyond the scope of this paper. Our subjective
experimental results simply demonstrate the feasibility of
real-time processing without frame dropping, as described
in Sect. 4.2.
4.5 Image quality in local environment
Because it is difficult to reproduce experiments in a net-
work environment due to unstable communication through
the Internet, we manually embedded watermarks into video
files under the conditions shown in Table 1.
Three samples, as shown in Fig. 8, were used for image
quality evaluation in a local environment. They were
selected from the standard video set [19] and are the same
ones used previously [14]. The samples have various
motion properties. We made 848 × 480 images as original images for watermarking by resampling HDTV images.
EntranceHall shows people walking around in a room as
captured with a fixed camera. WalkThroughTheSquare is a
dolly shot of people walking around, who are seen beyond
flowers near the camera. This sample is similar to the one
used in the network environment evaluation. The people in
the EntranceHall sample move more slowly than those in
the WalkThroughTheSquare sample. The objects captured
by a panning camera in the WhaleShow sample move
faster than those in the WalkThroughTheSquare sample.
That is, the three samples cover a variety of motion properties, including slow-, medium-, and fast-moving objects.
The same watermark algorithm as in the prototype system was used, although the codec implementation differed from that in the prototype system. The watermarked videos were
encoded into files (MPEG2, 5 Mbps). The rated subjective
values for the encoded watermarked videos were respec-
tively (a) 4.7, (b) 4.7, and (c) 4.5, indicating that the image
quality of the watermarked videos was practical.
The peak signal-to-noise ratios (PSNRs) calculated for the
three sample videos (EntranceHall, WalkThroughThe-
Square, WhaleShow) were respectively 42.2, 42.1, and
42.0 dB when uncompressed watermarked images and
uncompressed original images were compared (Table 6).
The MPEG2 encoder created noise in the encoded original
videos, resulting in PSNRs of 36.3–42.1, as shown in the
second row. The values for the encoded watermarked
videos were at almost the same level as those for the
encoded original videos, as shown in the third row. The
noise caused by the watermarking was thus somewhat
veiled by the encoder noise. In other words, the watermark
strength was properly set so that the watermarks were
almost imperceptible to the human eye. The same water-
mark strength was also used in the watermark robustness
test.
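The PSNR values above follow the standard definition for 8-bit images; a minimal sketch (function and variable names are ours, not from the paper):

```python
import numpy as np

def psnr(original, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two 8-bit images."""
    mse = np.mean((original.astype(np.float64)
                   - processed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# toy check: a uniform +2 offset on an 848x480 8-bit image
orig = np.full((480, 848), 128, dtype=np.uint8)
marked = orig + 2
print(round(psnr(orig, marked), 1))  # -> 42.1
```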
Table 5 Level of disturbance and rating scale

  Disturbance                     Score
  Imperceptible                   5
  Perceptible but not annoying    4
  Slightly annoying               3
  Annoying                        2
  Very annoying                   1

4.6 Watermark robustness against encoding

As our watermark-embedding method is implemented as CPU-based software, the improved system has extensibility, meaning that the output of the watermark embedding
function can be connected to another video encoder with a
lower compression rate. We therefore evaluated watermark
robustness against representative video codecs: MPEG2
and H.264 [MPEG4/AVC (Advanced Video Codec)]. The
bit rate was set to 1.0, 2.0, or 3.6 Mbps, and the frame rate
was set to 25 or 30 fps. These settings are commonly used
in commercial video distribution. The robustness test
conditions are summarized in Table 7. Twelve practical
sample videos such as computer graphics and landscape
ones were used to test watermark robustness. Each was
15 s long. The videos were decoded, watermark embedded,
and encoded in the same manner as for the image quality
test.
Detection was successful for all 12 samples, indicating that it is feasible to deliver watermarked (user-identifiable) video through a VoD service.
Since the prototype system used the improved speed-up method, its watermark robustness was essentially equal to that of the base method [14]. Furthermore, to demonstrate that the algorithm used is practical, we tested watermark robustness against re-encoding, a common attack involving severe image processing that a watermark has difficulty surviving. The watermarked samples (Fig. 8) (MPEG2, 5 Mbps) were re-encoded into smaller ones (H.264/AVC, 1 Mbps) using an encoder different from that in the prototype system. The watermarks in all the re-encoded samples were detected successfully.
4.7 Watermark detection ratio dependence on frame
accumulation
Watermark robustness was evaluated using the videos used
in the image quality evaluation. We considered watermark
detection to be successful when 64 bits of information
embedded in a video sample were correctly detected
without bit errors. We used the watermark detection ratio,
which is the ratio of the number of points where the
embedded 64 bits were correctly detected to the total
number of detection points. There were 450 frames in total
for each video. For instance, if the watermarks of 30
sequential frames were detected at a time (n = 30), there
were 15 detection points in each video.
One of the watermarked video stream samples was
encoded using H.264/AVC at 700 kbps. The number of
accumulated frames is n, as described in Sect. 3.3. The
detection ratios for n = 1, 5, 10, and 30 from 450 frames of
watermarked images are shown in Fig. 9. When n was 30, the detection ratio for the H.264/AVC video was 100 %. That is, watermarks could be detected from 1 s of video (30 frames) in this experiment.
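The detection-ratio bookkeeping described above can be sketched as follows; the detector interface is hypothetical, and only the counting logic follows the paper:

```python
def detection_ratio(frames, n, detect):
    """Percentage of detection points where the payload was recovered.

    frames : list of decoded frame images (450 per video here)
    n      : number of frames accumulated per detection point
    detect : callable taking n frames, returning True if the 64-bit
             payload was recovered without bit errors (assumed interface)
    """
    points = len(frames) // n  # e.g. 450 // 30 = 15 detection points
    hits = sum(1 for i in range(points)
               if detect(frames[i * n:(i + 1) * n]))
    return 100.0 * hits / points

# with a stub detector that always succeeds, the ratio is 100 %
print(detection_ratio(list(range(450)), 30, lambda chunk: True))  # -> 100.0
```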
Fig. 8 Scenes from sample videos: a EntranceHall, b WalkThroughTheSquare, c WhaleShow
Table 6 Quantitative image quality (PSNR)

  Image processing               EntranceHall (dB)  WalkThroughTheSquare (dB)  WhaleShow (dB)
  Not-encoded watermarked video  42.2               42.1                       42.0
  Encoded original video         42.1               37.9                       36.3
  Encoded watermarked video      42.1               36.6                       36.5
Table 7 Robustness test conditions

  No.  Size       Codec      Bit rate (Mbps)  Frame rate (fps)
  1    848 × 480  MPEG2      1.0              25
  2    848 × 480  MPEG2      2.0              25
  3    848 × 480  H.264/AVC  1.0              30
  4    848 × 480  H.264/AVC  3.6              30
Fig. 9 Watermark detection ratios for n = 1, 5, 10, and 30 from 450 frames of watermarked images
4.8 Watermark robustness against channel noise
The QoS-aware encoder can dynamically change its
behavior in accordance with the measured communication
performance. The system can avoid continuous packet loss because highly compressing the video content reduces the number of communication packets. Therefore, channel noise such as block noise rarely occurs.
Moreover, because Internet communication was stable
during our experiment, channel noise was not observed at
the client PC during the image quality test in a network
environment, as described in Sect. 4.4.
To evaluate watermark robustness against channel
noises, we embedded watermarks in the three sample
videos (Fig. 8). We encoded the sample videos under the
conditions shown in Table 1. We then decoded each video and selected one intra-frame image (848 × 480). We added 8 × 8-block noise to the image. The frame image thus contained 6,360 blocks. Randomly selected block locations in the frame image were painted in a monotone pattern. The simulated channel noise ratio was calculated as the number of painted blocks divided by the total number of blocks. The watermark bit detection ratio was calculated as the number of successfully detected bits of information divided by the total number of embedded bits of information.
Watermark detection was performed on the watermarked decoded frame image, treated as a still picture with simulated channel noise. Since the same watermarks were embedded repeatedly, detection was successful without bit errors when the channel noise ratio was no more than 10 %, as shown in Fig. 10. That is, the watermarks were robust against up to 10 % channel noise in this simulation.
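The block-painting simulation can be reproduced in outline as follows. The monotone paint value and the random-number seed are our assumptions; the block geometry and the noise-ratio definition follow the text:

```python
import numpy as np

def add_block_noise(img, noise_ratio, block=8, value=128, seed=0):
    """Paint randomly chosen block x block regions with a monotone value.

    For an 848 x 480 frame with 8 x 8 blocks there are 106 x 60 = 6,360
    blocks; noise_ratio is painted blocks / total blocks.
    """
    h, w = img.shape[:2]
    by, bx = h // block, w // block
    total = by * bx
    rng = np.random.default_rng(seed)
    # pick distinct block indices to paint
    picks = rng.choice(total, size=int(total * noise_ratio), replace=False)
    out = img.copy()
    for p in picks:
        y, x = (p // bx) * block, (p % bx) * block
        out[y:y + block, x:x + block] = value  # monotone-painted block
    return out

frame = np.zeros((480, 848), dtype=np.uint8)
noisy = add_block_noise(frame, 0.10)
# 10 % of 6,360 blocks -> 636 painted blocks
print(int((noisy == 128).sum()) // 64)  # -> 636
```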
5 Conclusion
Real-time video watermark embedding is essential for a
VoD service. While our previous real-time watermark
embedding system, which uses software running on a PC,
is suitable for content distribution, it consumes much of the
computing power in handling the uncompressed video
stream.
In contrast, our improved system features real-time
transcoding and can embed watermarks containing each
recipient’s ID. That is, tracking of an illegal copier is
accomplished by embedding watermarks during server-side
processing. Real-time processing is achieved in two ways: by decoding the original video in the server, which eliminates the need to capture it as a video signal, and by moving the watermark-pattern generation, which is common to every frame, into a pre-process whose output pattern is then reused for each frame. With this improved
system, a VoD service can stream up to 20 user-specific
watermarked videos concurrently.
Visibility testing showed that the quality of the water-
marked images was practical to some degree. Robustness
testing showed that the embedded watermarks were prac-
tically robust against encoding. If illegally copied video is
found by chance, the illegal copier can be identified by
using these watermarks. Use of this system will thus help
deter illegal copying of video content.
Future work includes establishing watermark robustness
against collusion attacks. The current elemental watermark
payload of 64 bits is enough to identify a copier within a
specific time length of video content. Therefore, one way to
identify colluders is to embed redundant information for
traitor tracing in another timeline of the video.
Some parts of the developed system will be used in a
commercial service.
Acknowledgments We thank Broadmedia Corporation and
G-cluster Global Corporation for providing the video samples for
research purposes, implementing the prototype system, and support-
ing our experiments. Their development environment is the same as
that of ‘‘T’s TV’’ [21].
References
1. George, C., Scerri, J.: Web 2.0 and user-generated content: legal
challenges in the new frontier. J. Inf. Law Technol. 2 (2007)
2. Tardos, G.: Optimal probabilistic fingerprint codes. J. ACM
55(2), 1–24 (2008)
3. Boneh, D., Hamburg, M.: Generalized identity based and
broadcast encryption schemes. In: Proceedings of ASIACRYPT
2008, pp. 455–470 (2008)
4. Safavi-naini, R., Wang, Y.: Sequential traitor tracing. IEEE
Trans. Inf. Theory 49(5), 1319–1326 (2003)
5. Kirovski, D., Malvar, H., Yacobi, Y.: Multimedia content
screening using a dual watermarking and fingerprinting system.
In: Proceedings of ACM International Conference on Multi-
media, pp. 372–381 (2002)
6. Cox, I.J., Miller, M.L., Bloom, J.A.: Digital Watermarking.
Morgan Kaufmann Publishers, Burlington (2001)
7. Lancini, R., Mapelli, F., Tubaro, S.: A robust video watermarking
technique for compression and transcoding processing. In:
Fig. 10 Watermark bit detection ratio vs. simulated channel noise ratio
Proceedings of IEEE International Conference on Multimedia &
Expo (ICME), vol. 1, pp. 549–552 (2002)
8. Hartung, F., Girod, B.: Watermarking of uncompressed and
compressed video. Signal Process. 66(3), 283–301 (1998)
9. Alattar, A.M., Lin, E.T., Celik, M.U.: Digital watermarking of
low bit-rate advanced simple profile MPEG-4 compressed video.
IEEE Trans. Circuits Syst. Video Technol. (CSVT) 13(8),
787–800 (2003)
10. Gopal, K., Latha, M.M.: Watermarking of digital video stream for
source authentication. Int. J. Comput. Sci. Issues 7(4), No 1,
18–25 (2010)
11. Jeong, Y.J., Kim, W.H., Moon, K.S., Kim, J.N.: Implementation
of watermark detection system for hardware based video water-
mark embedder. In: Proceedings of International Conference on
Convergence and Hybrid Information Technology, pp. 450–453
(2008)
12. Kim, K.S., Lee, H.Y., Im, D.H., Lee, H.K.: Practical, real-time,
and robust watermarking on the spatial domain for high-definition
video contents. IEICE Trans. Inf. Syst. E91-D(5), 1359–1368
(2008)
13. Yamada, T., Echizen, I., Tezuka, S., Yoshiura, H.: Real-time
digital video watermark embedding system based on software in
commodity PC. IEEE J. Electron. Inf. Syst. 127C(6), 897–903
(2007)
14. Yamada, T., Echizen, I., Yoshiura, H.: PC-based real-time video
watermark embedding system independent of platform for par-
allel computing. In: Shi, Y.Q. (ed.) Trans. on Data Hiding and
Multimedia Security VII, LNCS, vol. 7110, pp. 15–33. Springer,
Berlin (2012)
15. Jeong, Y.J., Moon, K.S., Kim, J.N.: Implementation of real time video watermark embedder based on Haar wavelet transform using FPGA. In: Proc. of Int'l Conf. on Future Generation Communication and Networking Symposia (FGCNS2008), pp. 63–66 (2008)
16. Mathai, J.N., Sheikholeslami, A., Kundur, D.: VLSI Implemen-
tation of a real-time video watermark embedder and detector. In:
Proc. of IEEE Int’l Symposium on Circuits and Systems, vol. 2,
pp. 772–775 (2003)
17. Jarvinen, S., Laulajainen, J.-P., Sutinen, T., Sallinen, S.: QoS-
aware real-time video encoding. In: Proc. of IEEE Consumer
Communications and Networking Conference (CCNC2006),
vol. 2, pp. 994–997 (2006)
18. Rec. ITU-R BT.500-11: Methodology for the subjective assess-
ment of the quality of television pictures (2002)
19. The Institute of Image Information and Television Engineers
(ITE): HDTV standard video set (1993)
20. Yamada, T., Maeta, M., Mizushima, F.: Feasibility of a water-
mark application for real-time-encoding video-on-demand ser-
vice. In: Proc. of IEEE/SICE Int’l Symp. on System Integration
(SII2011), pp. 1131–1136 (2011)
21. VoD service, T’s TV: http://t-s.tv/ (Japanese)
22. Feng, H., Ling, H., Zou, F., Yan, W., Lu, Z.: A Collusion attack
optimization strategy for digital fingerprinting. ACM Trans.
Multimed. Comput. Commun. Appl. 8(S2), 36:01–36:20 (2012)
23. Meerwald, P., Uhl, A.: An efficient robust watermarking method
integrated in H.264/SVC. In: Shi,Y.Q. (ed.) Trans. on Data
Hiding and Multimedia Security VII, LNCS, vol. 7110, pp. 1–14.
Springer, Berlin (2012)
24. Wang, Y., Pearmain, A.: Blind MPEG-2 video watermarking
robust against geometric attacks: a set of approaches in DCT
domain. IEEE Trans. Image Process. 15(6), 1536–1543 (2006)
25. Chen, W.-M., Lai, C.-J., Wang, H.-C., Chao, H.-C., Lo, C.-H.:
H.264 video watermarking with secret image sharing. IET Image
Process. 5(4), 349–354 (2011)
Author Biographies
Takaaki Yamada received a B.S. degree in Engineering from Kyoto University and a Ph.D. degree in Information Science and Technology from Osaka University in 1988 and 2006, respectively. He is a senior researcher at the Social Infrastructure Systems Research Department, Yokohama Research Laboratory, and has been engaged in R&D at Hitachi, Ltd., Japan, since 1988. His research interests include multimedia applications and content distribution systems. He received the Best Paper Award from IPSJ in 2005 and the Best Paper Award of IEEE IIHMSP in 2006. He is an IEEE member.
Michiro Maeta received B.S. and M.S. degrees from the Kyushu Institute of Technology, Japan, in 1999 and 2001, respectively. In 2001, he joined Hitachi Government & Public Corporation System Engineering, Ltd., Tokyo, Japan. He has been engaged in video watermark application systems.
Fuminori Mizushima received a B.S. degree from Ube National College of Technology and an M.S. degree from the Kyushu Institute of Technology, Japan, in 2004 and 2006, respectively. In 2009, he joined Hitachi Government & Public Corporation System Engineering, Ltd., Tokyo, Japan. He has been engaged in video watermark application systems.