TRANSCRIPT
ORIGINAL RESEARCH PAPER
Video watermark application for embedding recipient ID in real-time-encoding VoD server
Takaaki Yamada • Michiro Maeta •
Fuminori Mizushima
Received: 17 September 2012 / Accepted: 28 February 2013
© Springer-Verlag Berlin Heidelberg 2013
Abstract A previously developed system for embedding
watermarks in video content in real time has been
improved by incorporating real-time transcoding, which
enables embedding of watermarks specific to the recipient.
That is, watermarks are used as fingerprints. When a play
command is received from a customer, the system decodes
the requested video content into frame images in the server.
The frame images are then watermarked, encoded, and
streamed to the customer in real time. Prototype testing
demonstrated that use of this watermarking method is
feasible for video-on-demand service; that is, up to 20
individually watermarked videos can be concurrently
streamed to customers. Visibility testing showed that the
quality of the watermarked images was acceptable for
practical use. Robustness testing showed that the embedded
watermarks were practically robust against encoding. Use
of this system should help deter the illegal copying and
distribution of video content.
Keywords Video watermark · Real-time processing · Fingerprinting · Copyright protection · Video-on-demand
1 Introduction
Digital video content can be easily distributed over the
Internet due to the widespread use of broadband networks
and highly efficient computers. Recent developments in
network services have led to the introduction of various
digital content distribution services such as video-on-
demand (VoD). However, the growing availability of video
content online is exacerbating the problem of copyright
violation. For instance, a problem with the various video
sharing services that provide huge amounts of video data
worldwide is that they enable illegal copies to be distributed
anonymously [1], making it difficult to prevent illegal user behavior.
This illegal copying and distribution of video content
damage the sales of the original content through commer-
cial services. One way to limit the extent of the damage is
to find the illegally distributed copies and delete them so
that they are not redistributed. The ability to identify illegal
copiers would help deter users from making and
distributing illegal copies. Moreover, the victimized
copyright holders could make compensation claims against
illegal copiers. Identifying illegal copiers would thus help
reduce the damage. How to identify the illegal copiers is an
important issue for content distribution services, and vari-
ous approaches to this problem have been investigated,
including fingerprinting [2], broadcast encryption [3], and
traitor tracing [4] based on coding theory.
A well-known solution to the problem is to embed the
recipient’s ID information into the content in the form of a
digital watermark, that is, use watermarks as fingerprints
[5, 22]. This solution requires creating video watermarks
that are imperceptible to the human eye and robust against
image processing. Many algorithms have been developed
for general-purpose video watermarking in both the pixel
and frequency domains [6–10].
T. Yamada (✉)
Yokohama Research Laboratory, Hitachi, Ltd.,
Yokohama, Japan
e-mail: [email protected]
M. Maeta · F. Mizushima
Hitachi Government & Public Corporation System Engineering,
Co., Ltd., Tokyo, Japan
e-mail: [email protected]
F. Mizushima
e-mail: [email protected]
J Real-Time Image Proc
DOI 10.1007/s11554-013-0335-4
The input and output (I/O) operations and the complex
analysis of pixel values consume much computing power,
meaning that those processes would impose a heavy load
on a thin client such as a user PC. We thus focus on the
server-application side, although embedding of watermarks
specific to the recipient can theoretically be done on either
the client or server. If the server application incorporates a
watermarking method, the method should work with the
encoder. There are three classes of watermark embedding
from the viewpoint of system architecture [13, 23].
• Class I: Watermark embedding after video encoding.
• Class II: Watermark embedding incorporated within
encoding.
• Class III: Watermark embedding before video
encoding.
Class I methods use an architecture in which watermarks
are embedded in the compressed domain by altering the bit
stream. Early algorithms based on the modification of
discrete cosine transform (DCT) coefficients directly in the
MPEG bit stream can embed watermarks quickly [9, 10].
More recently developed ones provide more robustness
against severe image processing [24, 25]. These methods
depend on the file format or video codec.
Class II methods use an architecture in which the
watermark embedding operation is efficiently implemented
in the transform process used by the encoder. They control
the compression process in the encoder so that the water-
marks can survive the process. For instance, robust water-
marks can be selectively embedded in intra-frames [23].
These methods depend on the encoder implementation.
Class III methods use an architecture in which the
watermarked video output is essentially the encoder input,
meaning that the watermarking algorithm outputs uncom-
pressed video [11, 12]. Because the method is independent
of the encoder, both the watermark embedding operation
and the encoder can be switched easily. This gives a VoD
service provider more flexibility in maintaining its server.
We thus adopted the architecture used for class III
methods.
Real-time processing is an essential requirement for a
VoD service incorporating watermarking. Video water-
marking systems applicable to the architecture used for
class III methods have been developed for real-time pro-
cessing. They use dedicated hardware [11] or a dedicated
parallel-computing platform [12]. Some watermarking
systems for high definition TV (HDTV) use a specific set of
hardware, such as an LSI [15], a field programmable gate
array (FPGA) [16], a media processor, and a graphics
processing unit (GPU) [12]. They embed watermarks in
HDTV content in real time and are applicable to single-
stream watermark embedding. Such real-time watermark
embedding is well-suited to applications in which video
content must be distributed with high efficiency, such as
broadcasting. However, if such systems are used in server
applications, the dedicated hardware increases equipment
cost and may cause compatibility problems. For instance, if
the recipient’s ID is to be embedded in each distributed
video file, a server exclusively for watermark embedding
must be implemented for each concurrent video stream.
These systems were designed for use on signal processing
equipment, and uncompressed video frames are used for
input and output signals. If a watermark-embedding system
based on either approach was implemented on a server, it
would use most of the computing resources as well as
monopolize the video I/O processes. They are therefore not
suited for server application. A service provider must be
able to support concurrent accesses for video streaming,
and implementing many servers dedicated to watermark
embedding is expensive from the viewpoints of investment
and operation. Unfortunately, there have been few pub-
lished reports of lightweight software implementations
with video I/O, as far as we know.
Another way to deal with this problem is to use a
practical software implementation based on a general-
purpose central processing unit (CPU) with video I/O [13,
14]. Such an implementation is independent of the dedi-
cated parallel-computing platform and is well-suited to
server applications.
We have improved a previously reported server-based
watermark-embedding system [13] by combining a CPU-
based method for real-time watermarking [14] with a real-
time transcoding method [17]. This enables a content dis-
tribution service to embed watermarks specific to the
recipient. Our previous work [20] on this issue has been
fully revised to evaluate this system. We describe the
problems with conventional methods in Sect. 2; how the
previous system has been improved in Sect. 3, and present
the results of prototype evaluation of the improved system
in Sect. 4.
2 Conventional methods for video watermarking
2.1 Target application
Consider a VoD system that distributes video content to
users. The content is copyrighted, so it needs to be pro-
tected. When an authenticated user requests a particular
video, the video server locates the appropriate file and
streams it to the customer. While illegal digital copying can
be prevented by applying proven content protection tech-
niques such as encryption and authentication, client ter-
minals such as PCs may be insecure. Because a common
PC is generally not tamper-resistant, an illegal copy of a
video file potentially can be made by hardware access.
To identify the illegal copier, a digital watermarking
technique can be used for embedding the recipient’s ID
into the video content, as illustrated in Fig. 1. If an illegally
copied video is found by chance, the embedded ID can be
used to identify the person who illegally copied it. Quickly
identifying the copier is therefore important for ensuring
reliability of the video service.
However, embedding watermarks into video content
without degrading quality is difficult because embedding
the recipient’s ID into the video file in real time would
consume much of the computing power of the server sys-
tem. Moreover, video watermarks should survive the video
encoding process before the decoded video is viewed on
the recipient’s terminal. Maintaining the image quality of
images with embedded watermarks and providing water-
mark robustness against image processing are essential
requirements for video watermarking.
2.2 Conventional systems for watermark embedding
A central processing unit (CPU)-based software imple-
mentation with video I/O has been developed for embed-
ding watermarks in uncompressed standard-definition TV
(SDTV) video in real time [14]. However, from the
viewpoint of system architecture, it is based on a stand-
alone model for embedding watermarks, so handling the
uncompressed video I/O consumes much of the computing
resources.
The system illustrated in Fig. 2 was previously devel-
oped for embedding video watermarks based on the client–
server model. It is a CPU-based software implementation
suitable for content distribution [13]. It enables real-time
processing including watermark embedding, MPEG
encoding, and hard disk drive recording. It is extensible
and can deliver watermarked video files to clients via
multicast streaming. However, it supports only the QVGA
(320 × 240-pixel) format, which is converted from the
VGA (640 × 480-pixel) format of the incoming video
signal. Moreover, the embedded information is invariable,
which conflicts with our target application—distributing
videos with the recipient’s ID embedded.
2.3 Video players and VoD systems
2.3.1 Conventional video player
To view a packaged video using a video player, such as a
digital versatile disc (DVD) player, the viewer inserts the
media into the player and presses the play button. The
player with an attached monitor and speakers then iterates
four steps:
(A) read encoded video data,
(B) decode the data,
(C) transmit the decoded data to the monitor and
speakers, and
(D) display/broadcast the decoded images and sound on
the monitor and speakers.
The architecture of a conventional video player is shown
in Fig. 3. It is, of course, difficult to deliver to each reci-
pient a packaged video with the recipient’s ID as an
identifiable watermark, which conflicts with our target
application.
The improved system uses network communication for
delivering video content between steps C and D. Individ-
ually watermarked video is generated by embedding digital
watermarks in real time before delivering it to the reci-
pient’s terminal.
2.3.2 Conventional VoD system
A conventional VoD system stores encoded videos. When
a play command is received from a customer, the system
simply streams the requested video to the customer’s ter-
minal, as shown in Fig. 4.
Class III methods use an architecture in which the
watermarked video output is the encoder input, as
Fig. 1 Target application of digital watermarking
Fig. 2 Previously developed system for real-time watermark embedding [13]
described in Sect. 1. A conventional VoD system incor-
porating robust watermarking methods is not well suited
for distributing a stream of video content containing
embedded watermarks identifying the recipient, etc.
because a service provider must prepare the video files
before distribution. That is, the embedded information,
such as name of copyright holder, is invariable.
The improved system uses the fast transcoding before
video content delivery for controlling quality of service
(QoS). A video stream with identifiable watermarks can be
generated in real time by using robust digital watermarking
and delivered to the customer’s terminal.
3 Real-time video watermarking and encoding VoD
system
3.1 Description
Our improved video watermarking system for real-time-
encoding VoD service creates a video stream with identi-
fiable watermarks generated in real time by using robust
digital watermarking. A prototype of the improved system
created watermarked video streams with DVD resolution.
The quality of service (QoS) is maintained by using fast
transcoding before video content delivery. By combining a
CPU-based method for real-time watermarking [14] and a
QoS-aware method for real-time encoding [17] in the video
server, we have created an improved system that performs
fingerprinting by using video watermarking. The use of fast
transcoding enables it to generate a watermarked video
stream as a new VoD service.
Typical specifications for the service’s video streaming
are summarized in Table 1. The specifications for the
prototype development environment are summarized in
Table 2.
3.2 System architecture
The architecture of the improved system is shown in Fig. 5.
It has four iterative steps A′–D′, similar to those of
a physical video player. It uses Internet delivery for step
C′-3. That is, when a play command is received (the ini-
tialization process is executed in advance), the system
synchronously iterates steps A′–D′ in real time by syn-
chronizing the server and client processes.
Fig. 3 Architecture of conventional video player: (Step A) read encoded video data; (Step B) decode the data; (Step C) transmit the decoded data; (Step D) display the decoded images on a monitor
Fig. 4 Architecture of conventional VoD system
Table 1 Typical service specifications for video streaming with improved system
Codec: MPEG2
Size: 848 × 480 (DVD resolution)
Sound: two-channel
Download speed: constant bit rate (CBR), 5 Mbps (megabits per second)
Average frame rate: 30 fps (frames per second)
Average latency: no more than 33 ms
Number of concurrent accesses: 20
Note that these values are adjustable
Table 2 Prototype development environment (blade server used)
CPU type: Xeon(a), 2.26 GHz (quad core)
Number of CPUs: two
Main memory: 24 GB
Network: 1-Gbit Ethernet
Parallel processing framework: a proven multithreading technique is used with DirectX(b) for audio and video transcoding, which are done in parallel. The video watermarking is a single process done with a video encoder.
(a) Xeon is a registered trademark of Intel Corporation in the US and other countries
(b) DirectX is a registered trademark of Microsoft Corporation in the US and other countries
Fig. 5 Architecture of improved system. Recipient's ID can be embedded in real time
[Server process]
(A′) read encoded video data.
(B′) decode the data.
(C′-1) embed watermarks into the decoded data if the watermark option is selected.
(C′-2) re-encode the watermarked data into a video stream.
(C′-3) deliver the video stream to the network (using proven encryption and authentication methods).
[Client process]
(C′-4) receive the video stream at the terminal from which the command was issued.
(C′-5) decode the re-encoded video in the stream using player software on the user terminal.
(D′) display the video on the terminal's screen.
If a video frame is not ready to be output in step D′, the
previous frame is reused instead. That is, a frame is
dropped. To prevent frame dropping, the frame data should
be synchronously decoded, watermarked, encoded, and
streamed in real time. Moreover, they should be transmit-
ted via the Internet and re-decoded. The customer can then
play the video content as if a video player was actually in
the customer’s terminal. This QoS-aware service is
accomplished by changing the encoder setting in real time
based on network conditions.
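The reuse-previous-frame fallback described above can be sketched as a paced server loop. Note that the stage callables, their names, and the exact deadline rule are assumptions of this sketch, not the paper's implementation:

```python
import time

def stream_loop(decode_next, embed, encode, send, fps: int = 30):
    """Paced loop for steps A'-C'-3 (a sketch under stated assumptions).

    Each frame must finish within 1/fps seconds; on a deadline miss the
    previously encoded frame is sent again, i.e., the new frame is dropped.
    """
    interval = 1.0 / fps
    last_encoded = None
    deadline = time.monotonic() + interval
    while True:
        frame = decode_next()                 # steps A', B'
        if frame is None:                     # end of content
            break
        encoded = encode(embed(frame))        # steps C'-1, C'-2
        if time.monotonic() <= deadline or last_encoded is None:
            last_encoded = encoded            # frame made the deadline
        send(last_encoded)                    # step C'-3
        deadline += interval
```

In a real server the encoder settings would additionally be adjusted inside this loop based on network conditions, as the text describes.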
The service image for end users is shown in Fig. 6. Users
can select videos from among genres such as movie, TV
drama, and anime. Users can select from a menu by pushing
a button (1–10) on a TV remote control with Internet sup-
port. This figure demonstrates that our improved video
watermarking system for real-time-encoding VoD service
can be implemented as a practical application.
3.3 Video watermarking
Our previously developed system, which uses a method
for speeding up the video watermarking process,
demonstrated high performance for embedding water-
marks in VGA-size streams [14]. It was originally
implemented in a stand-alone server, which consumed
much computing power for inputting and outputting
uncompressed video signals. To make the speed-up
method applicable to a server process for real-time
watermark embedding, we
– divided the watermark-embedding process into an
initialization pre-process common to every video frame
and watermark-embedding post-processes specific to
each frame,
– redesigned the data flow so that the post-processes
could reuse the watermark pattern output by the pre-
process, and
– redesigned the frame flow so that the video frames are
input from a decoder program and output to an encoder
program in server memory.
To simplify the description, we describe the process
flow of the watermark embedding in a basic schema as
follows.
3.3.1 [Initialization]
The initialization process generates a watermark pattern
represented as a pseudo-random array of N elements, each
taking the value +1 or −1; that is, m = {m_i = ±1 | 1 ≤ i ≤ N}.
The generated watermark pattern is then stored in
server memory. It is generally possible to define a
watermark pattern so that the probability of false-positive
errors can be mathematically calculated. Embedding
watermarks by using such patterns, i.e., by changing pixel
values in accordance with them, keeps the probability of
false positives low.
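As an illustration, the initialization step can be sketched in Python with NumPy; the seed standing in for a secret key shared by embedder and detector is our assumption, not something the paper specifies:

```python
import numpy as np

def init_watermark_pattern(num_pixels: int, seed: int) -> np.ndarray:
    """Generate a pseudo-random pattern m = {m_i = ±1 | 1 <= i <= N}.

    The seed plays the role of a secret key (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    # Each element is independently +1 or -1 with probability 1/2.
    return rng.choice(np.array([-1, 1], dtype=np.int8), size=num_pixels)

# Pattern for one DVD-resolution luminance frame, stored for reuse.
pattern = init_watermark_pattern(848 * 480, seed=2013)
```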
3.3.2 [Watermark embedding]
In iterative steps A′–D′, as described in Sect. 3.2, the video
frames are continuously decoded into an uncompressed
format (such as YUV) one after another in server memory.
The frame data are generated in real time and kept in a
watermarking buffer. The process flow of the watermark
embedding comprises four steps, all within step C′-1.
The following steps are done over f = 1, 2, …, where f is
the frame number of the original frame.
Step 1 (input): Read the original pixel value set of the
f-th frame consisting of N pixels, y^(f) = {y_i^(f) | 1 ≤ i ≤ N},
as an input parameter. The watermarked pixel value set
y′^(f) is an output parameter.
Fig. 6 Service image for end users using the metaphor of a video rental shop
Step 2 (watermark strength calculation): Calculate
the set of watermark strengths s^(f) = {s_i^(f) > 0 | 1 ≤ i ≤ N}
from original frame y^(f). Each strength s_i^(f) represents
watermark imperceptibility at pixel i of the f-th frame. A
method for maintaining image quality is used to calculate
the watermark strength. For instance, s_i^(f) can be the
difference between the value of the i-th pixel y_i^(f) and the
average value of the surrounding pixels. Setting the
watermark strength in this way maintains the image quality
by avoiding the embedding of a strong watermark in a plain
region (where the spatial frequency of the pixel values is
low) in each frame.
Step 3 (watermarked frame generation): Generate
watermarked frame y′^(f) = {y′_i^(f) | 1 ≤ i ≤ N} by adding the
watermark pattern m_i multiplied by the watermark strength:

y′_i^(f) = y_i^(f) + s_i^(f) m_i   (1)

Watermarked frame y′^(f) is then stored in server
memory. Note that this process reads the watermark
pattern m_i calculated in the initialization process; i.e., the
data is reused.
Step 4 (output): Send watermarked frame y′^(f) to the
next process (the encoder) in real time.
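Steps 1–3 above can be sketched as follows; the 3 × 3 local-mean strength rule (one possible instance of the paper's "difference from the surrounding average") and the clipping to the 8-bit range are assumptions of this sketch:

```python
import numpy as np

def local_mean(frame: np.ndarray, radius: int = 1) -> np.ndarray:
    """Mean over the (2r+1) x (2r+1) neighbourhood, with edge padding."""
    h, w = frame.shape
    padded = np.pad(frame.astype(np.float64), radius, mode="edge")
    k = 2 * radius + 1
    acc = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / (k * k)

def embed_watermark(frame: np.ndarray, pattern: np.ndarray) -> np.ndarray:
    """Eq. (1): y'_i = y_i + s_i * m_i, with s_i > 0 from local contrast."""
    # Strength: |pixel - neighbourhood mean|, plus a small floor to keep
    # s_i > 0; plain (low spatial frequency) regions thus receive only a
    # weak, imperceptible watermark.
    s = np.abs(frame.astype(np.float64) - local_mean(frame)) + 0.5
    y = frame + s * pattern.reshape(frame.shape)
    return np.clip(y, 0, 255)  # keep valid 8-bit luminance values
```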
The process flow of the watermark detection corre-
sponding to the basic schema comprises four steps.
3.3.3 [Watermark detection]
The same watermarks are embedded in consecutive frames
although the frame contents may change (e.g., the frames
show a person moving). Therefore, if these frames are
accumulated to form an average image, their watermark
signals will accumulate while their content will almost
disappear. We thus accumulate n consecutive frames to
generate an average image from which the watermark is
detected. The pixel value set of the f′-th watermarked
frame consisting of N pixels is y′^(f′) = {y′_i^(f′) | 1 ≤ i ≤ N}
(f′ = 1, …, n). The input parameters for the watermark
detection process are the watermarked frames y′^(f′) and the
number of accumulated frames n. The number of pixels N
and pseudo-random array m have the same values as used
in the watermark-embedding process.
Step 1: Do the following steps over the n watermarked
frames y′^(f′) (f′ = 1, …, n).
Step 2: Accumulate the n frames in ỹ = {ỹ_i | 1 ≤ i ≤ N},
where ỹ_i = (1/n) Σ_{f′=1}^{n} y′_i^(f′).
Step 3: Calculate correlation value c by correlating
pseudo-random array m with accumulated frame ỹ.
That is,

c = (1/N) Σ_i m_i ỹ_i
  = (1/N) Σ_i m_i [(1/n) Σ_{f′=1}^{n} y′_i^(f′)]
  = (1/N) Σ_i m_i [(1/n) Σ_{f′=1}^{n} (y_i^(f′) + s_i^(f′) m_i)]
  = (1/(nN)) [Σ_{i,f′} m_i y_i^(f′) + Σ_{i,f′} s_i^(f′)]   (2)

using m_i² = 1. Note that the first term, Σ_{i,f′} m_i y_i^(f′),
should be near zero due to the randomness of m. Because
watermark strength s_i^(f′) is positive, the positive second
term, Σ_{i,f′} s_i^(f′), determines the correlation value.
Step 4: Determine the existence of a watermark by
comparing correlation value c with threshold value T (> 0):

result = detected if c ≥ T; not detected otherwise.   (3)
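The detection flow reduces to an accumulate-and-correlate routine; a minimal sketch (the threshold value T is application-specific and supplied by the caller):

```python
import numpy as np

def detect_watermark(frames, pattern, threshold: float):
    """Eqs. (2)-(3): average n frames, correlate with m, compare with T."""
    # Step 2: accumulate the n frames into the average image ~y.
    acc = np.mean([f.reshape(-1).astype(np.float64) for f in frames], axis=0)
    # Step 3: c = (1/N) * sum_i m_i * ~y_i.
    c = float(np.mean(pattern * acc))
    # Step 4: threshold test.
    return ("detected" if c >= threshold else "not detected"), c
```

Because the frame content decorrelates from m while the s_i m_i terms add coherently, c concentrates near the average embedded strength when a watermark is present and near zero otherwise.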
The basic schema can be extended to a 1-bit-watermark
schema by defining another watermark pattern m̃, indicating
information bit 1, in addition to the definition
of m, indicating information bit 0. In the multiple-bit
schema, each frame is divided into regions, and the 1-bit
process is applied to each region on the basis of the bit
value of the embedded information b. A watermark pattern
in the multiple-bit schema is the set comprising a pseudo-
random array of N elements, corresponding to the binary
values of the encoded embedded information. The
watermark pattern is repeatedly generated in a frame
image, increasing watermark robustness. The
embedded information b indicating the recipient ID is an input
parameter for the initialization process. The watermark
pattern is the output parameter for the initialization
process. It is stored and reused in the watermark-
embedding process.
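One way to realize the multiple-bit schema is to split the frame's pixel vector into equal regions and embed m or m̃ in each region according to the corresponding bit of b; the contiguous region layout and the maximum-correlation decoding rule below are assumptions of this sketch, not the paper's exact construction:

```python
import numpy as np

def _region_patterns(region_size: int, seed: int):
    rng = np.random.default_rng(seed)
    m0 = rng.choice([-1, 1], size=region_size)  # pattern m for bit 0
    m1 = rng.choice([-1, 1], size=region_size)  # pattern m~ for bit 1
    return m0, m1

def multibit_pattern(bits, region_size: int, seed: int) -> np.ndarray:
    """Concatenate, per region, the pattern for that region's bit."""
    m0, m1 = _region_patterns(region_size, seed)
    return np.concatenate([m1 if b else m0 for b in bits])

def decode_bits(accumulated: np.ndarray, n_bits: int, seed: int):
    """Decide each bit by which pattern correlates better with its region."""
    region_size = accumulated.size // n_bits
    m0, m1 = _region_patterns(region_size, seed)
    regions = accumulated.reshape(n_bits, region_size)
    return [int(np.mean(m1 * r) > np.mean(m0 * r)) for r in regions]
```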
The watermarked video frames are encoded into the
video stream, which is then streamed to the recipient’s
terminal. Use of the speed-up method enables watermarks
to be embedded quickly so that iterative steps A′–D′ are
stably processed in real time.
4 System evaluation
4.1 Comparison with previous implementation
Our previous system for embedding watermarks into video
data in real time was implemented on a general-purpose
CPU [13]. In that implementation, the system comprised a
series of processes: capturing input video, embedding
watermarks into the video images, encoding the water-
marked images, and recording the video data on the hard
disk in the PC. It functioned as a video recorder based on a
stand-alone model. While watermarking and encoding can
be combined as a server application, if a decoder and
streamer are used, as shown in Fig. 2, the output video has
low resolution, e.g., QVGA resolution at 1 Mbps. This low
resolution means that its application is limited to video
streaming for mobile phones, narrowband PCs, and so on.
Moreover, it is inadequate for distributing individually
watermarked videos because the embedded information is
invariable. Although the previous implementation can
theoretically embed watermarks specific to each recipient
by operating the same number of watermark embedding
servers as the number of concurrent accesses, the same
number of additional hardware devices for decoding and
streaming must also be used, meaning that the implemen-
tation is prohibitively expensive.
Our improved system incorporates a video watermark-
ing method that can handle larger images, up to DVD size
(848 × 480). It also incorporates processes for decoding
input video files and streaming output video streams. It can
simultaneously distribute individually watermarked video
streams to up to 20 customers. Table 3 compares the two
implementations.
4.2 Measured performance time
4.2.1 [Watermark embedding]
The sequential server processes of decoding, watermark
embedding, encoding, and delivering video content should
be processed in real time. The prototype server has five
iterative steps, as described in Sect. 3.
(A′) read encoded video data.
(B′) decode the data.
(C′-1) embed watermarks into the decoded data.
(C′-2) re-encode the watermarked data into a video stream.
(C′-3) deliver the video stream to the network.
The processing time for each step was measured while
the server was streaming video. The processing times for
steps A′ and B′ were measured together. Since the pro-
cessing time for each corresponding function was simply
measured by comparing the value returned by the clock to
the value returned by the initial call to the clock, each
processing time includes the synchronization time due to
parallel processing. The measured time was thus greater
than the elapsed real time. For instance, although audio
data is processed in parallel, the synchronization time
related to audio, such as de-multiplexing and multiplexing,
is included in the corresponding measured performance
time.
The processing time for video encoding varied greatly,
from 8 to 20 ms per frame, due to differences in frame type
and content (such as the motion property). In real-time
processing, each frame should be completely processed
within 33 ms on average because the video data is streamed
at 30 fps. As shown by the results in Table 4, the total
average time for the iterative steps ranged from 16 to 33 ms
per frame. That is, real-time processing was achieved.
Table 3 Comparison of previous and current implementations
Network model: previous [13]: client–server model; current: client–server model
Service: previous: originally recorded video that can be extended to video distribution; current: video on demand
Video-in of PC server: previous: uncompressed video signal output by decoder; current: previously encoded files (DVD data)
Video-out of PC server: previous: no better than QVGA (320 × 240), MPEG4 at 1 Mbps; current: DVD size (848 × 480), MPEG2 at 5 Mbps (adjustable)
Watermarked video streams: previous: not considered (additional hardware for streaming is required); current: considered
Concurrent access to service: previous: multicast transmission; current: 20 unicast transmissions (adjustable)
Embedded information: previous: invariable (such as name of copyright holder); current: variable (such as recipient's ID)
Table 4 Performance times of server processes
(A′) read encoded video data and (B′) decode the data: 4–6 ms/frame
(C′-1) embed watermarks into the decoded data: 2–3 ms/frame
(C′-2) re-encode the watermarked data into a video stream: 8–20 ms/frame
(C′-3) deliver the video stream to the network: 2–4 ms/frame
Total processing time: 16–33 ms/frame
If a process for embedding watermarks in a frame image
puts a heavy load on the server, the QoS-aware encoder
might drop a frame to maintain service. Although such
adjustments can be found in the system log, it is difficult to
estimate the effect on image quality at the client PC. If a
frame was dropped in the client PC due to real-time pro-
cessing problems in the server, the image quality of a
watermarked video might be degraded much more than that
of the original video. We therefore experimentally evalu-
ated the subjective image quality. As discussed in Sect. 4.4,
there was no significant image degradation.
4.2.2 [Watermark detection]
A watermarked encoded video file must be decoded into an
uncompressed format in advance to execute our prototype
application for watermark detection. After this decoding,
transferring the uncompressed image from disk into
memory is a lengthy process. Moreover, it is difficult to
accurately measure the transfer time independent of the
effects of disk buffering. Therefore, we measured only
the core time for watermark detection, which excludes the
transfer time.
The frame accumulation is the process of adding a frame
image to the accumulated image, as represented by Eq. (2).
This accumulation took 2 ms per frame on a desktop PC
(CPU: single core, 2.8 GHz). It took 66 ms to detect
watermarks in the accumulated image.
4.3 Measured performance
The image of the CPU usage monitor in Fig. 7a shows that
CPU usage remained stable at about 5 % for one access by
a user. Although the computational complexity of the
encoding process depends on the content, the peak load can
be controlled by using proven encoder techniques.
The image of the frame rate monitor in Fig. 7b shows
that the average frame rate remained stable at 30 fps. The
frame rate is the number of processed frames divided by
elapsed time. The periodic dips in the rate appeared only on
the displayed graph, meaning that display errors occurred.
They are attributed to the inadequacy of the measurement
method. The actual frame rate for the video stream
remained stable at 30 fps.
The image of the round-trip time (rtt) performance
monitor in Fig. 7c shows that rtt remained approximately
zero. The client rtt is the time it takes for a client to send a
request and the server to send a response over the Internet,
not including the time required for data transfer. Although
the server rtt varied once, the delay was no more than
33 ms. The variance is attributed to a frame drop caused by
a transient disruption in the Internet connection, not to
watermarking. In short, system operation was stably controlled.
The image of the frame image monitor in Fig. 7d shows
the current frame image. Although the video used was a
normal one, the image was intentionally obfuscated for
copyright reasons.
The CPU load with the prototype was not so high.
Moreover, quality of service was maintained from the
viewpoint of frame rate and round-trip time. It is thus
feasible for a VoD service to stream individually water-
marked videos concurrently.
4.4 Image quality in network environment
The algorithm used was previously shown to provide sufficient image quality for VGA-size MPEG2 content at 8 Mbps [14]. It should also provide sufficient image quality for DVD-size MPEG2 content at 5 Mbps, because the current compression ratio (5 M/(848 × 480 × 3 × 8 × 30) = 1/59) is less than the previous one (8 M/(640 × 480 × 3 × 8 × 30) = 1/28), for 3 colors, 8 bits for each color value, and 30 fps. The lower the compression ratio of a watermarked video, the more imperceptible the noise caused by watermarking.

Fig. 7 Screen-captured images of performance monitors

J Real-Time Image Proc
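The two compression ratios above can be checked directly; the raw bit rate is width × height × 3 colors × 8 bits × 30 fps:

```python
def compression_ratio(bitrate_bps, width, height, colors=3, bits=8, fps=30):
    """Encoded bit rate divided by the raw (uncompressed) bit rate."""
    raw_bps = width * height * colors * bits * fps
    return bitrate_bps / raw_bps

current = compression_ratio(5_000_000, 848, 480)   # DVD-size content at 5 Mbps
previous = compression_ratio(8_000_000, 640, 480)  # VGA-size content at 8 Mbps
print(round(1 / current))   # -> 59
print(round(1 / previous))  # -> 28
```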
A 15-s sample video (showing people walking around in
a room) was used to test image quality. The sample video was originally in encoded format; that is, the decoded images served as the original images for watermarking.
Watermarks were embedded in the original images when
the watermark option was selected. The original and
watermarked images were encoded into video streams at 5
Mbps that were then distributed through the Internet in
accordance with the specifications listed in Table 1. The
client PC was connected to the Internet using asymmetric
digital subscriber line (ADSL) technology. The commu-
nication speed was up to 10 Mbps for downloading on a
best effort basis. The streamed videos were evaluated at the
client PC.
The image qualities of the original and watermarked
videos in a network environment were evaluated subjec-
tively using a procedure based on Recommendation ITU-R
BT.500 [18]. The use of a network environment meant that
the image quality was comparable to that for an actual
implementation. The images were displayed on a monitor
in sequence and evaluated by ten participants who rated
watermark disturbance on a scale of 1–5, as shown in
Table 5. The average score for the watermarked video was
4.5.
The server maintains service quality by controlling the encoding parameters in real time. If communication conditions are temporarily degraded, the encoder may generate block noise, which would be evident in both the original and watermarked video content at the client PC. We did not observe such noise during the experiment because the Internet communication conditions were fortunately stable.
Subjectively evaluating the image quality of a video stream under the many possible combinations of network parameters, such as the bit error ratio, is difficult. Moreover, such tests should also cover image quality for each encoder used, and such testing is beyond the scope of this paper. Our subjective
experimental results simply demonstrate the feasibility of
real-time processing without frame dropping, as described
in Sect. 4.2.
4.5 Image quality in local environment
Because it is difficult to reproduce experiments in a net-
work environment due to unstable communication through
the Internet, we manually embedded watermarks into video
files under the conditions shown in Table 1.
Three samples, as shown in Fig. 8, were used for image
quality evaluation in a local environment. They were
selected from the standard video set [19] and are the same
ones used previously [14]. The samples have various
motion properties. We made 848 × 480 images as original images for watermarking by resampling HDTV images.
EntranceHall shows people walking around in a room as
captured with a fixed camera. WalkThroughTheSquare is a
dolly shot of people walking around, who are seen beyond
flowers near the camera. This sample is similar to the one
used in the network environment evaluation. The people in
the EntranceHall sample move more slowly than those in
the WalkThroughTheSquare sample. The objects captured
by a panning camera in the WhaleShow sample move
faster than those in the WalkThroughTheSquare sample.
That is, the three samples cover a variety of motion properties, including slow-, medium-, and fast-moving objects.
The same watermark algorithm as in the prototype system was used, although the codec implementation differed from that in the prototype system. The watermarked videos were
encoded into files (MPEG2, 5 Mbps). The rated subjective
values for the encoded watermarked videos were respec-
tively (a) 4.7, (b) 4.7, and (c) 4.5, indicating that the image
quality of the watermarked videos was practical.
The peak signal-to-noise ratios (PSNRs) calculated for the
three sample videos (EntranceHall, WalkThroughThe-
Square, WhaleShow) were respectively 42.2, 42.1, and
42.0 dB when uncompressed watermarked images and
uncompressed original images were compared (Table 6).
The MPEG2 encoder created noise in the encoded original
videos, resulting in PSNRs of 36.3–42.1, as shown in the
second row. The values for the encoded watermarked
videos were at almost the same level as those for the
encoded original videos, as shown in the third row. The
noise caused by the watermarking was thus somewhat
veiled by the encoder noise. In other words, the watermark
strength was properly set so that the watermarks were
almost imperceptible to the human eye. The same water-
mark strength was also used in the watermark robustness
test.
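The PSNR values above follow the standard definition for 8-bit images; a minimal sketch (function and variable names are ours, not from the paper):

```python
import numpy as np

def psnr(original, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two 8-bit images."""
    mse = np.mean((original.astype(np.float64)
                   - processed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# toy check: a uniform +2 offset on an 848x480 8-bit image
orig = np.full((480, 848), 128, dtype=np.uint8)
marked = orig + 2
print(round(psnr(orig, marked), 1))  # -> 42.1
```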
Table 5 Level of disturbance and rating scale

  Disturbance                     Score
  Imperceptible                   5
  Perceptible but not annoying    4
  Slightly annoying               3
  Annoying                        2
  Very annoying                   1

4.6 Watermark robustness against encoding

As our watermark-embedding method is implemented as CPU-based software, the improved system has extensibility, meaning that the output of the watermark embedding
function can be connected to another video encoder with a
lower compression rate. We therefore evaluated watermark
robustness against representative video codecs: MPEG2
and H.264 [MPEG4/AVC (Advanced Video Codec)]. The
bit rate was set to 1.0, 2.0, or 3.6 Mbps, and the frame rate
was set to 25 or 30 fps. These settings are commonly used
in commercial video distribution. The robustness test
conditions are summarized in Table 7. Twelve practical
sample videos such as computer graphics and landscape
ones were used to test watermark robustness. Each was
15 s long. The videos were decoded, watermark embedded,
and encoded in the same manner as for the image quality
test.
Detection was successful for all 12 samples, indicating that it is feasible to deliver watermarked (user-identifiable) video through a VoD service.
Since the prototype system used the improved speed-up method, its watermark robustness was essentially equal to that of the base method [14]. Furthermore, to demonstrate that the algorithm used is practical, we tested watermark robustness against re-encoding, a common attack involving severe image processing that a watermark has difficulty surviving. The watermarked samples (Fig. 8) (MPEG2, 5 Mbps) were re-encoded into smaller ones (H.264/AVC, 1 Mbps) using an encoder different from that in the prototype system. The watermarks in all the re-encoded samples were detected successfully.
4.7 Watermark detection ratio dependence on frame
accumulation
Watermark robustness was evaluated using the videos used
in the image quality evaluation. We considered watermark
detection to be successful when 64 bits of information
embedded in a video sample were correctly detected
without bit errors. We used the watermark detection ratio,
which is the ratio of the number of points where the
embedded 64 bits were correctly detected to the total
number of detection points. There were 450 frames in total
for each video. For instance, if the watermarks of 30
sequential frames were detected at a time (n = 30), there
were 15 detection points in each video.
One of the watermarked video stream samples was
encoded using H.264/AVC at 700 kbps. The number of
accumulated frames is n, as described in Sect. 3.3. The
detection ratios for n = 1, 5, 10, and 30 from 450 frames of
watermarked images are shown in Fig. 9. When n was 30, the detection ratio for the H.264/AVC video was 100 %. That is, watermarks could be detected from 1 s of video (30 frames) in this experiment.
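The detection-ratio bookkeeping described above can be sketched as follows; the detector interface is hypothetical, and only the counting logic follows the paper:

```python
def detection_ratio(frames, n, detect):
    """Percentage of detection points where the payload was recovered.

    frames : list of decoded frame images (450 per video here)
    n      : number of frames accumulated per detection point
    detect : callable taking n frames, returning True if the 64-bit
             payload was recovered without bit errors (assumed interface)
    """
    points = len(frames) // n  # e.g. 450 // 30 = 15 detection points
    hits = sum(1 for i in range(points)
               if detect(frames[i * n:(i + 1) * n]))
    return 100.0 * hits / points

# with a stub detector that always succeeds, the ratio is 100 %
print(detection_ratio(list(range(450)), 30, lambda chunk: True))  # -> 100.0
```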
Fig. 8 Scenes from sample videos: a EntranceHall, b WalkThroughTheSquare, c WhaleShow
Table 6 Quantitative image quality (PSNR)

  Image processing               EntranceHall (dB)  WalkThroughTheSquare (dB)  WhaleShow (dB)
  Not-encoded watermarked video  42.2               42.1                       42.0
  Encoded original video         42.1               37.9                       36.3
  Encoded watermarked video      42.1               36.6                       36.5
Table 7 Robustness test conditions

  No.  Size       Codec      Bit rate (Mbps)  Frame rate (fps)
  1    848 × 480  MPEG2      1.0              25
  2    848 × 480  MPEG2      2.0              25
  3    848 × 480  H.264/AVC  1.0              30
  4    848 × 480  H.264/AVC  3.6              30
Fig. 9 Watermark detection ratios for n = 1, 5, 10, and 30 from 450 frames of watermarked images
4.8 Watermark robustness against channel noise
The QoS-aware encoder can dynamically change its
behavior in accordance with the measured communication
performance. The system can avoid continuous packet loss because highly compressing the video content reduces the number of communication packets. Therefore, channel noise such as block noise rarely occurs.
Moreover, because Internet communication was stable
during our experiment, channel noise was not observed at
the client PC during the image quality test in a network
environment, as described in Sect. 4.4.
To evaluate watermark robustness against channel
noises, we embedded watermarks in the three sample
videos (Fig. 8). We encoded the sample videos under the
conditions shown in Table 1. We then decoded each video and selected one intra-frame image (848 × 480). We added 8 × 8-block noise to the image. The frame image thus contained 6,360 blocks. Randomly selected block locations in the frame image were painted in a monotone pattern. The simulated channel noise ratio was calculated as the number of painted blocks divided by the total number of blocks. The watermark bit detection ratio was calculated as the number of successfully detected bits of information divided by the total number of embedded bits of information.
Watermark detection was performed on the watermarked decoded frame image, treated as a still picture with simulated channel noise. Since the same watermarks were embedded repeatedly, detection was successful without bit errors when the channel noise ratio was no more than 10 %, as shown in Fig. 10. That is, the watermarks were robust against up to 10 % channel noise in this simulation.
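The block-painting simulation can be reproduced in outline as follows. The monotone paint value and the random-number seed are our assumptions; the block geometry and the noise-ratio definition follow the text:

```python
import numpy as np

def add_block_noise(img, noise_ratio, block=8, value=128, seed=0):
    """Paint randomly chosen block x block regions with a monotone value.

    For an 848 x 480 frame with 8 x 8 blocks there are 106 x 60 = 6,360
    blocks; noise_ratio is painted blocks / total blocks.
    """
    h, w = img.shape[:2]
    by, bx = h // block, w // block
    total = by * bx
    rng = np.random.default_rng(seed)
    # pick distinct block indices to paint
    picks = rng.choice(total, size=int(total * noise_ratio), replace=False)
    out = img.copy()
    for p in picks:
        y, x = (p // bx) * block, (p % bx) * block
        out[y:y + block, x:x + block] = value  # monotone-painted block
    return out

frame = np.zeros((480, 848), dtype=np.uint8)
noisy = add_block_noise(frame, 0.10)
# 10 % of 6,360 blocks -> 636 painted blocks
print(int((noisy == 128).sum()) // 64)  # -> 636
```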
5 Conclusion
Real-time video watermark embedding is essential for a
VoD service. While our previous real-time watermark
embedding system, which uses software running on a PC,
is suitable for content distribution, it consumes much of the
computing power in handling the uncompressed video
stream.
In contrast, our improved system features real-time
transcoding and can embed watermarks containing each
recipient’s ID. That is, tracking of an illegal copier is
accomplished by embedding watermarks during server-side
processing. Real-time processing is achieved in two ways: by decoding the original video in the server, which eliminates the need to capture it as a video signal, and by moving the watermark-pattern generation, which is common to every frame, into a pre-process whose output pattern is then reused for each frame. With this improved
system, a VoD service can stream up to 20 user-specific
watermarked videos concurrently.
Visibility testing showed that the quality of the water-
marked images was practical to some degree. Robustness
testing showed that the embedded watermarks were prac-
tically robust against encoding. If illegally copied video is
found by chance, the illegal copier can be identified by
using these watermarks. Use of this system will thus help
deter illegal copying of video content.
Future work includes establishing watermark robustness
against collusion attacks. The current elemental watermark
payload of 64 bits is enough to identify a copier within a
specific time length of video content. Therefore, one way to
identify colluders is to embed redundant information for
traitor tracing in another timeline of the video.
Some parts of the developed system will be used in a
commercial service.
Acknowledgments We thank Broadmedia Corporation and
G-cluster Global Corporation for providing the video samples for
research purposes, implementing the prototype system, and support-
ing our experiments. Their development environment is the same as
that of ‘‘T’s TV’’ [21].
References
1. George, C., Scerri, J.: Web 2.0 and user-generated content: legal
challenges in the new frontier. J. Inf. Law Technol. 2 (2007)
2. Tardos, G.: Optimal probabilistic fingerprint codes. J. ACM
55(2), 1–24 (2008)
3. Boneh, D., Hamburg, M.: Generalized identity based and
broadcast encryption schemes. In: Proceedings of ASIACRYPT
2008, pp. 455–470 (2008)
4. Safavi-naini, R., Wang, Y.: Sequential traitor tracing. IEEE
Trans. Inf. Theory 49(5), 1319–1326 (2003)
5. Kirovski, D., Malvar, H., Yacobi, Y.: Multimedia content
screening using a dual watermarking and fingerprinting system.
In: Proceedings of ACM International Conference on Multi-
media, pp. 372–381 (2002)
6. Cox, I.J., Miller, M.L., Bloom, J.A.: Digital Watermarking.
Morgan Kaufmann Publishers, Burlington (2001)
7. Lancini, R., Mapelli, F., Tubaro, S.: A robust video watermarking
technique for compression and transcoding processing. In:
Fig. 10 Watermark bit detection ratio vs. simulated channel noise ratio
Proceedings of IEEE International Conference on Multimedia &
Expo (ICME), vol. 1, pp. 549–552 (2002)
8. Hartung, F., Girod, B.: Watermarking of uncompressed and
compressed video. Signal Process. 66(3), 283–301 (1998)
9. Alattar, A.M., Lin, E.T., Celik, M.U.: Digital watermarking of
low bit-rate advanced simple profile MPEG-4 compressed video.
IEEE Trans. Circuits Syst. Video Technol. (CSVT) 13(8),
787–800 (2003)
10. Gopal, K., Latha, M.M.: Watermarking of digital video stream for
source authentication. Int. J. Comput. Sci. Issues 7(4), No 1,
18–25 (2010)
11. Jeong, Y.J., Kim, W.H., Moon, K.S., Kim, J.N.: Implementation
of watermark detection system for hardware based video water-
mark embedder. In: Proceedings of International Conference on
Convergence and Hybrid Information Technology, pp. 450–453
(2008)
12. Kim, K.S., Lee, H.Y., Im, D.H., Lee, H.K.: Practical, real-time,
and robust watermarking on the spatial domain for high-definition
video contents. IEICE Trans. Inf. Syst. E91-D(5), 1359–1368
(2008)
13. Yamada, T., Echizen, I., Tezuka, S., Yoshiura, H.: Real-time
digital video watermark embedding system based on software in
commodity PC. IEEE J. Electron. Inf. Syst. 127C(6), 897–903
(2007)
14. Yamada, T., Echizen, I., Yoshiura, H.: PC-based real-time video
watermark embedding system independent of platform for par-
allel computing. In: Shi, Y.Q. (ed.) Trans. on Data Hiding and
Multimedia Security VII, LNCS, vol. 7110, pp. 15–33. Springer,
Berlin (2012)
15. Jeong, Y.J., Moon, K.S., Kim, J.N.: Implementation of real time video watermark embedder based on Haar wavelet transform using FPGA. In: Proc. of Int'l Conf. on Future Generation Communication and Networking Symposia (FGCNS2008), pp. 63–66 (2008)
16. Mathai, J.N., Sheikholeslami, A., Kundur, D.: VLSI Implemen-
tation of a real-time video watermark embedder and detector. In:
Proc. of IEEE Int’l Symposium on Circuits and Systems, vol. 2,
pp. 772–775 (2003)
17. Jarvinen, S., Laulajainen, J.-P., Sutinen, T., Sallinen, S.: QoS-
aware real-time video encoding. In: Proc. of IEEE Consumer
Communications and Networking Conference (CCNC2006),
vol. 2, pp. 994–997 (2006)
18. Rec. ITU-R BT.500-11: Methodology for the subjective assess-
ment of the quality of television pictures (2002)
19. The Institute of Image Information and Television Engineers
(ITE): HDTV standard video set (1993)
20. Yamada, T., Maeta, M., Mizushima, F.: Feasibility of a water-
mark application for real-time-encoding video-on-demand ser-
vice. In: Proc. of IEEE/SICE Int’l Symp. on System Integration
(SII2011), pp. 1131–1136 (2011)
21. VoD service, T’s TV: http://t-s.tv/ (Japanese)
22. Feng, H., Ling, H., Zou, F., Yan, W., Lu, Z.: A Collusion attack
optimization strategy for digital fingerprinting. ACM Trans.
Multimed. Comput. Commun. Appl. 8(S2), 36:01–36:20 (2012)
23. Meerwald, P., Uhl, A.: An efficient robust watermarking method
integrated in H.264/SVC. In: Shi,Y.Q. (ed.) Trans. on Data
Hiding and Multimedia Security VII, LNCS, vol. 7110, pp. 1–14.
Springer, Berlin (2012)
24. Wang, Y., Pearmain, A.: Blind MPEG-2 video watermarking
robust against geometric attacks: a set of approaches in DCT
domain. IEEE Trans. Image Process. 15(6), 1536–1543 (2006)
25. Chen, W.-M., Lai, C.-J., Wang, H.-C., Chao, H.-C., Lo, C.-H.:
H.264 video watermarking with secret image sharing. IET Image
Process. 5(4), 349–354 (2011)
Author Biographies
Takaaki Yamada received a B.S. degree in Engineering from Kyoto University and a Ph.D. degree in Information Science and Technology from Osaka University in 1988 and 2006, respectively. He is a senior researcher at the Social Infrastructure Systems Research Department, Yokohama Research Laboratory, and has been engaged in R&D at Hitachi, Ltd., Japan, since 1988. His research interests include multimedia applications and content distribution systems. He received the Best Paper Award from IPSJ in 2005 and the Best Paper Award of IEEE IIHMSP in 2006. He is an IEEE member.
Michiro Maeta received B.S. and M.S. degrees from the Kyushu Institute of Technology, Japan, in 1999 and 2001, respectively. In 2001, he joined Hitachi Government & Public Corporation System Engineering, Ltd., Tokyo, Japan. He has been engaged in video watermark application systems.
Fuminori Mizushima received a B.S. degree from Ube National College of Technology and an M.S. degree from the Kyushu Institute of Technology, Japan, in 2004 and 2006, respectively. In 2009, he joined Hitachi Government & Public Corporation System Engineering, Ltd., Tokyo, Japan. He has been engaged in video watermark application systems.