
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 5, SEPTEMBER 2006

Controlling the Inaudibility and Maximizing the Robustness in an Audio Annotation Watermarking System

Cléo Baras, Nicolas Moreau, Associate Member, IEEE, and Przemyslaw Dymarski

Abstract: This paper presents the complete design of an audio data hiding system designed to transmit binary information via the audio communication channel for audio annotation applications. The proposed system is based on an innovative embedding strategy. It consists of 1) a new inaudibility control procedure that locally regulates the watermark transparency, 2) an informed embedding function that maximizes system robustness to additive channel perturbation by using a new robustness criterion, namely maintaining the error probability at a fixed value, and 3) an efficient and low computational cost synchronization mechanism. System performance in terms of inaudibility of the watermark, transmission reliability with respect to various perturbations, and computational cost is evaluated on real audio signals to determine the efficiency of the proposed embedding strategy.

Index Terms: Audio, data hiding, informed strategy, watermark inaudibility, watermarking.

I. INTRODUCTION

AUDIO data hiding [1]–[3] groups techniques that aim at embedding binary information in an audio signal without introducing perceptual degradation. Well-known applications of audio data hiding, namely watermarking and steganography, are related to copyright protection, privacy, and secrecy. In the last decade, a new application field, namely annotation watermarking, has been emerging on the fringe of the previous two. It consists of using the audio signal as a transmission channel. The embedded information can be a content descriptor such as the song title or the name of the artist to ease indexing, a label for monitoring applications that makes it possible to track the audio signal in a broadcast network, or more generally any metadata useful for added-value services.

The latter application field is illustrated by a governmental project, which financed the system proposed in this paper. This application scenario aims at developing a service for persons with hearing impairment that increases their comprehension of TV programs by controlling a speaking face synthesis device. A bit rate of several hundred bits per second is then targeted with the following constraints: inaudibility of the embedded signal, real-time receiver operation (since the watermark embedding may be performed at the post-production stage), and robustness to broadcast perturbations (MPEG compression, transcoding or format modification, noise addition, and desynchronization). A synchronization mechanism is required to locate the embedded information on the time axis, not only because of a simple delay between the embedder and the receiver, but also because the sampling frequencies at the transmitter and the receiver may differ, due to the use of different audio devices or a deliberate time-axis scaling of the watermarked excerpt to satisfy a broadcast time constraint.

Manuscript received July 28, 2005; revised May 5, 2006. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Michael Davies.

C. Baras and N. Moreau are with the Department of Signal and Image Processing, Ecole Nationale Supérieure des Télécommunications (ENST), 75634 Cédex 13, France (e-mail: [email protected]; [email protected]).

P. Dymarski is with the Institute of Telecommunications, Warsaw University of Technology, 00-665 Warsaw, Poland (e-mail: [email protected]).

Digital Object Identifier 10.1109/TASL.2006.879808

This application is only an example; this paper addresses a larger problem, namely proposing a concept of an efficient audio data hiding system for annotation applications. This efficiency is evaluated by the following performance criteria: the inaudibility of the embedded information, the transmission rate (as high as possible), the transmission reliability measured by the bit error rate (as low as possible), the robustness to a large set of perturbations, and a low computational cost. These perturbations are all the licit operations that can be applied to an audio signal and that yield a very small degradation of the perceptual audio quality. Given the variety of perturbations [4], we focus our study on a nonexhaustive subset of licit perturbations, those included in the AudioStirmark evaluation tool, and on time-stretching, one of the most troublesome perturbations.

The design of an audio watermarking system is strongly related to the choice of an appropriate embedding function that hides the information in the audio signal. This choice has to reconcile the inaudibility constraint and the robustness of the transmission of the embedded information [5].

To reach inaudibility, the embedded signal power is generally controlled using a psychoacoustical model applied to the audio signal. This control consists of using a scaling factor [6] or shaping the embedded signal according to a masking threshold [7], [8], which makes the system dependent on the psychoacoustical model parameters.

When pirate attacks are excluded, studies about system robustness concern two groups of distortions: the perturbations that can be modeled as additive channel noise and the desynchronization perturbations.

For additive channel noise, the so-called informed embedding strategies [9], [10] have been designed by applying Shannon's [11] and Costa's [12] works and have already proved their efficiency over noninformed strategies. These strategies use the a priori knowledge of the audio signal during the embedding process to choose an appropriate watermark signal that reconciles the inaudibility constraint and the robust transmission of the embedded information. Few informed embedding strategies specifically dedicated to audio signals have been detailed in the literature. Malvar's spread-spectrum-based system [13] uses a strategy for binary information that limits the average audible distortion and minimizes the probability of making an error during the reception process. Nevertheless, this strategy does not prevent the introduction of local audible distortions. Siebenhaar [14] exploits Eggers's scalar Costa scheme [15] and controls the inaudibility of the embedded information using a masking threshold. Nevertheless, its transmission rate is variable, which can be a disadvantage for broadcast applications.

1558-7916/$20.00 © 2006 IEEE

Fig. 1. Additive audio data hiding system.

Because of the synchronous type of watermark transmission, a synchronization mechanism has to be added to the system. Most synchronization mechanisms proposed in the literature are adapted to spread-spectrum embedding strategies. De C. T. Gomes [16] proposes to embed synchronization patterns known at the receiver. His system is efficient only for low wow levels. Kirovski's system [17] is robust to high wow levels, by using some stretched versions of the synchronization patterns, but requires a high computational cost to take into account desynchronization attacks performed by a pirate.

This paper aims at presenting the complete design of a new data hiding system and its performance with respect to selected perturbations that are characteristic of a broadcast transmission without pirate attacks. Moreover, the proposed system benefits from a new informed embedding strategy that reconciles the inaudibility constraint and a robust detection of the embedded information. This strategy exploits the following:

• an innovative inaudibility control, based on a psychoacoustical model with adaptive parameters (these parameters are adjusted with respect to the audio signal and an objective evaluation of the perceptual distortion between the original and the watermarked signal, computed using the perceptual evaluation of audio quality (PEAQ) algorithm [18]);

• an informed embedding function which maximizes system robustness to additive channel perturbation and still maintains the error probability at a fixed value;

• an efficient and low computational cost synchronization mechanism adapted to our application context.

The outline of the paper is as follows. In Section II, the design of a basic noninformed data hiding system is recalled. The inaudibility control procedure is then detailed in Section III. Section IV explains how the basic system is modified to become informed and how to choose the watermark that maximizes system robustness to additive channel perturbations. A synchronization mechanism is finally introduced to establish system robustness to desynchronization perturbation. It is the subject of Section V. Performance of the proposed data hiding system is evaluated in Section VI. Conclusions and future work end this paper in Section VII.

II. DESIGN OF A NONINFORMED ADDITIVE DATA HIDING SYSTEM

This section recalls classical results of digital communication theory [19] and psychoacoustics. These results are applied to design the noninformed additive data hiding system depicted in Fig. 1, which embeds the hidden information in the time domain.

A. Modulation

The information to be embedded is assumed to be encoded into a sequence of symbols chosen with equal probability among a finite set of possible symbols (each symbol representing a vector of binary digits). The modulation process, the first step of the embedder, aims at associating in a bijective way each of the possible messages with a discrete-time signal, which is referred to as the modulated signal. It requires an embedding codebook containing waveforms of equal length

(1)

with , , for all , and . These waveforms are chosen biorthogonal [19], which means that half of the waveforms are mutually orthogonal and that the other half are their opposites. Each symbol is embedded during its own time interval using the corresponding codebook waveform, so that the modulated signal can easily be expressed as

(2)

The resulting information rate, expressed in bits per second, is given by

(3)

where is the sample rate of the audio signal.
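The modulation stage of this subsection can be illustrated by a short sketch that builds a small biorthogonal codebook and evaluates the resulting bit rate. This is only an illustration under stated assumptions, not the implementation used in this paper: the names `biorthogonal_codebook` and `information_rate` are hypothetical, the orthogonal half of the codebook is taken from sign-scrambled Hadamard rows (one convenient choice among others), and the rate is assumed to follow the usual digital communication relation, that is, the sample rate times the number of bits per symbol divided by the waveform length.

```python
import math
import random


def biorthogonal_codebook(num_symbols, length, seed=0):
    """Return `num_symbols` biorthogonal waveforms of `length` samples.

    Assumption: the orthogonal half is made of Sylvester-Hadamard rows
    multiplied by a common pseudo-random sign pattern, so that every
    waveform has unit power and a roughly flat spectrum; the other half
    contains the opposite waveforms.
    """
    half = num_symbols // 2
    if num_symbols % 2 or half < 1:
        raise ValueError("num_symbols must be a positive even number")
    if length & (length - 1) or length < half:
        raise ValueError("length must be a power of two >= num_symbols / 2")
    rng = random.Random(seed)
    scramble = [rng.choice((-1.0, 1.0)) for _ in range(length)]
    # Hadamard entry (i, j) is (-1) ** popcount(i & j); scrambling the
    # columns with a common sign pattern keeps the rows orthogonal.
    base = [
        [scramble[j] * (-1.0) ** bin(i & j).count("1") for j in range(length)]
        for i in range(half)
    ]
    return base + [[-s for s in waveform] for waveform in base]


def information_rate(num_symbols, length, sample_rate):
    """Bits per second when one symbol is sent every `length` samples
    (assumed relation: sample_rate * log2(num_symbols) / length)."""
    return sample_rate * math.log2(num_symbols) / length


codebook = biorthogonal_codebook(num_symbols=4, length=128)
rate = information_rate(num_symbols=4, length=128, sample_rate=44100)
```

With these illustrative values (4 symbols, 128-sample waveforms, 44.1-kHz audio), `rate` is 689.0625 bits per second, which is of the order of the several hundred bits per second targeted in the Introduction.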



B. Approaching the Inaudibility by Using a Psychoacoustical Model

The inaudibility conditions are given by classical results in psychoacoustics [20], [21]. They show that a signal can be added to another one and remain inaudible under the condition that its power spectral density (PSD) is lower, at any frequency, than a masking threshold computed from the PSD of the other signal, here the audio signal. Since the audio signal is nonstationary, the masking threshold must be recomputed each time the properties of the audio signal change. The audio signal is usually considered as stationary on 20-ms-length intervals [22], so the masking threshold can be computed on analysis windows whose length, counted in samples, corresponds to such an interval.

The choice of the embedded signal and, consequently, the choice of the embedding codebook depend on this time-varying masking threshold. On the one hand, digital communication theory [19] requires that the power of the embedded signal be high enough to ensure the correct detection of the transmitted information. On the other hand, this power must be limited by the masking threshold. The appropriate adjustment consists in choosing the modulated signal as a spread-spectrum signal (in order to use the whole frequency range) with a fixed power and in filtering it with a shaping filter. This filter is designed so that the PSD of the filtered modulated signal, referred to as the watermarking signal, matches the masking threshold.

In practice, the waveforms of the embedding codebook are white, biorthogonal, and have unit power. To design the shaping filter [23], the corresponding autocorrelation coefficients are computed from the inverse discrete Fourier transform of the masking threshold. Levinson's algorithm is then used, yielding a stable and causal filter. The coefficients of the shaping filter have to be updated each time the masking threshold is computed, that is, for each analysis window.
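The design steps described above (inverse DFT of the masking threshold, then Levinson's algorithm) can be sketched as follows. This is a minimal illustration, assuming that the masking threshold is available as a real, symmetric PSD sampled on a full DFT grid and that the shaping filter is the all-pole filter given by the Levinson recursion; the function names and the filter order are hypothetical, and the psychoacoustical model itself is not reproduced.

```python
import math


def autocorrelation_from_psd(psd, num_lags):
    """Inverse DFT of a real, symmetric PSD sampled on len(psd) bins."""
    n = len(psd)
    return [
        sum(p * math.cos(2.0 * math.pi * f * lag / n) for f, p in enumerate(psd)) / n
        for lag in range(num_lags)
    ]


def levinson_durbin(autocorr, order):
    """Solve the Yule-Walker equations by the Levinson-Durbin recursion.

    Returns (a, err): a = [1, a1, ..., ap] are the coefficients of the
    prediction-error (whitening) filter A(z), and err is the residual power.
    """
    a = [1.0] + [0.0] * order
    err = autocorr[0]
    for i in range(1, order + 1):
        acc = autocorr[i] + sum(a[j] * autocorr[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


def shaping_filter(signal, a, gain):
    """All-pole filter gain / A(z): y[n] = gain * x[n] - sum_j a[j] * y[n - j]."""
    out = []
    for n, x in enumerate(signal):
        acc = gain * x
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * out[n - j]
        out.append(acc)
    return out


# Toy autocorrelation sequence standing in for the one derived from a threshold.
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
shaped = shaping_filter([1.0, 0.0, 0.0, 0.0], a, math.sqrt(err))
```

Filtering a white, unit-power waveform with `shaping_filter` gives a signal whose PSD follows the all-pole model of the threshold, while the FIR filter with coefficients `a` performs the inverse (whitening) operation that the receiver of Section II-C relies on.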

Masking thresholds have already been designed for MPEG compression, where they influence the bit allocation procedure. They manage the distribution of the quantization error power over the frequency range with respect to the psychoacoustical characteristics of the ear. However, they are computed from the normalized PSD of the audio signal, and they can be adapted to the watermarking context provided that a scaling factor is introduced to control the inaudibility of the watermarked signal. The masking threshold that we implement is derived from the classical psychoacoustical model of MPEG [24] and is adapted to the particular context of watermarking.

Finally, the watermarked audio signal is

(4)

where the modulated signal is convolved with the impulse response of the shaping filter before being added to the audio signal.

C. Suboptimum Receiver

The blind optimum receiver aims at making a decision on the transmitted information based on the observation of the distorted watermarked signal, without knowing the unwatermarked audio signal.

To design this receiver, the following hypotheses will be considered.

• The channel is free from perturbation, so that the unwa-termarked audio signal is the only disturbance

(5)

• The audio signal is related to a discrete-time random process, which is assumed to be stationary on 20-ms-length intervals, ergodic, to have a zero-mean Gaussian distribution, and to be described by an autoregressive model with high order. This model is classically used to describe the spectrum of the audio signal using a finite number of correlation coefficients, which facilitates the analytical description of the optimal receiver.

• The intersymbol interference (ISI) introduced by the filtering operations is negligible compared with the signal-to-noise ratio (SNR) that fixes the conditions of the watermark transmission over the audio channel. Indeed, the ISI can be characterized as the power ratio between the signal used to transmit a given symbol and the one resulting from the transmission of the previous symbols. Simulations show that this ratio is about 15 dB smaller than the SNR. Thus, symbols can be detected separately, by dividing the received signal into symbol-length intervals denoted as the signal vectors, yielding the design of a suboptimum receiver.

On each symbol-length interval, the optimum decision rule finds the signal that maximizes the probability of a correct decision given the observation, over the set of possible waveforms. Since this signal is chosen in the embedding codebook, the received symbol is given by

(6)

This optimum decision rule can be simplified since the possible symbols are equally probable. By defining the random vector modeling the observation, this rule is equivalent to the maximum-likelihood (ML) criterion. It consists in finding the signal that maximizes the probability density function (pdf) of the observation vector given the transmitted signal, over the set of possible waveforms of the codebook. This pdf can be expressed as follows:

(7)

where the relation holds up to a multiplicative constant and involves the covariance matrix of the audio signal vector and a matrix representation of the filtering of the modulated signal by the shaping filter.

Now the audio signal is related to an autoregressive random process. Its covariance matrix can then be factorized using Cholesky's decomposition [25], which yields a matrix representation of the whitening transformation of the audio signal. This is equivalent to applying the whitening filter at the input of the receiver stage. Thus, the ML criterion is reduced to maximizing the correlation metric over the set defined by

(8)



where the first vector is the whitened received audio signal and the second one is one of the filtered watermark signals. These filtered waveforms define a reception codebook, theoretically obtained by applying the shaping filter and then the whitening filter to the embedding codebook. In practice, since the unwatermarked audio signal is not available during the reception process, its whitening filter is approximated by the whitening filter of the received signal, computed using the linear prediction method [23]. This filter is updated for each analysis window. Similarly, the shaping filter is approximated using the shaping filter designed from the masking threshold of the received signal. The reception codebook is finally

(9)

If the reception codebook vectors have the same norm, then the second term of (8) may be omitted. However, the norms may differ because of the filtering. Because the inaudibility control parameter may be unknown at the receiver stage, we use the normalized correlation metrics denoted by the following vector:

(10)

so that the index of its largest component yields the received symbol in the corresponding time interval. Formula (10) describes the suboptimal receiver, as compared to the optimal one described by formula (8).
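The decision stage of the suboptimal receiver can be sketched as follows. This is a simplified illustration, assuming that the received signal has already been whitened and that the reception codebook is available; the normalization used here (correlation divided by the norm of each reception waveform) is an assumed form of the normalized correlation metric, and the function names are hypothetical.

```python
import math


def normalized_correlations(whitened_signal, reception_codebook):
    """Correlation of the whitened signal with each reception waveform,
    divided by the norm of that waveform (assumed normalization)."""
    metrics = []
    for waveform in reception_codebook:
        dot = sum(r * s for r, s in zip(whitened_signal, waveform))
        norm = math.sqrt(sum(s * s for s in waveform))
        metrics.append(dot / norm)
    return metrics


def detect_symbol(whitened_signal, reception_codebook):
    """Index (zero-based) of the largest normalized correlation."""
    metrics = normalized_correlations(whitened_signal, reception_codebook)
    return max(range(len(metrics)), key=metrics.__getitem__)


# Two reception waveforms with very different norms, as may happen after filtering.
reception_codebook = [[10.0, 10.0, 10.0, 10.0], [1.0, -1.0, 1.0, -1.0]]
received = [1.2, -0.8, 1.1, -0.9]  # close to the second waveform
symbol = detect_symbol(received, reception_codebook)
```

In this toy case, the plain correlations are 6 for the first waveform and 4 for the second one, so an unnormalized correlator would favor the high-norm waveform, whereas the normalized metrics (0.3 and 2.0) select the second waveform. This illustrates why the norms have to be compensated when they are not constant over the codebook.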

III. CONTROLLING THE INAUDIBILITY OF THE WATERMARK

A. Limits of the Psychoacoustical Model

Even though the psychoacoustical model is designed using well-known psychoacoustical results [20] to characterize simultaneous masking, its design has one degree of freedom: the scaling factor. In the audio compression field, this factor is chosen by fixing a signal-to-mask ratio (SMR) value. Nevertheless, it has already been shown [18] that the SMR is deficient in measuring the compressed audio signal quality and that the performance of psychoacoustical models varies with the audio content.

Similar variations may appear in a watermarking context. Tests should then be performed to evaluate the perceptual difference between the unwatermarked signal and the watermarked signal (after its spectral shaping according to the psychoacoustical model) with respect to the scaling factor. This difference is measured using the PEAQ algorithm, proposed by ITU-R Recommendation BS.1387-1 [18]. This algorithm compares the excitation patterns along the basilar membrane in response to the two audio signals and integrates the comparison results over time (taking into account both simultaneous and nonsimultaneous masking) into the objective difference grade (ODG). This ODG can be interpreted using a perceptual grade that describes the perceptual difference, from imperceptible (when the ODG is 0) to very annoying (when the ODG is −4).

Fig. 2 presents the obtained ODGs with respect to the scaling factor for a set of ten audio signals. It shows that, on average, the lower the scaling factor, the better the audible quality. However, it confirms that, for a fixed value of the scaling factor, the audible quality strongly depends on the audio signal that has been watermarked. Therefore, choosing an average value of the scaling factor does not prevent local audible distortion. A value adapted to the processed audio signal must be chosen.

Fig. 2. ODG with respect to the scaling factor: the solid line presents the mean value of the ODG over ten audio signals, and the ten dotted lines present the ODG for each audio signal.

B. Adaptive Scaling Factor

To protect the system from local audible distortion, we make the choice of the scaling factor depend on the ODG value. The scaling factor is updated at each fixed-length interval of samples. The ODG is computed from the original audio signal and the watermarked audio signal, taking into account all the samples of the audio signal that have already been processed. The scheme proposed for controlling the audio quality of the watermarked signal is depicted in Fig. 3. For the moment, since the system is additive, the watermark can be computed in a first step and the scaling factor in a second one.

Suppose that the watermark is computed. We expect to reach a certain audio quality corresponding to a certain ODG, referred to as the target ODG. The scaling factor is chosen as follows: for a given value of the scaling factor, the corresponding watermarked signal is computed and the PEAQ algorithm is used to compute the ODG. If the ODG is lower than the target ODG, the watermark is more audible than expected and the scaling factor is decreased. On the contrary, if the ODG is higher than the target ODG, the watermark is less audible than expected, so the scaling factor is increased. This step is reiterated until the computed ODG becomes equal to the target ODG within a certain error tolerance.
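The iterative adjustment described above can be sketched as a simple search loop. This is a hedged illustration rather than the procedure actually implemented: the update rule is a bisection (the text only states that the scaling factor is decreased or increased), `measure_odg` is a hypothetical callable standing in for "build the watermarked signal with this scaling factor and run PEAQ on it", and the ODG is assumed to decrease monotonically when the scaling factor increases.

```python
def tune_scaling_factor(measure_odg, target_odg, tolerance, low, high, max_iter=50):
    """Search the scaling factor whose ODG matches `target_odg`.

    measure_odg(scale) is a hypothetical stand-in for the PEAQ evaluation of
    the signal watermarked with this scaling factor (0 = imperceptible,
    more negative = more audible).
    """
    scale = 0.5 * (low + high)
    for _ in range(max_iter):
        scale = 0.5 * (low + high)
        odg = measure_odg(scale)
        if abs(odg - target_odg) <= tolerance:
            break
        if odg < target_odg:
            high = scale  # watermark more audible than expected: decrease
        else:
            low = scale   # watermark less audible than expected: increase
    return scale


# Toy audibility model used only to exercise the loop (not a PEAQ result).
scale = tune_scaling_factor(lambda s: -2.0 * s, target_odg=-1.0,
                            tolerance=0.01, low=0.0, high=2.0)
```

With the toy model above, the loop stops at a scaling factor of 0.5. In a real system, each call to `measure_odg` involves a full PEAQ evaluation, so the number of iterations directly affects the embedding cost.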

IV. ESTABLISHING SYSTEM ROBUSTNESS TO ADDITIVE CHANNEL PERTURBATIONS

To take the a priori knowledge of the audio signal into account, a local copy of the receiver scheme is introduced at the embedder. The modulation stage is modified as presented in Fig. 4 (for simplicity, the watermarked signal is not shown).



Fig. 3. Adaptive control of the scaling factor to reach watermark inaudibility.

Fig. 4. Modulator scheme, using a local copy of the receiver.

A. Using the Local Copy of the Receiver

The local copy of the receiver allows us to estimate the signals taking part in the reception process. They are the whitened audio signal, the filtered modulated signal, the estimated reception codebook, and the estimated correlation metrics. Since the channel noise and the distortions that it implies on the received audio signal are unknown during the embedding process, the whitening and shaping filters which are used to compute the effective reception codebook are approximated by the whitening filter and the shaping filter of the original audio signal. Thus, the estimated reception codebook is

(11)

The inaudibility constraint and the conditions of a correct detection can now be stated. In what follows, we consider the embedding of a given symbol during a given symbol interval.1

The inaudibility constraint is ensured by the perceptual shaping filter and the scaling factor. The scaling factor now depends on the choice of the modulated signal (which yields the watermark after filtering). It will be shown later that the scaling factor can be computed prior to the appropriate choice of the modulated signal so that, in this section, it can be supposed fixed for the current symbol interval and independent of the modulated signal. The inaudibility constraint is now only imposed by the design of the shaping filter, and by forcing the modulated signal to satisfy the following inequality:

(12)

1The index l of k has been omitted to ease the reading of equations.

Moreover, given the signals estimated at the local copy of the receiver, the symbol is detected with no error if the estimated correlation vector has its maximum on the component associated with this symbol. Thus, the following inequalities have to be satisfied at the input of the correlator:

(13)

where the additional term is some unknown channel noise.

Defining a robust embedding strategy consists of establishing how to choose the adapted watermark signal that reconciles the inaudibility constraint (12) and the correct detection conditions (13) for any channel noise.

B. Probability of Erroneous Decision

By defining the vectors (with ), the probability of erroneous detection according to (13) is

(14)

We suppose that the channel noise samples form a set of uncorrelated Gaussian random variables sharing the same distribution. The resulting set of correlation terms can be viewed as random variables that are statistically dependent [19], since the set of vectors is not orthogonal.2

The error probability can be upper-bounded using a formula adequate for statistically independent variables, which is expounded in the Appendix:

(15)

2The vectors {s̃} are not orthogonal since 1) they are computed from the biorthogonal waveforms {s} and 2) they result from a filtering operation, which does not maintain the orthogonality properties.


with . It follows

(16)

where .

Maximizing system robustness amounts to finding the modulated signal that jointly minimizes the error probability and maximizes the noise variance. Unfortunately, this problem has one degree of freedom, since for any given modulated signal, the error probability increases with the noise variance. In most state-of-the-art strategies [10], [17], minimizing the error probability is prioritized so that the noise variance parameterizes the system. In this paper, since the channel noise is unknown, we decided to set the error probability to a fixed target value and to choose the modulated signal that maximizes the noise variance.

C. Choice of the Appropriate Modulated Signal

Since the received signal is expanded over the reception codebook waveforms, the filtered modulated signal can be chosen in the signal space defined by the reception codebook. Due to the linearity of the filtering, the modulated signal then belongs to the signal space defined by the embedding codebook. Therefore, the modulated signal can be searched as a linear combination of the embedding codebook waveforms

(17)

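which, by linearity, yields the same expansion of the filtered modulated signal over the estimated reception codebook waveforms. Consequently, the choice of the adapted watermark signal depends on the choice of the codebook and on the evaluation of the coefficients of this linear combination.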

1) Choice of the Codebook and Choice of a Waveform to Be Detected at the Receiver: As in Costa's model [12], we structure the embedding codebook as a set of orthogonal subcodebooks. Each subcodebook contains biorthogonal waveforms, all able to transmit the same symbol. Estimated reception subcodebooks can then be defined as versions of the subcodebooks filtered by the shaping and whitening filters.

When a symbol is transmitted, only one waveform of the corresponding estimated reception subcodebook plays a leading role in the detection of this symbol. This waveform is the only one which maximizes the correlation with the received signal over the waveforms of this subcodebook. Now, (13) shows that the higher this correlation, the easier the detection. Thus, the waveform is chosen so that

(18)

The probability of erroneous detection can be rewritten using the vectors and the correlations as follows:

(19)

2) Evaluation of the Coefficients: The evaluation of the coefficients is related to an optimization problem under constraints: choosing the modulated signal that satisfies the inaudibility constraint (12) and ensures a fixed value of the error probability given by (19) with a maximum noise variance. This problem can be solved with an iterative algorithm, which progressively increases the noise variance until its maximum value is obtained. It proceeds as follows:

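1) Given a certain value of the noise variance, we aim at finding the coefficients that ensure the target error probability. However, this problem can have zero, one, or several solutions depending on the signal configuration and the chosen value of the noise variance. Therefore, we prefer choosing the coefficients that minimize the error probability. This is related to the following optimization problem:

(20)

This problem is solved using a sequential quadratic programming method [26]. It yields a unique solution and the corresponding minimum value of the error probability.

2) Since the error probability increases with the noise variance, the noise variance is modified with regard to the previous minimum value of the error probability. If this minimum exceeds the target value, the noise variance is decreased; otherwise, it is increased.

3) Steps 1) and 2) are repeated until the minimum error probability approximates the expected value or the noise variance becomes null.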

At the end, in the case where the noise variance is strictly positive, the coefficients that ensure a fixed error probability with a maximized noise variance are found.

In the case where the noise variance is null, no watermark signal permits a robust transmission of the symbol as defined by the proposed strategy. This case happens in practice in almost 1% of the total number of signal configurations. An alternative strategy described in [27] is then used to choose the adapted watermark. This strategy consists in finding the watermark that maximizes a robustness parameter, which characterizes system robustness to additive channel perturbations. More details can be found in [27].

D. Choice of the Inaudibility Parameter

Simulations show that 70% of the modulated signal power is dedicated to the waveform selected by (18). The modulated signal chosen to ensure system robustness to additive perturbations and the modulated signal designed by embedding this waveform alone are therefore quite close. Now, since (18), which is used to choose this waveform, does not depend on the scaling factor, the scaling factor can be computed prior to the design of the modulated signal by applying the inaudibility control procedure of Section III to this waveform.

V. ESTABLISHING ROBUSTNESS TO DESYNCHRONIZATION PERTURBATIONS

Desynchronizing perturbations are among the most severe channel operations to which an audio data hiding system should be robust. In the considered applications, they can be the result of a simple delay between the embedder and the receiver, a resampling operation in the case of a D/A conversion of the watermarked signal, or a time-stretching operation deliberately applied on the signal to satisfy a broadcast time constraint.

Fig. 5. Structure of the modulated signal with the synchronization mechanism.

In the last two cases, the drift, that is, the relative difference between the embedder and the receiver sampling frequencies, is piecewise constant, supposing that abrupt variations may only appear between two music excerpts, each time-stretched with a different value of the drift. This drift can modify the signal duration up to  of the original length without changing the perceptual audio quality: in this paper, we limit drift values to . These values should be increased in further study, but they already change the signal waveforms substantially since one sample is added or lost every 50 samples.

A synchronization mechanism has to be introduced because the correlation demodulator used by our system to detect the spread transmitted waveform is very sensitive to the correct location of each symbol. The proposed mechanism is based on the use of synchronization patterns that estimate the delay and the drift introduced by the desynchronizing operation.

In this section, we only address the synchronization problem for constant drift perturbations. Abrupt variations of the drift, silence insertion/deletion into the watermarked signal, or detection of nonwatermarked parts of the audio signal are not considered in this paper but could also be solved using the synchronization patterns (e.g., difficulties in finding patterns may be interpreted as lack of watermarking).

A. Model of a Desynchronizing Operation

Regarding the drift limitation to  (yielding slight pitch modifications), a desynchronizing operation can be modeled as the resampling operation of the audio watermarked signal (initially sampled at ) at the sampling frequency  with an initial delay . The received signal can then be expressed as

(21)

where  (with  meaning rounding to the nearest integer) models the scaling of the time axis between the embedder and the receiver,  is the rounding error and . Thus, time stretching distorts the magnitude of the audio signal, but chiefly the location of each embedded symbol and its duration. A synchronization mechanism should then be introduced to carry out timing recovery, which modifies both the embedding and the receiver processes.
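To make the effect of such a model concrete, the following minimal sketch resamples a signal by nearest-integer index mapping with a constant drift and an initial delay. It is an illustration only and does not claim to reproduce (21): the function name `desynchronize`, the sign convention of `drift` (a positive value drops samples), and the restriction to a nonnegative integer `delay` are all assumptions.

```python
import math
from typing import List, Sequence


def desynchronize(x: Sequence[float], drift: float, delay: int = 0) -> List[float]:
    """Nearest-integer resampling of x with a constant drift and a delay.

    Assumed convention (not taken from the paper): output sample n is read
    at input index round(n * (1 + drift)) + delay, so a positive drift
    skips samples and a negative drift repeats some of them.
    """
    if delay < 0:
        raise ValueError("delay must be nonnegative in this sketch")
    scale = 1.0 + drift
    if scale <= 0.0:
        raise ValueError("drift must be greater than -1")

    y: List[float] = []
    n = 0
    while True:
        # floor(v + 0.5) rounds to the nearest integer for nonnegative v
        idx = math.floor(n * scale + 0.5) + delay
        if idx >= len(x):
            break
        y.append(x[idx])
        n += 1
    return y


if __name__ == "__main__":
    ramp = list(range(100))
    stretched = desynchronize(ramp, drift=0.02)
    print(len(ramp), len(stretched))
```

With a drift of 0.02 in this convention, two of the 100 input samples are dropped, which matches the order of magnitude quoted earlier (one sample added or lost every 50 samples).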

B. Synchronization Mechanism

The proposed synchronization mechanism aims at 1) estimating  supposing that  is known from the receiver and 2) using this estimated value to ease the location of groups of symbols and their detection. It is based on the detection of regular synchronization patterns, added to the audio signal during the embedding stage.

At the embedder, the modulated signal can then be split up into a two-level structure, as depicted in Fig. 5, which contains the following:

• The header:  synchronization patterns, denoted , with length , separated with  zero values, are embedded prior to the transmission of the embedded information to perform the first estimation of . The header length is then .

• The messages: the digital information is split into sequences of  symbols, denoted as messages, each embedded into  samples of the audio signal. The synchronization pattern  is then added before each message. This pattern makes it possible to locate each group of symbols and to update the estimation of . Finally, each message length is .

The receiver scheme is modified accordingly, as presented in Fig. 6.  is first estimated using the  initial synchronization patterns and a Wiener filtering-based receiver presented in [28], designed to estimate the synchronization patterns from the received signal.³ This estimation requires an equalization procedure made up of the zero-forcing filter , yielding the filtered audio signal , and the Wiener filter  which minimizes the mean square error . Now, a time-stretching operation on the audio signal stretches the embedded synchronization pattern with a ratio , yielding the received synchronization pattern . Thus, a set of stretched versions of  whose ratios cover the range of possible values for  is computed and is used with a sliding-correlation technique to locate the  embedded patterns. The  maximum values of the obtained correlations are retained. The number of samples between each of them makes it possible to estimate .
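The last two steps of this paragraph, locating the patterns by sliding correlation and deducing the time-axis scaling from the number of samples between them, can be sketched as follows. This is a simplified illustration under stated assumptions: it correlates against a single unstretched pattern rather than a bank of stretched versions, it omits the zero-forcing and Wiener filters, and the names `locate_pattern`, `estimate_time_scale`, and `nominal_spacing` are hypothetical.

```python
from typing import List, Sequence


def locate_pattern(signal: Sequence[float], pattern: Sequence[float],
                   start: int, stop: int) -> int:
    """Offset in [start, stop) where the correlation with pattern is largest."""
    last = min(stop, len(signal) - len(pattern) + 1)
    if start >= last:
        raise ValueError("empty search window")

    def score(offset: int) -> float:
        window = signal[offset:offset + len(pattern)]
        return sum(s * p for s, p in zip(window, pattern))

    return max(range(start, last), key=score)


def estimate_time_scale(peak_positions: Sequence[int], nominal_spacing: int) -> float:
    """Ratio between the mean observed pattern spacing and the embedded spacing."""
    gaps: List[int] = [b - a for a, b in zip(peak_positions, peak_positions[1:])]
    if not gaps:
        raise ValueError("at least two detected patterns are required")
    return (sum(gaps) / len(gaps)) / nominal_spacing


if __name__ == "__main__":
    pattern = [1.0, -1.0, 1.0, 1.0, -1.0]
    received = [0.0] * 80
    for position in (7, 58):  # patterns embedded 50 samples apart, received 51 apart
        received[position:position + len(pattern)] = pattern
    peaks = [locate_pattern(received, pattern, 0, 30),
             locate_pattern(received, pattern, 40, 75)]
    print(peaks, estimate_time_scale(peaks, nominal_spacing=50))
```

Averaging the spacing over several patterns, as the header allows, reduces the effect of a one-sample error on any single peak location.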

³ Due to its low computational cost, the Wiener filtering-based receiver is preferred to the suboptimum receiver developed in Section II to process the synchronization step. Indeed, using the suboptimum receiver would require computing the filtered version of the synchronization pattern at each location where the pattern is searched in the audio signal, which is very costly. With the Wiener filtering-based receiver, the searched pattern is the original one.

Fig. 6. Receiver scheme with the synchronization mechanism.

Then, each message is detected given the current estimated value of , the corresponding stretched versions of , denoted by , and of the stretched codebook . For each message, the Wiener filter is modified with the aim of estimating  rather than , so that its coefficients are computed by minimizing . A sliding-correlation technique is used to locate the position  of the synchronization pattern  at the beginning of each message. Knowing  and , each symbol  of the message begins around the sample index

(22)

where  is a correction parameter. This equation shows the drift between the embedder time axis and the receiver time axis. To take a small error of the  estimation into account, a sliding-correlation technique around the theoretical position  is performed jointly with the decision process: the maximum value of the correlation between the whitened audio signal and each waveform of the stretched reception codebook gives 1) the value of  as the difference between the index of the maximum value and , 2) the exact position of the th symbol, and 3) its value  using the same decision process as described in Section III. Finally,  is updated each time the synchronization pattern  at the beginning of each message is located by evaluating the relative distance between the detected patterns.
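The joint search described above, over small offsets around the theoretical position and over the codebook waveforms, can be sketched as follows. It is a minimal illustration under stated assumptions: the whitening and stretching steps are omitted, the codebook is a plain list of equal-energy waveforms, and the names `detect_symbol`, `predicted_start`, and `search_radius` are hypothetical.

```python
from typing import List, Sequence, Tuple


def detect_symbol(signal: Sequence[float], codebook: Sequence[Sequence[float]],
                  predicted_start: int, search_radius: int) -> Tuple[int, int]:
    """Return (offset correction, symbol index) maximizing the correlation.

    Every start position within search_radius of predicted_start is tried
    against every codebook waveform; the largest correlation decides both
    the timing correction and the detected symbol.
    """
    best_score = None
    best: Tuple[int, int] = (0, 0)
    for offset in range(-search_radius, search_radius + 1):
        start = predicted_start + offset
        if start < 0:
            continue  # window would begin before the signal
        for symbol, waveform in enumerate(codebook):
            if start + len(waveform) > len(signal):
                continue  # window would run past the end of the signal
            window = signal[start:start + len(waveform)]
            score = sum(s * w for s, w in zip(window, waveform))
            if best_score is None or score > best_score:
                best_score, best = score, (offset, symbol)
    if best_score is None:
        raise ValueError("no valid search position")
    return best


if __name__ == "__main__":
    # Toy biorthogonal codebook: two orthogonal waveforms and their opposites.
    codebook: List[List[float]] = [
        [1, 1, -1, -1], [1, -1, 1, -1], [-1, -1, 1, 1], [-1, 1, -1, 1],
    ]
    received = [0.0] * 20
    received[9:13] = codebook[1]  # the symbol actually starts at index 9
    print(detect_symbol(received, codebook, predicted_start=8, search_radius=2))
```

In the toy run the predicted position is off by one sample, and the search recovers both the one-sample correction and the transmitted symbol index.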

This synchronization mechanism could easily be modified to solve other problems dealing with the desynchronizing perturbations: as an example, abrupt variations of the drift or silence insertion/deletion can be detected by regularly embedding into the audio signal the sequence of the  synchronization patterns which are used to initialize the  estimation. This solution could also be used to detect nonwatermarked parts of the audio signal: indeed, as long as no synchronization pattern is detected inside the audio signal, the signal may be unwatermarked or so much distorted by the channel perturbation that it is not worth carrying out the information reception.

VI. PERFORMANCE OF THE PROPOSED DATA HIDING SYSTEM

A. Test Plan

System performance is evaluated by using the three following criteria: 1) the audio quality of the watermarked audio signal, 2) the bit error rate (BER) with respect to the transmission rate  for selected perturbations, and 3) the computation cost.

A test sample of 20 audio signals of different styles, sampled at  kHz, is used to evaluate this performance. The test sample set contains singing voices, solo and symphony excerpts, speakers with background noise, etc.  binary digits were embedded on  seconds of each signal, which represents a total of  binary digits.

The audio quality is evaluated using the PEAQ algorithm proposed in [18].

The BER is computed from the mean value of the number of transmission errors. It is an efficient estimator of the transmission error probability of the watermarking channel with an accuracy (i.e., a 70% confidence interval) equal to  BER BER . Thus, the BER values presented in this paper estimate the error probability with an error lower than .

The computational cost is measured as a ratio between the simulation time and the duration of the processed audio signal: as an example, a ratio of 1.5 means that 1.5 s are required to process a 1-s audio signal. This simulation time depends on the watermarking programs and on the computer used. Programs are written in C and call Matlab functions to perform the sequential quadratic programming method used for the informed embedding strategy. The computer processor is an Intel Pentium 4 with a 1.80-GHz clock frequency and 512 MB of RAM. Since the programs have not yet been optimized, the proposed measures aim only at giving an order of magnitude of the system efficiency.
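As an illustration of how the accuracy of a measured BER can be quantified, the sketch below computes the usual binomial standard error of an error-rate estimate. This is the textbook formula for independent bit errors, used here as an assumption; it is not claimed to be the exact accuracy expression of the paper, and the function name `ber_with_standard_error` is hypothetical.

```python
import math
from typing import Tuple


def ber_with_standard_error(num_errors: int, num_bits: int) -> Tuple[float, float]:
    """Return the measured BER and its binomial standard error.

    Assumes independent bit errors, so that the error count is binomial.
    An interval of plus or minus one standard error around the BER covers
    roughly 68% of the probability mass under a Gaussian approximation.
    """
    if num_bits <= 0 or not 0 <= num_errors <= num_bits:
        raise ValueError("need 0 <= num_errors <= num_bits and num_bits > 0")
    ber = num_errors / num_bits
    standard_error = math.sqrt(ber * (1.0 - ber) / num_bits)
    return ber, standard_error


if __name__ == "__main__":
    ber, se = ber_with_standard_error(num_errors=10, num_bits=1000)
    print(f"BER = {ber:.4f} +/- {se:.4f}")
```

The standard error shrinks with the square root of the number of embedded bits, which is why a large total number of binary digits is needed to resolve small error probabilities.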

System robustness to channel perturbation is evaluated for the “classical” distortions listed in [4]. Most of the nondesynchronizing perturbations are generated by the evaluation tool called StirMark BenchMark for Audio, which is available online [29], with its default parameters. We also consider MPEG compression, performed by an MPEG 1 Layer 3 digital encoder for mono signals, white additive noise with SNR  dB, and time stretching, performed by CoolEdit.

The used codebook contains  orthogonal subcodebooks, each having  biorthogonal waveforms. The expected ODG is chosen equal to , and the expected probability of erroneous decision to . Window lengths are  samples. The synchronization mechanism is parameterized as follows: , with ,  and  samples. To achieve system robustness to MPEG compression, the codebook waveforms are spread in the frequency range from 0 to 6 kHz.

B. Experimental Results

The obtained ODG characteristic values are presented in Table I for two systems: the first one, presented in Section II, is noninformed and does not use the local inaudibility control procedure, whereas the second one, which combines the processes described in Sections III and IV, is designed to ensure the local inaudibility of the watermark. These characteristic values are the mean value over the set of audio signals, the minimum and the maximum values, and the difference between the maximum and the minimum. The obtained mean values show that the inaudibility of the watermarking signal is improved by the proposed local inaudibility control procedure. Moreover, fluctuations of the ODG with respect to the processed audio signal, which are underlined by the difference values, are decreased. This indicates that the auditory quality of the watermarked signal is ensured for each audio signal.

TABLE I
CHARACTERISTIC VALUES OF ODG OBTAINED FOR 20 AUDIO SIGNALS WITH TWO SYSTEMS: THE FIRST ONE IS WITHOUT THE INAUDIBILITY CONTROL, THE SECOND WITH THE INAUDIBILITY CONTROL

Fig. 7. BER with respect to transmission rate for two watermarking systems with a channel free from perturbation. The first system is based on the local inaudibility procedure of Section III and the noninformed embedding strategy of Section II. The second system is based on the local inaudibility procedure of Section III and the informed embedding strategy of Section IV.

The obtained BERs with respect to the transmission rate when the channel is free from perturbation are presented in Fig. 7 for two systems both using the local inaudibility control procedure described in Section III: the first one is based on the blind embedding strategy presented in Section II and the second one on the informed embedding strategy detailed in Section IV. The efficiency of informed embedding strategies over noninformed strategies on transmission reliability is once again emphasized, since the BERs with the informed embedding strategy are lower than those with the noninformed strategy. Moreover, this efficiency can now be quantified: the informed strategy divides the BERs of a noninformed watermarking system by almost three.

Informed system robustness to perturbations is then exhibited in Table II and in Fig. 8. Table II presents the obtained BERs for various nondesynchronizing perturbations when the transmission rate is fixed at 83 b/s. It shows that the proposed system is robust to MPEG compression for compression bit rates above 96 kb/s, to filtering, to format changing, and to loudness modification, since the BERs obtained for those perturbations are not much worse than the BERs for a channel free from perturbation. The most distorting perturbations are MPEG compression for compression bit rates below 64 kb/s, echo adding, and white noise addition. For these perturbations, the transmission reliability of the proposed informed system approaches (but is still higher than) the reliability of the noninformed system when the channel is free from perturbation. Fig. 8 presents the obtained BERs for transmission rates up to 160 b/s when the channel performs a time-stretching operation with a wow ratio varying between . The synchronization mechanism is efficient since the BERs exhibit low variations with the wow ratio. These variations are the greatest when the transmission rate is low (41 b/s with ), which is due to the low values of the obtained BERs at this rate. A deeper analysis of the obtained results shows that the relative error on the estimation of the sampling frequency is smaller than . Moreover, no synchronization failure on the location of the patterns appears, so that the obtained BERs are only due to the bad recognition of the reception codebook waveform.

TABLE II
ROBUSTNESS OF THE INFORMED SYSTEM TO NONDESYNCHRONIZING PERTURBATIONS FOR A TRANSMISSION RATE OF 83 b/s (N = 882)

Fig. 8. BERs obtained with the informed watermarking system with respect to the wow ratio of the time-stretching operation for three different transmission rates: 41 b/s (N = 1764), 83 b/s (N = 882), and 165 b/s (N = 441) when M = 4.

Finally, the computational cost of the three proposed systems is detailed in Table III. These systems are the reference system presented in Section II, the noninformed system with the inaudibility control procedure of Section III, and the final informed system proposed in Section IV. The computational cost of the embedder depends on the embedding strategy. The introduction of the inaudibility control procedure increases the computational cost of the embedder but still permits the real-time embedding of the transmitted information (using the computer described in the previous subsection). Nevertheless, the informed embedding strategy is more costly and cannot be implemented in real time on the above-mentioned computer (due to the use of Matlab functions). The computational cost of the receiver, even when the synchronization mechanism is used, is close to 1, which indicates that the real-time reception of the embedded information is feasible. This cost does not depend on the time-stretching drift, since the same number of processes (that is, the sliding-correlation techniques for synchronization pattern location and symbol detection) is performed whatever the desynchronizing operation is. Therefore, our informed data hiding system can be used for broadcast applications, as long as the embedding is processed offline. Real-time implementation of the embedder could be possible with specialized digital signal processors and more powerful computers.

TABLE III
COMPUTATIONAL COST AT 83 b/s (N = 882)

VII. CONCLUSION AND FUTURE WORK

In this paper, a new informed audio watermarking system has been presented. It is designed with the purpose of embedding an added value to the audio signal by fulfilling the data hiding requirements, that is, the inaudibility of the watermark and the robustness of the transmission to selected perturbations. In this field, the state of the art has already pointed out the efficiency of informed embedding strategies. Nevertheless, the existing strategies are deficient in limiting the local perceptual distortion, and it is quite difficult to evaluate their performance in a data transmission scenario: first, they are often designed for applications related to copyright protection (which requires a low transmission rate to achieve system robustness), and second, the obtained performance depends on the inaudibility and transmission conditions, which vary strongly from one system to another.

In this paper, the inaudibility is achieved using an innovative control procedure which locally adjusts a psychoacoustical model with respect to an objective evaluation of the watermark transparency. The robustness to nondesynchronizing perturbation is achieved by the use of an informed embedding strategy that satisfies the local inaudibility constraint. The informed embedding strategy uses a local copy of the receiver at the embedder to choose the adapted watermark that maintains the error probability at a fixed value for a maximized channel noise and that is still strictly limited by the inaudibility constraint. Robustness to desynchronizing perturbation is achieved due to a real-time synchronization mechanism, adapted to broadcast application. It exploits synchronization patterns to estimate the desynchronization parameters and to locate the information to be detected.

System performance on real audio signals has been evaluated. The PEAQ algorithm has been used to evaluate the distortion introduced by the watermark and to ensure that the watermark is almost inaudible. The transmission reliability of the embedded information has been evaluated by the BER and presented for various channel perturbations, from MPEG compression to time-stretching operations. The results show that a robust transmission through an audio channel with a BER of  can be achieved at a bit rate of almost 80 b/s in the presence of perturbations.

Further studies should be carried out aiming at decreasing the BER. Specific modulations (such as trellis coded modulation) or error correction codes could be introduced. Moreover, the proposed embedding strategy could be applied to a reception scheme [28] based on equalization techniques, which are known to be much more efficient than the proposed reception scheme. Testing has already started and preliminary results are very encouraging.

APPENDIX

PROBABILITY OF ERROR IN THE CLOSED-LOOP SCHEME

According to (14), adopting the notation of Section IV-B, the probability of detection without error equals

(23)

Variables  defined for  have a mean value , a variance , and the correlation coefficient of any pair of these variables equals

(24)

If the correlation coefficient is not negative, then

(25)

e.g., for two Gaussian variables with zero mean and unit variance, , where  is the correlation coefficient. Therefore, for , . This is also valid for  Gaussian variables, yielding the inequality

(26)
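The two-variable example can be checked numerically with the classical orthant probability of a zero-mean, unit-variance bivariate Gaussian pair, P(X1 < 0, X2 < 0) = 1/4 + arcsin(rho)/(2 pi). This closed form is a standard result used here as an assumption; it may differ from the exact expression intended in the appendix, and the function name `joint_negative_probability` is hypothetical.

```python
import math


def joint_negative_probability(rho: float) -> float:
    """P(X1 < 0, X2 < 0) for standard bivariate Gaussians with correlation rho.

    Classical orthant formula: 1/4 + arcsin(rho) / (2 * pi).
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return 0.25 + math.asin(rho) / (2.0 * math.pi)


if __name__ == "__main__":
    independent = 0.5 * 0.5  # product of the two marginal probabilities
    for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
        joint = joint_negative_probability(rho)
        # For nonnegative rho the joint probability is never below the product.
        print(rho, round(joint, 4), joint >= independent)
```

For a nonnegative correlation coefficient the joint probability is at least the product of the marginals, which is the behavior the inequality relies on. For a negative coefficient the joint probability falls below that product.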

The inequalities (26) and (15) are fulfilled if the variables  exhibit nonnegative correlations, i.e., if  for any pair of vectors  and  (with ). If we neglect the filtering operation described with matrices , then for the biorthogonal codebook  (in fact,  if  is orthogonal to  and  if  equals ) and the variables  exhibit nonnegative correlations indeed. However, the filtering operation may change the sign of the scalar product of filtered vectors, and it is not excluded that some of the variables  may be negatively correlated. Simulations show that, for the filters used in our watermarking system, this happens very rarely.

REFERENCES

[1] S. Craver, M. Wu, and B. Liu, “What can we reasonably expect from watermarks?,” in Proc. IEEE Workshop Applications Signal Process. Audio Acoust., Mohonk, NY, Oct. 2001, pp. 223–226.

[2] I. J. Cox, M. Miller, and J. Bloom, “Watermarking applications and their properties,” in Proc. IEEE Int. Conf. Inf. Technol.: Coding Comput., Las Vegas, NV, Mar. 2000, pp. 6–10.

[3] L. d. C. T. Gomes, P. Cano, E. Gomèz, M. Bonnet, and E. Battle, “Audio watermarking and fingerprinting: For which applications?,” J. New Music Res., vol. 32, no. 1, pp. 65–81, Mar. 2003.

[4] M. Steinebach, F. Petitcolas, F. Raynal, J. Dittmann, C. Fontaine, C. Seibel, N. Fatès, and L. C. Ferri, “Stirmark benchmark: Audio watermarking attacks,” in Proc. IEEE Int. Conf. Inf. Technol.: Coding Comput., Las Vegas, NV, Apr. 2001, pp. 49–54.

[5] M. Miller, I. J. Cox, and J. Bloom, “Informed embedding: Exploiting image and detector information during watermark insertion,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Vancouver, BC, Canada, Sep. 2000, pp. 1–4.

[6] Q. Cheng and J. Sorensen, “Spread spectrum signaling for speech watermarking,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Salt Lake City, UT, May 2001, pp. 1337–1340.

[7] L. Boney, A. H. Tewfik, and K. N. Hamdy, “Digital watermarks for audio signal,” in Proc. IEEE Int. Conf. Multimedia Comput. Syst. (ICMCS), Hiroshima, Japan, Jun. 1996, pp. 473–490.

[8] M. Swanson, B. Zhu, A. Tewfik, and L. Boney, “Robust audio watermarking using perceptual masking,” Signal Process., vol. 66, no. 3, pp. 337–355, Oct. 1998.

[9] I. J. Cox, M. L. Miller, and A. L. McKellips, “Watermarking as communications with side information,” Proc. IEEE, vol. 87, no. 7, pp. 1127–1141, Jul. 1999.

[10] B. Chen and G. Wornell, “Quantization index modulation: A class of provably good methods for digital watermarking and information embedding,” IEEE Trans. Inf. Theory, vol. 47, no. 4, pp. 1423–1443, May 2001.

[11] C. Shannon, “Channel with side information at the transmitter,” IBM J. Res. Develop., no. 2, pp. 222–293, Oct. 1958.

[12] M. Costa, “Writing on dirty paper,” IEEE Trans. Inf. Theory, vol. IT-29, no. 3, pp. 439–441, May 1983.

[13] H. Malvar and D. Florencio, “Improved spread spectrum: A new modulation technique for robust watermarking,” IEEE Trans. Signal Process., vol. 51, no. 4, pp. 898–905, Apr. 2003.

[14] F. Siebenhaar, C. Neubauer, R. Baüml, and J. Herre, “New high data rate audio watermarking based on SCS (scalar Costa scheme),” in Proc. Audio Eng. Soc. Convention, Los Angeles, CA, Oct. 2002. Preprint 5645.

[15] J. Eggers, R. Baüml, R. Tzschoppe, and B. Girod, “Scalar Costa scheme for information embedding,” IEEE Trans. Signal Process., vol. 51, no. 4, pp. 1003–1019, Apr. 2003.

[16] L. de C. T. Gomes, E. Gomèz, and N. Moreau, “Resynchronization methods for audio watermarking,” in Proc. 111th Convention Audio Eng. Soc., New York, NY, Nov.-Dec. 2001. Preprint 5441.

[17] D. Kirovski and H. Malvar, “Spread-spectrum watermarking of audio signals,” IEEE Trans. Signal Process., vol. 51, no. 4, pp. 1020–1033, Apr. 2003.

[18] Recommendation B.S. 1387: Method for Objective Measurements of Perceived Audio Quality, Int. Telecommunication Union, Geneva, Switzerland, 2001.

[19] J. Proakis, Digital Communications, 4th ed. New York: McGraw-Hill, 2001.

[20] E. Zwicker and U. T. Zwicker, “Audio engineering and psychoacoustics: Matching signals to the final receiver, the human auditory system,” J. Audio Eng. Soc., vol. 39, no. 3, pp. 115–126, Mar. 1991.

[21] B. Moore, An Introduction to the Psychology of Hearing, 2nd ed. Norwell, MA: Academic, 1982.

[22] U. Zölzer, Ed., DAFX – Digital Audio Effects. New York: Wiley, 2003.

[23] M. Hayes, Statistical Digital Signal Processing and Modeling. New York: Wiley, 1996.

[24] T. Painter and A. Spanias, “Perceptual coding of digital audio,” Proc. IEEE, vol. 88, no. 4, pp. 451–515, Apr. 2000.

[25] J. E. Gentle, “Cholesky factorization,” in Numerical Linear Algebra for Applications in Statistics. Berlin, Germany: Springer-Verlag, 1998, pp. 93–95.

[26] R. Fletcher, Practical Methods of Optimization. New York: Wiley, 1987.

[27] C. Baras, P. Dymarski, and N. Moreau, “An audio watermarking scheme based on an embedding strategy with maximized robustness to perturbations,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 4, Montreal, QC, Canada, May 2003, pp. 357–360.

[28] S. Larbi, M. Jaïdane, and N. Moreau, “A new Wiener filtering based detection scheme for time domain perceptual audio watermarking,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 5, Montreal, QC, Canada, May 2004, pp. 949–952.

[29] A. Lang. Stirmark Benchmark for Audio (SMBA): Evaluation of Watermarking Schemes for Audio. [Online]. Available: http://amsl-smb.cs.uni-magdeburg.de/smfa/allgemeines.php

Cléo Baras was born in Paris, France, on April 17, 1979. She received the State Engineering and M.Sc. degrees in signal processing from the Institut National Polytechnique de Grenoble (INPG), Grenoble, France, in 2002 and the Ph.D. degree in signal processing of musical signals from the Ecole Nationale Supérieure des Télécommunications (ENST), Paris, France, in 2005.

She is currently a Lecturer in the graduate school in electrical engineering, computer science and communications in the graduate school ENST. She is pursuing her research activities on audio signal processing (including audio coding, watermarking, and audio transmission over heterogeneous networks) in the Signal and Image Processing Department, ENST, and the Equipe Traitement des Images et des Signaux (ETIS) Lab, Cergy-Pontoise, France.

Nicolas Moreau (A’80) received the State Engineering degree from the Institut National Polytechnique de Grenoble (INPG), Grenoble, France, in 1969, the M.Sc. degree in automatic control from Laval University, Quebec, QC, Canada, in 1972, and the Habilitation à Diriger des Recherches degree from the University of Paris V, Paris, France, in 1997.

He has been with the Ecole Nationale Supérieure des Télécommunications (ENST), Paris, since 1972 as an Assistant Professor, and since 2002 as a Professor. His research interests include audio and multimedia signal processing, with particular emphasis on audio coding and watermarking.

Przemyslaw Dymarski received the M.Sc. and Ph.D. degrees from the Wroclaw University of Technology, Wroclaw, Poland, in 1974 and 1983, respectively, both in electrical engineering, and the D.Sc. degree in telecommunications from the Faculty of Electronics and Information Technology, Warsaw University of Technology, Warsaw, Poland, in 2004.

Currently, he is with the Institute of Telecommunications, Warsaw University of Technology. His research includes various aspects of digital signal processing, particularly speech and audio compression for telecommunications and multimedia, text-to-speech synthesis and audio watermarking. Since 1986, he has been cooperating with Prof. Nicolas Moreau of the Ecole Nationale Supérieure des Télécommunications (ENST), Paris, France, in the domain of speech and audio processing.
































































