

ICICS-PCM 2003

15-18 December 2003

Singapore

3B3.1

Chaos Based Audio Watermarking with MPEG Psychoacoustic Model I

A. Giovanardi, G. Mazzini, M. Tomassetti

University of Ferrara, Italy

Abstract

A method to implement digital audio watermarking in the frequency domain is presented. The embedding is performed by using chaos-based sequences to increase the signature hiding properties. The proposed scheme takes into account the MPEG psychoacoustic model I to improve the robustness against MP3 compression. Some tests have been performed to verify the system robustness against MP3 compression and also against signal cropping, re-sampling, re-quantization and filtering.

1 Introduction

Research on digital audio and video signals has made it possible to design systems with low distortion and noise, increasing the diffusion of digital sources on the worldwide market. The negative counterpart is the ease with which audio and video materials can be copied and digital files tampered with.

As a result, performers, studios, distributors and retailers need a reliable, tamper-proof and permanent audio watermarking solution ([1], [2], [3], [4]) to embed inaudible (or not visible) and indelible information, to protect and track inclusive content. In particular, to meet today's copyright protection requirements, electronic watermarking techniques applied to audio signals must satisfy four basic requirements [5]: the watermark should be inaudible, i.e., the sound quality must not be significantly corrupted; indelible, i.e., it should not be possible to remove the watermark from the audio signal; robust, i.e., the watermark should be resistant to the main digital manipulations; invisible, i.e., the watermark presence should not be easily verified, in order to prevent its removal. Besides, multiple watermarks should be supported to track the passing of ownership or the movement of the proprietary source. Finally, the watermark detection should not require a copy of the original audio track.

This paper proposes a new scheme to protect audio signal copyright, taking into consideration the above-mentioned requirements. An often used watermarking algorithm is the Patchwork approach [1]. A similar approach working in the time domain has been discussed in [2]. In [3] a frequency domain approach is shown, where the watermark detection requires a copy of the original audio tracks. Other methods act in the frequency domain, e.g., [4]. The novelty of our work with respect to [1], [2], [3], [4] is that it integrates different aspects: the watermark detection does not require a copy of the original track, the watermark is applied in the frequency domain, it is based on chaotic sequences, and the embedding algorithm is based on the MPEG psychoacoustic model I to improve robustness against MP3 compression.

2 MPEG Psychoacoustic Model I

The human auditory system [6] can detect sounds with frequencies between 20 Hz and 20 kHz and acts as a frequency analyzer. Thus, it can be modeled as a set of 32 bandpass filters with bandwidths increasing with the frequency. These 32 bands are usually known as "critical bands": if a faint tone lies in the critical band of a louder tone, the faint one is masked. Temporal masking is also present, i.e., to hear a faint tone following a louder one, a given amount of time must elapse.

The MPEG audio compression algorithms are based on these masking effects: perceptually irrelevant information is removed to increase the data compression. Let us briefly review the basic steps involved in the psychoacoustic model I algorithm. Audio data time alignment: a frame structure, with a given number of samples depending on the MP3 layers, is created; frequency domain representation: the audio samples are converted to their frequency domain representation by using a Fast Fourier Transform (FFT); tonal and non-tonal components identification: the spectral values are divided into tonal and non-tonal components; the model identifies tonal components on the basis of the local peaks of the audio power spectrum, then it sums all the remaining components (non-tonal) to obtain a single value for each critical band; masked components removal: by means of an empirical masking threshold, chosen close to the lower bound of the sound audibility, the masked components are decimated; global masking threshold choice: for each band a global masking threshold is chosen, on the basis of the remaining tonal and non-tonal components; signal-to-mask ratio (SMR) evaluation: the SMR, defined as the ratio between the signal energy within a given sub-band and the minimum masking threshold for that band, is computed. This value is used by the MPEG encoder to decide how many bits to allocate in each band: the greater the SMR, the more bits are allocated.

3 Watermark Embedding

The watermark embedding is performed by dividing the audio track into blocks and by superimposing the same watermarking sequence on each block, so the sequence casting is repeated several times along the audio content. In particular, the main embedding steps are: the audio file is decoded to get the audio sample stream; the audio sample stream is divided into blocks and sub-blocks: each block contains $N$ sub-blocks, where the sub-block length is equal to the FFT block size, $L$; therefore, each block length is $NL$; the FFT is applied to each sub-block, with a resulting vector of length $L/2$, due to the FFT symmetry; a chaotic sequence of size $NL/2$ is generated; the chaotic sequence is superimposed on each block, of size $NL/2$, containing the samples in the frequency domain.
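The partitioning into blocks and sub-blocks can be illustrated with a minimal sketch. The function names are hypothetical, the naive DFT stands in for a real FFT routine only to keep the example dependency-free, and dropping a trailing partial block is an assumption, not something the paper specifies.

```python
import cmath


def split_into_blocks(samples, n_sub, fft_len):
    """Split a sample list into blocks of n_sub sub-blocks, each fft_len samples long."""
    block_len = n_sub * fft_len
    n_blocks = len(samples) // block_len  # assumption: a trailing partial block is dropped
    blocks = []
    for b in range(n_blocks):
        start = b * block_len
        blocks.append([
            samples[start + s * fft_len:start + (s + 1) * fft_len]
            for s in range(n_sub)
        ])
    return blocks


def half_spectrum(sub_block):
    """Naive O(L^2) DFT of a real sub-block, keeping only the first L/2 bins."""
    length = len(sub_block)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / length)
            for i, x in enumerate(sub_block))
        for k in range(length // 2)
    ]


def block_spectrum(block):
    """Concatenate the half spectra of all sub-blocks: N * L / 2 complex values."""
    return [value for sub in block for value in half_spectrum(sub)]
```

With the paper's values $N = 8$ and $L = 512$, `block_spectrum` would return 2048 values per block, which is the length the chaotic sequence has to match.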

The signature sequence to be hidden can be chosen in many ways, and watermarking schemes often use pseudo-random (PN) number generators. In this work we are interested in chaotic sequence generators, which give easy control over the statistical behavior of the chaotic trace in terms of expected value, auto- and cross-correlation, since some studies in the literature show better performance of chaotic watermarking with respect to the PN one.

0-7803-8185-8/03/$17.00 © 2003 IEEE

Figure 1: Absolute threshold curve, function of the frequency (kHz).

Figure 2: Example of watermarking sequence after shaping.

The chaotic watermarking signal for audio sources can be obtained by the recursive application of a suitable one-dimensional discrete-time dynamical system, i.e., a chaotic map [7], [8]. Let us refer to a particular class of maps [9], called Piecewise Affine Markov Maps (PWAM), characterized by a map $M: X \to X$, where $X = [0, 1]$, with $x_{k+1} = M(x_k)$, and assume that $n + 1$ points $0 = a_0 < a_1 < \dots < a_{n-1} < a_n = 1$ exist, defining the intervals $X_j = [a_{j-1}, a_j]$, for $j = 1, \dots, n$, such that, for any couple of indices $j$ and $k$, either $X_k \subseteq M(X_j)$ or $X_k \cap M(X_j) = \emptyset$.

A particular class of PWAM are the so-called $(n, t)$-tailed shift maps [10], where $n$, $t$ are integers such that $n$ is even and $t < n/2$:

$$M(x) = \begin{cases} (n - t)\, x \pmod{\frac{n-t}{n}} + \frac{t}{n} & \text{if } 0 \le x < \frac{n-t}{n} \\ t \left( x - \frac{n-t}{n} \right) \pmod{\frac{t}{n}} & \text{otherwise} \end{cases}$$
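To make the recursion concrete, the following is a minimal sketch of iterating an $(n, t)$-tailed shift map in floating point. The function names are hypothetical and the branch formulas follow the usual definition of tailed shifts; treat them as an assumption about the garbled original rather than a verified transcription.

```python
def tailed_shift(x, n, t):
    """One iteration of the (n, t)-tailed shift map on [0, 1]."""
    if not (isinstance(n, int) and isinstance(t, int) and 0 < t < n / 2):
        raise ValueError("expected integers with 0 < t < n/2")
    split = (n - t) / n  # boundary between the two affine branches
    if 0 <= x < split:
        return ((n - t) * x) % split + t / n
    return (t * (x - split)) % (t / n)


def chaotic_sequence(seed, length, n=11, t=1):
    """Iterate the map from the seed x_0 and return x_0, ..., x_{length-1}."""
    sequence = []
    x = seed
    for _ in range(length):
        sequence.append(x)
        x = tailed_shift(x, n, t)
    return sequence
```

The defaults `n=11, t=1` mirror the couple reported later in the paper. Because the orbit depends on every bit of the state, embedder and detector would need identical arithmetic to regenerate the same sequence from a seed.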

The sequences that we have superimposed on the audio content in the frequency domain have been obtained by iterating the $(n, t)$-tailed shift maps, where the first iteration element $x_0$ is the seed of the sequence and represents the watermarking key. Before the casting, the generic chaotic sequence $x_k$ has been processed by means of the function $Q: X \to [-1, +1]$, where $Q(x_k) = 2 x_k - 1$, in order to obtain a sequence with null mean. With this, the final chaotic sequence is $\tilde{x}_k = Q(x_k)$. Furthermore, we have considered just $NL/4$ different values of the final chaotic sequence, $\tilde{x}_k$, with $k = 0, \dots, NL/4 - 1$, and put them into the even positions of the audio frequency block with natural sign, and into the odd ones with the opposite sign, i.e., $y_k = (-1)^k \tilde{x}_{\lfloor k/2 \rfloor}$, with $k = 0, \dots, NL/2 - 1$. This procedure permits to have lower cross-correlations, as verified by means of experimental tests. Let us underline that this mapping does not change the watermarking sequence auto- and cross-correlation statistics, considering that the same law has been used in the detection unit.

In order to minimize the sound distortion, let us observe that audio spectrum components which are less audible by human hearing can be more affected by the watermark cast than more audible components. So, we have shaped the chaotic sequence in each sub-block to enhance the terms assigned to those spectrum components which are less audible by human hearing. In particular, we have considered the absolute threshold curve, $T(f)$, i.e., the minimum audibility threshold for the human auditory system as a function of the frequency, studied in [11] and reported in Figure 1. Then we have identified a simple first degree polynomial function, based on $T(f)$, such that the shaped quantized chaotic sequence in each block, $\hat{y}_k$, with $k = 0, \dots, NL/2 - 1$, results:

where $S$ indicates the shaping strength, $f_M$ is the maximum frequency of the absolute threshold curve, and $T_M$ and $T_m$ represent the maximum and minimum values of such curve, respectively.
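The zero-mean mapping $Q$ and the sign alternation that precede the shaping step can be sketched as follows. The function names are hypothetical, and the shaping rule itself is not reproduced because its formula is not available here.

```python
def to_zero_mean(x):
    """Q(x) = 2x - 1, mapping [0, 1] onto [-1, +1]."""
    return 2.0 * x - 1.0


def alternate_signs(x_tilde):
    """Build y_k = (-1)^k * x_tilde[k // 2]; the output is twice as long as the input."""
    return [(-1) ** k * x_tilde[k // 2] for k in range(2 * len(x_tilde))]
```

A side effect of this pairing is that every even/odd pair cancels, so the unshaped sequence $y_k$ sums exactly to zero over a block, whatever the seed.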

After the application of the shaping rule, the energy of the sequence $\hat{y}_k$, $E_{\hat{y}} = \sum_k \hat{y}_k^2$, has been forced to a reference value, $E_0$, in order to enhance the reliability of the watermarking detection process, as discussed in the following. The final normalized embedding sequence results $z_k = \hat{y}_k \sqrt{E_0 / E_{\hat{y}}}$.

The $E_0$ value is chosen such that $z_k \ll 1$, a condition guaranteeing a good detection, as reported in Section 4. An example of the evolution of the normalized shaped chaotic sequence is reported in Figure 2.
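Forcing the sequence energy to a reference value amounts to a single rescaling. The sketch below assumes the scaling factor $\sqrt{E_0 / E_{\hat{y}}}$, which is the only factor that makes the sum of squares equal $E_0$; the function name is hypothetical.

```python
import math


def normalize_energy(y_hat, e0):
    """Scale y_hat so that the sum of squares of the result equals e0."""
    energy = sum(v * v for v in y_hat)
    if energy == 0.0:
        raise ValueError("sequence has zero energy")
    scale = math.sqrt(e0 / energy)
    return [v * scale for v in y_hat]
```

Choosing a small `e0` relative to the block length keeps every $|z_k|$ well below 1, which is the condition the detection approximation relies on.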

The casting of the watermark is made by using a spread spectrum technique, as in [12] and [13]. A possible implementation of this technique is obtained by slightly modifying the magnitude of each frequency tone in each frequency sub-block (in this sense no modification is imposed on the tone phase).

Let us call $R_k$ and $W_k$ the original FFT sample modulus and the watermarked sample modulus, respectively. Then: $W_k = R_k (1 + z_k)$.

Note that the amplitude of the casting is modulated by means of the parameter $E_0$. Another possible casting rule has been investigated: $W_k = R_k + z_k$, which is characterized by a simple additive process. Unfortunately, this rule suffers particularly from the problem of magnitude clipping, i.e., the cropping of possible negative amplitudes, due to the superimposition of negative watermark values. In particular, the negative clipping destroys both the audibility and the watermarking detection capability, so this technique has not been further investigated. Finally, let us observe that, since $R_k$ and $z_k$ are independent and the sequence $z_k$ has a null mean, it follows that:

$$\sum_k R_k z_k \approx 0 \qquad (1)$$
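The difference between the two casting rules can be seen in a short sketch. The function names are hypothetical, and the explicit clipping at zero in the additive variant is an assumption about how negative magnitudes would be handled.

```python
def cast_multiplicative(magnitudes, z):
    """W_k = R_k * (1 + z_k); stays non-negative whenever |z_k| < 1."""
    return [r * (1.0 + zk) for r, zk in zip(magnitudes, z)]


def cast_additive(magnitudes, z):
    """W_k = R_k + z_k, clipped at zero because a magnitude cannot be negative."""
    return [max(r + zk, 0.0) for r, zk in zip(magnitudes, z)]
```

With a weak tone such as `R_k = 0.001` and `z_k = -0.05`, the additive rule clips to zero and the watermark sample is lost, while the multiplicative rule only scales the tone by 0.95.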

4 Watermark Detection

The classical watermarking detection system, independently of the domain in which the watermarking is applied, is based on a correlation scheme. Thus, the trivial way to implement a detection consists in performing a simple correlation between the watermarking sequence and the investigated audio stream. By implementing this strategy the correlation is:

$$\sum_k W_k z_k = \sum_k R_k z_k + \sum_k R_k z_k^2 \approx \sum_k R_k z_k^2$$

where the approximation follows the observation in (1) and where $k = 0, \dots, NL/2 - 1$ (i.e., the correlation is computed on each block).

Observe that $y_k$ and the sample $R_k$ are independent of each other and that this property holds also when the $z_k$ sequence is considered, due to the independence of the absolute threshold curve from $R_k$. With this, the previous correlation should be approximated to $E[R_k] \sum_k z_k^2$. Even if the term $\sum_k z_k^2$ is well known, the term $E[R_k]$ depends on the particular audio stream. So, it should be measured or approximately evaluated at the receiver from the watermarked stream, with some possible errors. In practice, the optimal detection should be based on the knowledge of the original audio stream.

Another problem with this approach is related to the potentially high dynamic range of $R_k$: in the correlation computation the terms with higher energy are weighted more strongly than the others, so the statistical properties of the watermarking sequence are broken by the particular $R_k$ stream.

To reduce the two above cited problems, we propose to use a logarithmic correlation index, as follows:

$$C = \frac{1}{E_0} \sum_k z_k \ln W_k$$

By observing that we have chosen $E_0$ such that $z_k \ll 1$ and recalling (1), it is possible to consider the approximation $\ln W_k = \ln[R_k (1 + z_k)] = \ln R_k + \ln(1 + z_k) \approx \ln R_k + z_k$, which gives $C \approx 1$ if the watermarking is present, while $C$ vanishes otherwise. In such a way we have overcome both the problem of the knowledge of the original audio track and that of the potentially high dynamic range of $R_k$. Note that the variances of the auto- and cross-correlations decrease, independently of the audio stream considered, by increasing the watermarking sequence length. So, to enhance this effect we have averaged the correlation over a certain number of available blocks, $N_B$, each of dimension $NL/2$, obtaining $\bar{C}$.

Regarding the watermarking robustness to the MPEG audio compression, let us observe that the tones having the greater SMR in a sub-block have the largest number of bits allocated for their representation. So these tones have the higher probability of not being corrupted by the MPEG compression. Thus, in order to improve the watermarking robustness, the detection may take into account only the sets of tones, in each sub-block, having the greater SMR. In particular, to identify the fraction of tones selected for the detection with respect to the total, a parameter $P$, representing the ratio between the selected tones and the number of available tones $L/2$, has been introduced. When $P = 1$ the psychoacoustic model is not considered. By considering $P < 1$ and by decreasing $P$ the robustness against the MPEG compression increases, i.e., the mean value of the correlations does not considerably change with respect to the case without attack, even if, since the number of watermarking sequence samples is lower, the variance increases (note that this trend can be counteracted by increasing the number of blocks $N_B$).
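The logarithmic correlation index and its block averaging can be sketched as below, assuming the form $C = \frac{1}{E_0} \sum_k z_k \ln W_k$ implied by the approximation in the text. The function names are hypothetical, and the tone selection governed by $P$ is left out.

```python
import math


def log_correlation(w, z, e0):
    """C = (1/E0) * sum_k z_k * ln(W_k) over one block of magnitudes W_k > 0."""
    return sum(zk * math.log(wk) for zk, wk in zip(z, w)) / e0


def averaged_correlation(blocks, z, e0):
    """Average the per-block index over the available blocks."""
    return sum(log_correlation(w, z, e0) for w in blocks) / len(blocks)
```

For a toy check, take `z = [0.01, -0.01] * 512`, `e0 = sum(v * v for v in z)` and constant magnitudes `R_k = 2.0`: the index is close to 1 on `[r * (1 + zk) ...]` and close to 0 on the unmarked magnitudes, without any use of the original track at detection time.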

Furthermore, if we consider $P < 1$, an intrinsic robustness to band-pass filtering is acquired, since the detection is performed only by considering components located at medium frequencies, which are the more audible, following the psychoacoustic model. Note that, in order to jointly treat the cases $P = 1$ (without tone selection) and $P < 1$ (with tone selection), the indexes of the sums present in the various mathematical expressions are not explicitly specified.

To take a decision regarding the presence of the watermarking, the correlation $\bar{C}$ must be compared with a threshold, in order to verify if the watermark is present ($\bar{C} \approx 1$) or not ($\bar{C} \approx 0$). Let us note that $\bar{C}$ should be null both in the case of absence of the watermark selected to prove the ownership and in the case in which the selected watermark is present but we try to detect an incorrect watermark. The detection threshold should be in the interval $[0, 1]$, and its choice depends on how the auto- and cross-correlations are distributed around 1 and 0, respectively. In particular, this choice should minimize the probabilities of false alarm (i.e., a wrong watermark is detected in the audio track) and false rejection (i.e., the selected watermark is not detected in the audio track). In general, the pdfs of the auto- and cross-correlations show a Gaussian shape with mean values around 1 and 0, respectively, and variances depending on the statistical properties of the watermarking sequence and on the audio track characteristics.

Note that many types of manipulations, such as compression, filtering, re-sampling or re-quantization, act by shifting the auto-correlation pdf and its expected value toward low values, and by increasing the relative variance. The cross-correlation pdf also shows a slight shift toward the left and a variance increase. In any case, the increase of the auto- and cross-correlation variances can be reduced by increasing the watermarking sequence length (in our case $N_B$).

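By calling $\tilde{z}_k$ the watermarking sequence used for the detection, it follows that:

$$\bar{C} = \frac{1}{N_B} \sum_{i=1}^{N_B} C_i, \qquad C_i = \frac{1}{E_0} \sum_k \tilde{z}_k \ln W_k \approx \frac{1}{E_0} \sum_k \tilde{z}_k \ln R_k + \frac{1}{E_0} \sum_k \tilde{z}_k z_k \qquad (2)$$

where $C_i$ is the correlation on the block $i$. If $\tilde{z}_k = z_k$ we expect $\bar{C} \approx 1$; if $\tilde{z}_k \ne z_k$, $\bar{C} \approx 0$.

The knowledge of the Gaussian shape of the auto- and cross-correlation pdfs, with their means and variances, permits to select the optimal threshold value [3]. In particular, if we assume that the Gaussian pdfs have means $m_c$ and $m_a$ and variances $\sigma_c^2$ and $\sigma_a^2$, for cross- and auto-correlations, respectively, the false alarm and false rejection probabilities, $P_{fa}$ and $P_{fr}$, respectively, result:

$$P_{fa} = \frac{1}{2}\,\mathrm{erfc}\!\left( \frac{th - m_c}{\sqrt{2}\,\sigma_c} \right), \qquad P_{fr} = \frac{1}{2}\,\mathrm{erfc}\!\left( \frac{m_a - th}{\sqrt{2}\,\sigma_a} \right)$$

where $th$ is the decision threshold. The optimal threshold can be obtained by considering $P_{fa} = P_{fr}$, having specified means and variances.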
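Under the Gaussian model described above, both error probabilities are tail integrals and can be evaluated with the complementary error function. This is a sketch with hypothetical function names; the closed-form equal-error threshold follows from equating the two `erfc` arguments and is not a formula quoted from the paper.

```python
import math


def false_alarm(th, m_c, var_c):
    """P(cross-correlation > th) for a Gaussian with mean m_c and variance var_c."""
    return 0.5 * math.erfc((th - m_c) / math.sqrt(2.0 * var_c))


def false_rejection(th, m_a, var_a):
    """P(auto-correlation < th) for a Gaussian with mean m_a and variance var_a."""
    return 0.5 * math.erfc((m_a - th) / math.sqrt(2.0 * var_a))


def equal_error_threshold(m_c, var_c, m_a, var_a):
    """Threshold where the two probabilities coincide (standard deviations as weights)."""
    s_c, s_a = math.sqrt(var_c), math.sqrt(var_a)
    return (m_c * s_a + m_a * s_c) / (s_a + s_c)
```

If the $N_B$ block correlations are treated as independent, the variance of $\bar{C}$ can be passed as the single-block variance divided by $N_B$; that independence is an assumption of this sketch.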

Unfortunately, the means and variances of the auto- and cross-correlation pdfs change with the watermarking sequences, with the audio stream considered and with the particular digital manipulation applied to the audio file, while the threshold $th$ must be fixed a priori. So, $th$ is in general fixed by considering only one watermarking sequence and a given audio stream, and without considering any attack [3]. To avoid dramatic effects when attacks are present or when a detection with different sequences is tried, $th$ should be selected by considering all possible watermarking sequences and all possible digital manipulations, independently of the audio sample. Note that experimental trials have shown that the attacks act in a relevant way especially on the auto-correlation, while the cross-correlation is less affected: its mean remains close to 0 and its variance does not vary if the watermark sequence is applied to a long audio stream. So, due to the low variability of the cross-correlation pdf, a possible solution for the $th$ selection is to fix it closer to 0 than to 1 (to counteract the shift of the auto-correlation), having evaluated the average mean and variance of the cross-correlation pdf in different cases, and having fixed a given $P_{fa}$. In particular, we estimate the average mean and variance of the cross-correlation pdf using a given set of watermarking sequences (e.g., generated by a selected chaotic map with a given number of digits to specify the seed), considering an extended but limited set of audio samples and a given set of attacks. On the basis of this average mean and variance (depending on the watermark length), we set the $P_{fa}$ and obtain the relative $th$. Having fixed $th$, we can proceed to verify a posteriori the actual values of $P_{fa}$ and $P_{fr}$.
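The step "set the $P_{fa}$ and obtain the relative $th$" can be sketched by inverting the Gaussian cumulative distribution. The function name is hypothetical, and dividing the single-block variance by the number of blocks assumes independent block correlations.

```python
from statistics import NormalDist


def threshold_for_false_alarm(p_fa, m_c, var_c, n_blocks=1):
    """Threshold th such that P(averaged cross-correlation > th) equals p_fa."""
    sigma = (var_c / n_blocks) ** 0.5  # assumption: block correlations are independent
    return NormalDist(m_c, sigma).inv_cdf(1.0 - p_fa)
```

For very small targets the expression `1.0 - p_fa` loses precision in double arithmetic, so this form is only indicative at the lowest probabilities the paper reports.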


Figure 3: Auto- and cross-correlations of a watermarked audio signal.

Let us note that some investigations on the selected class of maps give a value for the term $\frac{1}{E_0} \sum_k z_k \tilde{z}_k$, in the cross-correlation formula (2), ranging around 0.1. Furthermore, by considering an extended set of audio samples and by selecting different kinds of maps, we have found a bound value (in the worst case) for the term $\frac{1}{E_0} \sum_k \tilde{z}_k \ln R_k$ ranging around 0.25. With this, the minimum value of the threshold $th$ for a good detection is 0.35.

In order to make a good choice for the $n$ and $t$ parameters characterizing our chaotic maps, we should calculate the correlation between the selected sequence (correct sequence) and any other possible watermarking sequence (attack sequence). This procedure should be iterated on all the possible correct sequences we consider. Then, the optimal $(n, t)$ couple is the one minimizing the maximum correlation in the whole set calculated. Note that if we consider $n \in \{2, \dots, 100\}$, $t \in \{1, \dots, n/2\}$, and 1000 seeds uniformly distributed in $[0, 1]$, about 200 days of processing would be necessary (with the hardware system available). Consequently, we have performed a limited set of measurements and decided to use the couple $n = 11$, $t = 1$, which results to be a good choice compared with the others investigated.

The proposed watermarking scheme supports multiple watermark additions. Let us call $z_k^{(i)}$ the $i$-th watermarking sequence to be applied; then the watermarked audio stream after the application of different watermarking sequences results: $W_k = R_k \prod_i (1 + z_k^{(i)})$.

When the generic watermarking sequence $j$ is used in the detection, the correspondent approximated correlation, before the average on $N_B$ blocks, is:

$$C \approx \frac{1}{E_0} \sum_k z_k^{(j)} \ln R_k + \frac{1}{E_0} \sum_i \sum_k z_k^{(j)} z_k^{(i)}$$

which is vanishing if the index $i$ never assumes the value $j$, due to the observation in (1) and to the low cross-correlation property of chaos-based sequences when the rate of mixing [7] is large enough, as in the case of the selected maps. The same considerations give that if $i = j$, $C$ is close to 1.
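The behavior with several superimposed watermarks can be illustrated with a toy example. The function names are hypothetical, and the short orthogonal patterns used in the usage note are stand-ins for chaotic sequences, chosen only so the cross terms cancel exactly.

```python
import math


def cast_all(magnitudes, sequences):
    """W_k = R_k * prod_i (1 + z_k^(i)) for a list of watermark sequences."""
    out = list(magnitudes)
    for z in sequences:
        out = [w * (1.0 + zk) for w, zk in zip(out, z)]
    return out


def detect(w, z, e0):
    """Log-correlation of the marked magnitudes with one candidate sequence."""
    return sum(zk * math.log(wk) for zk, wk in zip(z, w)) / e0
```

With `z1 = [a, -a, a, -a] * m` and `z2 = [a, a, -a, -a] * m` embedded and `z3 = [a, -a, -a, a] * m` not embedded (for a small `a` such as 0.01), `detect` returns a value near 1 for `z1` and `z2` and near 0 for `z3`.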

5 Watermarking Robustness

To test the reliability and robustness of the presented watermarking algorithm we have randomly chosen various combinations of sequence seeds and audio contents and considered different kinds of attacks, as reported in the following. All audio files are 60 s long, sampled at 44.1 kHz (CD quality) and coded with 16 bits stereo, and the shaping strength is fixed to $S = 0.6$, a value guaranteeing almost totally inaudible distortions for every type of audio content [4]. We have considered $N = 8$ and $L = 512$, so that in each block in the frequency domain we have $NL/2 = 2048$ samples.
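The block sizes implied by $N = 8$, $L = 512$ and a 44.1 kHz sampling rate can be checked with a few lines of arithmetic. The function names are hypothetical, and durations are computed per channel, which is an assumption about how the stereo signal is processed.

```python
def block_layout(n_sub=8, fft_len=512, sample_rate=44100.0):
    """Return (time samples per block, frequency samples per block, block duration in s)."""
    block_len = n_sub * fft_len
    return block_len, block_len // 2, block_len / sample_rate


def duration_of(n_blocks, n_sub=8, fft_len=512, sample_rate=44100.0):
    """Seconds of audio (per channel) covered by n_blocks consecutive blocks."""
    return n_blocks * n_sub * fft_len / sample_rate
```

Each block spans 4096 time samples, i.e. roughly 93 ms, so averaging over a few tens of blocks corresponds to a few seconds of audio.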

Figure 4: Experimental auto-correlation pdf in the case of MP3 attack, having considered $P = 1$, and Gaussian distribution.

Figure 5: Experimental auto-correlation pdf in the case of MP3 attack, having considered $P = 0.6$, and Gaussian distribution.

To verify the watermarking reliability we have selected a set of 1000 seeds, fixing the correct one among them; then, we have embedded the correct seed in the audio stream; and, finally, we have evaluated the auto- and cross-correlations.

The results are shown in Figure 3, where the correlation $C$, before the average procedure on $N_B$ blocks, is reported as a function of the seed. Observe the correlation peak, close to 1, for the correct seed ($x_0 = 0.5$) and the low correlation values for all the other possible choices. This result has been obtained for a particular audio stream, but no difference has been observed by changing the audio content. Furthermore, to show the performance improvement obtained by averaging $C$ over several blocks, we have considered 3 different $N_B$ values: 32 (3 seconds of the audio stream), 54 (5 seconds), and 108 (10 seconds). For the 3 cases we have considered the same threshold $th = 0.36$. The corresponding $P_{fa}$, decreasing with the increase of $N_B$, are: $0.77 \cdot 10^{\dots}$, $2.05 \cdot 10^{\dots}$ and $3.08 \cdot 10^{\dots}$, respectively. The $P_{fr}$ are less than …

The robustness against MPEG layer 3 (MP3) compression has been tested by using a compression degree generating tracks of 128 kbps. In Figure 4 the auto-correlation pdf relative to a watermarked audio track, compressed and decompressed, is reported, having set P = 1, i.e., without considering the psychoacoustic model. (In this case we have considered only one block, N_B = 1; by increasing N_B the performance increases.) For comparison, the Gaussian distribution characterized by the same mean and variance is plotted. Let us underline that the experimental pdf is very close to the Gaussian one, as previously discussed. Furthermore, the auto-correlation mean (0.45) is far from the expected value 1, implying a high probability of false rejection; the variance is 0.023. In this case, having fixed th = 0.36, and with N_B = 32, 54, 108, we have P_fa = 0.70 · 10^-3, 2.00 · 10^-5 and 3.00 · 10^-[?], respectively, and P_fr = 0.57 · 10^-[?], 1.25 · 10^-5 and 1.20 · 10^-[?].

In Figure 5 the same study of Figure 4 (with N_B = 1) is performed, but considering the psychoacoustic model and fixing P = 0.6. The experimental pdf remains close to the Gaussian one (with the same parameters), while the mean value is 1.1, implying a low probability of false rejection, even if the variance is higher (0.14). The variance decreases by increasing N_B. In this case, having fixed th = 0.36 and by considering N_B = 32, 54, 108, the P_fa are of the same order as in the case with P = 1, but the P_fr are lower than [?], so a relevant performance improvement is present. All results show that, when the psychoacoustic model is used in the detection, the experimental pdfs have mean close to 1, while with P = 1 they are shifted toward the left. In both cases variances can be kept low by considering a sufficiently high number of blocks N_B. Consequently, the false rejection probability is lower if the psychoacoustic model is used than in the case without it.

Figure 6: Experimental auto-correlation pdf in the case of re-sampling attack and Gaussian distribution.

Figure 7: Experimental auto-correlation pdf in the case of re-quantization attack and Gaussian distribution.

Figure 8: Experimental auto-correlation pdf in the case of filtering attack and Gaussian distribution.
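The false-rejection figures above can be roughly cross-checked from the reported means and variances. The sketch below is not the authors' procedure: it assumes the detection statistic is exactly Gaussian and that averaging over N_B independent blocks divides the single-block variance by N_B (the text only states that the variance decreases as N_B grows). The function names are hypothetical.

```python
import math

def gaussian_cdf(x, mean, std):
    """Gaussian cumulative distribution function via the complementary error function."""
    return 0.5 * math.erfc(-(x - mean) / (std * math.sqrt(2.0)))

def false_rejection_prob(mean, var_single_block, threshold, n_blocks):
    """P(statistic < threshold) when the watermark is present.

    Assumption (not stated in the paper): the variance measured for
    N_B = 1 scales as 1 / N_B when N_B independent blocks are averaged.
    """
    std = math.sqrt(var_single_block / n_blocks)
    return gaussian_cdf(threshold, mean, std)

TH = 0.36  # detection threshold used in the experiments
# (mean, single-block variance) reported for MP3 compression at 128 kbps
without_model = (0.45, 0.023)  # P = 1
with_model = (1.1, 0.14)       # P = 0.6

for n_blocks in (32, 54, 108):
    p_without = false_rejection_prob(*without_model, TH, n_blocks)
    p_with = false_rejection_prob(*with_model, TH, n_blocks)
    print(f"N_B={n_blocks}: P_fr(P=1) ~ {p_without:.2e}, P_fr(P=0.6) ~ {p_with:.2e}")
```

Under these assumptions the P = 0.6 case yields far smaller false-rejection estimates than the P = 1 case for every N_B, which agrees with the trend reported in the text.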

To prove the robustness against re-sampling of the audio track, we have considered an original audio signal sampled at a rate of 44.1 kHz. Then, we have applied the watermarking sequence. Furthermore, the sampling rate of the watermarked audio stream has been reduced to 22 kHz and, finally, the track has been re-sampled at the original rate of 44.1 kHz. Although the above processing causes noticeable distortions, the watermark remains detectable, as reported in Figure 6, where the auto-correlation is shown. In this case the average value, with N_B = 1, is about 0.76 and the variance 0.14. With th = 0.36 and block numbers N_B = 32, 54, 108, we have P_fa = 0.60 · 10^-[?], 1.00 · 10^-[?] and 4.00 · 10^-9, respectively, and P_fr = 0.77 · 10^-[?] for N_B = 32 and P_fr lower than 10^-[?] for N_B = 54, 108.
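A minimal sketch of the re-sampling attack described above, written without external libraries. The paper does not say which resampling method was used; the 2:1 decimation with a two-tap average and the linear-interpolation upsampling below are assumptions, as are the function names. Halving 44.1 kHz gives 22.05 kHz, which approximates the 22 kHz quoted in the text.

```python
import math

def downsample_by_two(samples):
    """Halve the sampling rate; averaging each pair is a crude anti-alias filter."""
    return [(samples[i] + samples[i + 1]) / 2.0
            for i in range(0, len(samples) - 1, 2)]

def upsample_by_two(samples):
    """Double the sampling rate by linear interpolation between neighbours."""
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) / 2.0)
    return out

def resampling_attack(samples):
    """44.1 kHz -> about 22 kHz -> 44.1 kHz round trip (hypothetical helper)."""
    return upsample_by_two(downsample_by_two(samples))

# Toy stand-in for a watermarked track: 0.1 s of a 1 kHz tone at 44.1 kHz.
RATE = 44100
track = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(4410)]
attacked = resampling_attack(track)
```

The attacked track has the same length as the input, so a block-wise detector can be run on it unchanged; a real test would use a proper polyphase resampler instead of these two-tap approximations.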

Regarding the robustness against re-quantization, the watermarked audio samples have been re-quantized at 8 bits and then back at 16 bits. Although the above processing causes noticeable distortions, the watermark remains detectable, as reported in Figure 7, where the auto-correlation is shown for N_B = 1. In this case the average value is about 0.78 and the variance 0.064. With th = 0.36 and block numbers N_B = 32, 54, 108, we have the same P_fa as in the case with re-sampling and P_fr lower than [?].
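The re-quantization attack can be illustrated with a short sketch. It assumes signed 16-bit integer samples and plain truncation of the 8 least significant bits; the paper does not specify rounding or dithering, and the function name is hypothetical.

```python
def requantize_16_to_8_to_16(samples):
    """Drop the 8 least significant bits of signed 16-bit samples, then rescale.

    Assumption: truncation (arithmetic shift) is used; the paper does not
    specify rounding or dithering.
    """
    out = []
    for s in samples:
        if not -32768 <= s <= 32767:
            raise ValueError("sample outside signed 16-bit range")
        eight_bit = s >> 8          # value in [-128, 127]
        out.append(eight_bit << 8)  # back on the 16-bit scale
    return out

original = [0, 1, 255, 256, 1000, -1, -300, 32767, -32768]
attacked = requantize_16_to_8_to_16(original)
max_error = max(abs(a - b) for a, b in zip(original, attacked))
```

With truncation, every output sample is a multiple of 256 and the error per sample is at most 255 quantization steps of the original 16-bit scale.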

To test the robustness against filtering, the watermarked signal has been processed with a band-pass 2nd order filter. The cut-off frequencies are 200 Hz and 10 kHz. This filtering causes audible distortions on the original signals. Also in this case the watermark remained detectable, as reported in Figure 8, for N_B = 1. In this case the average value is about 0.7 and the variance 0.15. With th = 0.36 and block numbers N_B = 32, 54, 108, we have the same P_fa as in the case with re-sampling and P_fr = 0.66 · 10^-[?], P_fr = 4.18 · 10^-[?] and P_fr < 10^-[?], respectively.

As described in Section 3, the same watermarking sequence is superimposed on blocks of the audio stream. We have selected blocks of length about 0.1 s. In case of cropping, correct detection is possible by using a synchronization search procedure that finds the starting point of the watermarking sequence inside the audio track. This synchronization search procedure guarantees the same reliability as in the case without audio track cropping.

6 Conclusions

A new watermarking scheme applied to digital audio streams, with the scope of copyright protection, has been proposed and tested. The embedding of the digital signature into the audio file has been performed in the frequency domain by integrating a shaping function derived from MPEG psychoacoustic model I, whose effect is to enhance the reliability of the watermark detection without degrading the audio quality. The watermarking signals have been generated by considering a class of PWAM chaotic maps, guaranteeing a high number of distinguishable watermarks.

The proposed watermarking technique, thanks to the psychoacoustic model I, shows high robustness against MP3 compression. Furthermore, this scheme guarantees high reliability in case of digital manipulations like re-sampling, re-quantization, band-pass filtering and cropping. Finally, this technique supports multiple watermarking and does not require the original signal during detection.