
Research Article

A Relative Phase Based Audio Integrity Protection Method: Model and Strategy

Zhaozheng Li, Weimin Lei, Wei Zhang, and Kwanghyok Jo

School of Computer Science and Engineering, Northeastern University (NEU), Shenyang 110169, China

Correspondence should be addressed to Weimin Lei; leiweimin@ise.neu.edu.cn

Received 24 May 2018; Accepted 4 October 2018; Published 21 October 2018

Academic Editor: Emanuele Maiorana

Copyright © 2018 Zhaozheng Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Audio oriented integrity protection should consider the characteristics of audio signals based on the combination of audio application scenarios. However, in the current popular network interaction environment, traditional verification based solutions can no longer work. A kind of integrity protection scheme for audio business should be redesigned in these new scenarios. In this context, a method of audio integrity protection based on relative phase (RP-AIP) is proposed. Through the design of the integrity object (IO), the integrity of audio can be abstracted as the completeness and accuracy of it. The IO is bound to the audio signal in a uniformly and randomly embedded manner, and the embedding rules are controlled by the relative phase characteristic of the audio itself. The model and strategy of RP-AIP are illustrated, and the corresponding process and algorithm are demonstrated as well. Simulation experiments illustrate the feasibility of the proposed solution and indicate the superiority of its performance.

1. Introduction

The rapid development of digital multimedia and Internet technology has facilitated the interaction of various multimedia services based on images, videos, and also audio. In particular, the widespread adoption of new generation instant messaging applications, such as "WhatsApp", "WeChat", and "LINE", has made voice based communication more and more popular. Since voice messaging is more convenient and easier to use than text or other mediums, it has quickly become a widespread feature of instant messaging products. Voice-class audio business has now become the mainstream of network interaction.

Voice messages can not only carry the speaker's intention, but also identify the speaker. It is generally acknowledged that voice-class interaction is more secure and reliable than text-class. Indeed, it is even allowed to undertake functions with high trust requirements. For example, voice authentication has been recommended in the process of transferring accounts in "WeChat". However, we should be aware that it is easy to change the original intention of the speaker by cutting or tampering operations on voice messages. It must be recognized that malicious cutting or tampering attacks on voice-class audio business have become a new major issue troubling audio communication [1].

On this issue, audio integrity protection can effectively resist malicious tampering and cutting. However, currently available audio protection solutions are generally oriented towards copyright [2–4], and more details can be found in the survey by R. D. Shelke et al. [5]. These studies discuss how to protect the integrity of audio signals not only in the time domain [6], but also in various transform domains [7–10]. The issue of audio integrity protection for network interaction scenarios is rarely studied. This kind of audio business has some special features, such as the following.

(i) Streaming Characteristic. This is the most important and essential characteristic that distinguishes audio from other interactive mediums. Compared to static texts and images, the streaming media characteristic of audio determines that the media should always accompany the process of integrity protection. In other words, integrity measurement based protection solutions (SHA-1, MD5, etc.) can no longer work. This characteristic poses new requirements and challenges for the integrity protection of audio.

Hindawi, Security and Communication Networks, Volume 2018, Article ID 7139391, 11 pages. https://doi.org/10.1155/2018/7139391


How to design a scheme that can accompany the media transmission and unlink the global association is the first essential issue to consider.

(ii) Imperceptibility. This characteristic ensures that the quality of experience (QoE) of users will not be degraded. Compared to the human visual system (HVS), the human auditory system (HAS) is more sensitive and more easily perceives subtle changes in the audio signal. This is also why audio signal processing is more difficult than image and video signal processing. Time-domain-oriented signal processing methods tend to have better synchronization, but at the same time they are rarely used because of their serious signal distortion. This characteristic contradicts the requirements of the previous one, which is another new difficulty we need to consider.

(iii) Self-Detectability. As streaming media, an audio signal cannot persist; it disappears as it is being listened to. Therefore, the integrity detection should be self-fulfilled, without relying on the original signal. That is to say, the integrity protection and detection process should be synchronized with playback: detecting while playing.

(iv) Underdetermination. Due to the unpredictability of the transmission environment and noise effects, we do not emphasize the complete certainty of audio integrity protection. More specifically, we believe that audio oriented integrity protection does not have to be completely deterministic. This is because our fundamental goal of audio integrity protection is to ensure the correctness and credibility of the audio information (e.g., the speaker's intention), rather than the audio signal medium itself.

For these reasons, a new method of audio integrity protection based on relative phase (RP-AIP) is proposed in this paper, which is used for protecting audio from being maliciously cut or tampered and ensuring that the audio information can be correctly conveyed. Taking the above features into account, we first divide the audio into uniform segments and sample them to get discrete audio signals. Then we transform them by DFT (Discrete Fourier Transform) and DCT (Discrete Cosine Transform), respectively. In the DFT domain, we obtain the relative phase relation of the host segment and use it as a rule. In the DCT domain, we further embed a series of constructed eigenvalues as the integrity object (IO) into each segment. By this method, the integrity of the audio can be abstracted as the completeness and accuracy of the IO in each segment. This model breaks the direct relationship between integrity and audio media, so as to achieve the purpose of synchronous detection of streaming media.

The remainder of this paper is organized as follows. Section 2 introduces some related technologies concerning the audio masking effect and the noise model in DCT. Section 3 illustrates the model and analysis of RP-AIP. After that, Section 4 presents the process and strategy of RP-AIP and describes their details. Section 5 illustrates the feasibility of our model with simulation experiments. Finally, Section 6 concludes the paper and identifies future directions.

2. Related Technologies

2.1. Audio Masking Effect. The HAS can be seen as a set of frequency analysis systems containing about 26 band-pass filters, which can distinguish frequencies from about 20 Hz to 20 kHz. However, it is difficult for the HAS to distinguish between adjacent frequencies; that is, if a weak sound is distributed in the frequency band adjacent to a strong sound, the strong one will mask the weak one [11]. This is the audio masking effect, and the so-called adjacent frequency band is called the critical band, measured in "Bark" units. The audio masking effect can be described by the masking function (MF), which is related to the sound pressure (SP) of the audio and the distance (d) between the masked and the masker:

$$SP(k) = 10 \lg \left[ \frac{1}{N} \left\| \sum_{i=0}^{N-1} s(i)\, h(i) \exp\left(-j \frac{2\pi i k}{N}\right) \right\|^{2} \right] \quad (1)$$

where SP(k) is the sound pressure, s(i) is a frame of the audio signal, N is the number of samples per frame, and h(i) is a weighted Hanning window, as shown in the following equation:

$$h(i) = \sqrt{\frac{8}{3}} \cdot \frac{1}{2}\left[1 - \cos\left(\frac{2\pi i}{N}\right)\right], \quad i \in [0, N-1] \quad (2)$$

Obviously, the closer the masked is to the masker, the greater the masking effect; otherwise, the masking effect weakens until the masked is out of the critical band of the masker. Furthermore, the masking function of the masker can be expressed as shown in (3). It can be seen that the masker has no effect on audio outside [−3, 8] Bark.

$$MF(i) = \begin{cases} 0, & d \in (-\infty, -3) \\ 17d - 0.4\,SP(i) + 11, & d \in [-3, -1) \\ [0.4\,SP(i) + 6]\,d, & d \in [-1, 0) \\ -17d, & d \in [0, 1) \\ -(d-1)\,[17 - 0.15\,SP(i)], & d \in [1, 8] \\ 0, & d \in (8, +\infty) \end{cases} \quad (3)$$

The audio masking effect is the basis of audio embedding. Through it, data can be embedded without the human ear being aware of it. Therefore, the embedding data should depend on the original audio, and the embedded distribution must be determined by the masking characteristic of the original audio signal.
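To make the sound pressure computation concrete, eqs. (1) and (2) can be sketched in a few lines of Python. This is our illustration, not code from the paper; the helper names `hann_weight` and `sound_pressure` and the 512-sample test frame are assumptions.

```python
import numpy as np

def hann_weight(N):
    # Weighted Hanning window of eq. (2): h(i) = sqrt(8/3) * (1/2) * (1 - cos(2*pi*i/N))
    i = np.arange(N)
    return np.sqrt(8.0 / 3.0) * 0.5 * (1.0 - np.cos(2.0 * np.pi * i / N))

def sound_pressure(s):
    # Eq. (1): SP(k) = 10 * lg((1/N) * |sum_i s(i) h(i) exp(-j*2*pi*i*k/N)|^2)
    N = len(s)
    spectrum = np.fft.fft(s * hann_weight(N))   # the DFT inside eq. (1)
    power = np.abs(spectrum) ** 2 / N
    return 10.0 * np.log10(power + 1e-12)       # small floor avoids log(0)

# A 440 Hz test frame at 44.1 kHz; SP(k) peaks near bin 440*512/44100, i.e. bin 5
frame = np.sin(2 * np.pi * 440 * np.arange(512) / 44100)
sp = sound_pressure(frame)
print(sp.shape)
```

The per-bin SP values are what the masking function (3) consumes when deciding how strongly one component masks its neighbors.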

2.2. Noise Model in DCT. The DCT transform can convert the original audio signal into frequency-domain DCT coefficients, as shown in (4) [12]. The embedding of weak signals on strong signals can be regarded as changes in the DCT coefficients, so the essence of the embedding data is equivalent to an additive noise.

$$\begin{aligned} S(k) &= \delta(u) \sum_{n=0}^{N-1} s(n) \cos\left(\frac{(2n+1)k\pi}{2N}\right) \\ s(n) &= \sum_{k=0}^{N-1} \delta(u)\, S(k) \cos\left(\frac{(2n+1)k\pi}{2N}\right), \quad n, k \in [0, N-1] \end{aligned} \quad (4)$$

where s(n) is the discrete sequence of the audio signal in the time domain, S(k) is the corresponding DCT coefficient sequence, N is the number of samples, and δ(u) is a weight factor, as shown in the following equation:

$$\delta(u) = \begin{cases} \sqrt{\dfrac{1}{N}}, & u = 0 \\ \sqrt{\dfrac{2}{N}}, & \text{otherwise} \end{cases} \quad (5)$$

Assuming that the data embedded on the i-th coefficient of S(k) is E(i), the inverse transform signal s′(n) can be calculated as shown in (6). In this way, we can find the changes between the original and transformed signals.

$$\begin{aligned} s'(n) &= \sum_{k=0}^{i-1} \delta(u) S(k) \cos\left(\frac{(2n+1)k\pi}{2N}\right) + \delta(i)\,[S(i) + E(i)] \cos\left(\frac{(2n+1)i\pi}{2N}\right) + \sum_{k=i+1}^{N-1} \delta(u) S(k) \cos\left(\frac{(2n+1)k\pi}{2N}\right) \\ &\xrightarrow{\;(2n+1)\pi/2N \,\triangleq\, \Psi\;} \left[ \sum_{k=0}^{i-1} \delta(u) S(k) \cos(k\Psi) + \delta(i) S(i) \cos(i\Psi) + \sum_{k=i+1}^{N-1} \delta(u) S(k) \cos(k\Psi) \right] + \delta(i) E(i) \cos(i\Psi) \\ &= \sum_{k=0}^{N-1} \delta(u) S(k) \cos(k\Psi) + \delta(i) E(i) \cos(i\Psi) = s(n) + \delta(i) E(i) \cos(i\Psi) \end{aligned} \quad (6)$$

Let s′(n) − s(n) = δ(i) E(i) cos(iΨ) ≜ ε(i, n); then the noise effect in the time domain can be expressed as

$$\varepsilon(i, n) = \begin{cases} \sqrt{\dfrac{1}{N}}\, E(i), & i = 0 \\ \sqrt{\dfrac{2}{N}}\, E(i) \cos(i\Psi), & \text{otherwise} \end{cases} \quad (7)$$

where Ψ = (2n+1)π/2N. It can be seen from (6) and (7) that, in the DCT domain, the noise effect is only related to the embedded coefficient. Thus, we can evaluate the distortion of the original signal caused by the embedding:

$$\sum_{n=0}^{N-1} \frac{\varepsilon(i, n)}{s(n)} = \begin{cases} \displaystyle\sum_{n=0}^{N-1} \frac{\sqrt{1/N}\, E(i)}{s(n)}, & i = 0 \\ \displaystyle\sum_{n=0}^{N-1} \frac{\sqrt{2/N}\, E(i) \cos(i\Psi)}{s(n)}, & \text{otherwise} \end{cases} = \sqrt{\frac{2}{N}} \cdot \sum_{n=0}^{N-1} \frac{E(i)}{s(n)} \cdot \begin{cases} \dfrac{1}{\sqrt{2}}, & i = 0 \\ \cos(i\Psi), & \text{otherwise} \end{cases} \quad (8)$$

From (8) it can be seen that $\sum_{n=0}^{N-1} (\varepsilon(i,n)/s(n)) \propto E(i)/s(n)$, and the scaling factor can be further defined as ρ(i) in the following equation:

$$\rho(i) = \begin{cases} \dfrac{1}{\sqrt{2}}, & i = 0 \\ \displaystyle\sum_{n=0}^{N-1} \cos(i\Psi), & \text{otherwise} \end{cases} \quad (9)$$

where Ψ = (2n+1)π/2N. So far, we have learned that ρ(i) is the final effect of embedding on the audio signal and is also the sign of signal distortion.

3. Model and Analysis of RP-AIP

3.1. Mathematical Modeling. According to the audio noise model, we can establish the mathematical model shown in (10). We expect to find a binding relation that associates the IO with the audio signal. Then the integrity of the host audio can be extracted and expressed accordingly.

$$\exists \otimes \ \text{s.t.} \quad S' = S \otimes E, \quad E \leftarrow \text{integrity of } S \quad (10)$$

where S refers to the original audio, E refers to the IO, and S′ refers to the transformed audio. Here, ⊗ denotes the sort of binding relation we expect, which will be further introduced later. In addition, E should be unperceivable but robust, to ensure that S will not be distorted.

3.2. Model Structure. The overall structure of the RP-AIP model is shown in Figure 1, and it consists of four parts: A, signal processing; B, DFT; C, DCT; and D, IO generation. Here we apply the random embedding approach as the binding relation. The audio signal is evenly divided into segments in Part A, and the segment is the basic unit for subsequent operations. Then DFT and DCT transforms are performed on each audio segment in Parts B and C, respectively. The DFT transform is used to obtain the relative phase relation of the extreme points in its phase spectrum, and the DCT coefficient matrix is the final embedding target. The eigenvalues refer to the key encapsulated in the form of a fuzzy vault, which are generated in Part D.


Figure 1: The overall structure of the RP-AIP model. The eigenvalues generated by the key and fuzzy vault are embedded into the DCT coefficient matrix, controlled by the relative phase relation indicated by the DFT phase spectrum.

Since the segments are evenly divided, the IO can be equally distributed in the audio data, and the size of the fragments can be adjusted to actual requirements. In this way, through uniform embedding, the integrity of the audio can be abstracted as the completeness and accuracy of the IO.

In addition, it is necessary to explain the function of Part D. Considering issues such as network congestion and accidental packet loss caused by network anomalies, the received IO may not be exactly the same as the original. However, the loss and damage of the IO caused by transmission anomalies are generally accidental and random, whereas those caused by malicious attack are generally deterministic and directional. Moreover, as stated earlier, the fundamental goal of audio integrity protection is to ensure the correctness and credibility of the audio information, rather than the audio signal medium itself. To this end, the use of a fuzzy vault resolves this difficulty, because the fuzzy vault can guarantee a certain similarity rather than complete certainty.

3.3. Embedding Analysis. Through the introduction and analysis of the audio integrity protection characteristics, we know that audio embedding should be transparent to the user. On the other hand, the distortion tolerance characteristic requires that the embedding process should have some robustness [13]. Interestingly, these two requirements are contradictory: to ensure transparency, the operation is usually based on the unimportant components of the audio signal, which are not sensitive to the human ear, such as the high frequency regions. However, robustness generally depends on the important components of the audio signal, which are often sensitive to the human ear, such as the low frequency regions [14].

After the audio signal is sampled, it becomes a nonperiodic discrete signal in the time domain, and then a set of coefficient matrices composed of DCT coefficients can be obtained by the DCT transformation. Here, the low frequency signal energy is gathered in the upper left corner of the matrix, while the high frequency energy distributed in the lower right corner region is almost zero. Therefore, the eigenvalues can be embedded in these two regions: the low frequency region is robust because it carries a large amount of the main signal energy, while the high frequency region has good invisibility because of the absence of perception. Here comes the question: what kind of embedding strategy is the fairest? To this end, we propose to use the relative phase relation characteristic of the audio signal's frequency spectrum to control the embedding position.

4. Process and Strategy of RP-AIP

4.1. Audio Segmentation. According to the scheme design concepts, we need to embed the eigenvalues evenly in the audio signal, so the audio signal data should be divided evenly in advance.

The number of segments can be determined by the number of elements in the IO and the size of the audio. In general, assuming there are n elements in the IO, we divide the audio signal data into n segments as well. Thus, it can be ensured that the whole IO can be completely embedded and bound with the audio.
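As a minimal illustration of this step (ours; the paper does not specify how a remainder of samples is handled, so dropping it is an assumption):

```python
import numpy as np

def segment_audio(signal, n):
    # One segment per IO element; trailing samples that do not fill a
    # whole segment are dropped (an assumption, not stated in the paper).
    seg_len = len(signal) // n
    return signal[: seg_len * n].reshape(n, seg_len)

audio = np.random.default_rng(0).standard_normal(22050)  # 1 s at 22.05 kHz
segments = segment_audio(audio, 100)                     # IO with n = 100 elements
print(segments.shape)  # (100, 220)
```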

4.2. Relative Phase Invariance. From the earlier discussion, we learned that the human ear is very sensitive to relative phase changes of an audio signal but lacks the ability to perceive and distinguish absolute phase. Therefore, for a certain audio signal, the phase component is more important than the amplitude component, and any change or damage to the phase component will cause unacceptable distortion of the audio quality, or even completely destroy the audio information. At the same time, communication theory also states that phase modulation has strong robustness to noise.

More specifically, by the DFT transform, the phase distribution of a segment of the audio signal is easy to obtain. There is only one maximum point and one minimum point in the phase spectrum, as shown in Figure 2, and their relative positions are fixed (if there are multiple extreme points, the first one prevails). As mentioned above, the relative phase maintains consistency regardless of any transformation. This characteristic makes random embedding of the IO possible.
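The relative phase rule can be sketched as follows. This is our illustration (`relative_phase_rule` is a hypothetical helper); note that NumPy's `argmax`/`argmin` return the first occurrence, matching the "first extreme point prevails" convention above.

```python
import numpy as np

def relative_phase_rule(segment):
    # DFT phase spectrum of the segment; the relative order of its first
    # maximum and first minimum decides the embedding region.
    phase = np.angle(np.fft.fft(segment))
    i_max = int(np.argmax(phase))
    i_min = int(np.argmin(phase))
    return "L" if i_max < i_min else "H"   # L: the maximum comes first

segment = np.random.default_rng(0).standard_normal(120)  # a 120-sample fragment
choice = relative_phase_rule(segment)
print(choice)  # "L" or "H", fixed by the segment's own phase spectrum
```

Because the decision depends only on the segment itself, the receiver can recompute it without any side information, which is what makes the scheme self-detectable.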

4.3. Random Embedding Strategy. We propose a strategy that determines the embedding position according to the position of the extreme points in the phase spectrum. Since the extreme points are randomly located in the phase spectrum, the choice of the embedding position controlled by them is also random. Therefore, this strategy guarantees the mutual balance between transparency and robustness described previously. Furthermore, the relative phase invariance of the frequency spectrum is unique to each segment, because only the audio segment itself is involved, which guarantees self-detectability as well.

An 8×8 coefficient matrix D can be obtained from a DCT transformation of an audio signal s(n), as shown in Figure 3. The most robust embedding position is (0,0), while the most transparent one is (7,7). Embedding at position (0,0) improves robustness but adds more perceptual distortion, while embedding at position (7,7) improves imperceptibility but offers less robustness.

To get a good compromise between these parameters, the mid-band coefficients, such as (2,2) (L) and (4,4) (H), are

Figure 2: The phase spectrum of a segment of the audio signal. The audio sampling frequency is 44.1 kHz and the length is 0.5 seconds. The length of the intercepted fragment is 0.003 seconds, containing 120 sampling points.

Figure 3: An 8×8 coefficient matrix of DCT. Position L represents the low frequency region and position H represents the high frequency region, which cannot be perceived by the human ear.


(1) receive the key vaults as kv(i)
(2) set the number of kv(i) as n
(3) divide the audio signal equally into n segments s(j)
(4) sample s(j) to c(j)
(5) for all c(j), initialize j = 1
(6) while not end do
(7)   if j ≤ len then
(8)     calculate the phase spectrum φ(ω) of s(j)
(9)     determine the positions of the maxima and minima in φ(ω)
(10)    calculate the coefficient matrix C(j) by 8×8 DCT of c(j)
(11)    if the maxima is in front of the minima then
(12)      embed kv(j) at C(2,2)
(13)    else
(14)      embed kv(j) at C(4,4)
(15)    end if
(16)  end if
(17) end while

Algorithm 1: The algorithm of the random embedding strategy based on relative phase.
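A compact Python sketch of one iteration of Algorithm 1 is given below. It is our illustration, not the authors' code: treating the 64-sample segment as an 8×8 block and using additive embedding are assumptions, since the paper leaves these details open.

```python
import numpy as np

def dct_matrix(N=8):
    # Orthonormal DCT-II matrix per eqs. (4)-(5)
    n = np.arange(N)
    T = np.sqrt(2.0 / N) * np.cos((2 * n + 1) * n.reshape(-1, 1) * np.pi / (2 * N))
    T[0] = np.sqrt(1.0 / N)
    return T

def embed_segment(samples, value):
    # One iteration of Algorithm 1 on a 64-sample segment viewed as an 8x8 block.
    phase = np.angle(np.fft.fft(samples))
    pos = (2, 2) if np.argmax(phase) < np.argmin(phase) else (4, 4)  # L vs H
    T = dct_matrix(8)
    C = T @ samples.reshape(8, 8) @ T.T   # 8x8 DCT coefficient matrix
    C[pos] += value                       # additive embedding (an assumption)
    return (T.T @ C @ T).ravel(), pos     # inverse 2-D DCT back to samples

seg = np.random.default_rng(1).standard_normal(64)
marked, pos = embed_segment(seg, 0.5)

# Extraction: the DCT of the difference recovers the embedded value at pos
T = dct_matrix(8)
diff = T @ (marked - seg).reshape(8, 8) @ T.T
print(pos, round(float(diff[pos]), 6))
```

Since the DCT is linear, the difference matrix is zero everywhere except at the chosen position, which is exactly the D versus D′ comparison the detection step relies on.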

Figure 4: The flow chart of the random embedding strategy. The choice of the embedding position between L and H is controlled by the relative phase relation of the host audio segment.

selected as the embedding position instead. Position L or H is the Boolean choice for random embedding. The so-called Boolean choice means that a choice must be made between these two positions and that the selection process is random. Based on the relative phase relation discussed above, a random embedding strategy is proposed as follows. The flow chart of this strategy is shown in Figure 4, and its algorithm is illustrated in Algorithm 1.

(i) If the maximum point of the audio segment appears before the minimum point, the corresponding eigenvalue will be embedded into the low frequency region (Position L).

(ii) Otherwise, the corresponding eigenvalue will be embedded into the high frequency region (Position H).

After determining the embedding position, the next step is to construct a fuzzy vault containing the secret sequence. On the finite field F, the secret sequence key ∈ F. By an encrypting set P (P ∈ F), the key can be packaged into the vault of P to generate the IO of the sender.

After embedding the IO into the audio, a new DCT coefficient matrix D′ is generated. By performing an inverse DCT transformation on D′, we can obtain a new audio signal s′(n) containing the IO as well. Finally, s′(n) is exactly the transformed audio signal we expect.

4.4. Detection and Extraction. The detection and extraction of the IO can be achieved by the difference between D and D′. When the receiver receives the IO, it is essentially a copy Q of the fuzzy vault P containing the secret sequence. On the finite field F, by matching Q (Q ∈ F) and P, the key can be parsed from the vault to obtain the IO of the receiver. After the IO is recovered, the integrity of the audio signal can further be determined according to its completeness and accuracy. Figure 5 shows the reconstruction and detection process.

In Figure 5, (a) shows the reconstruction process of the host audio, and s′(n) is the reconstructed audio, which contains the IO; (b) is the detection and extraction process of the IO. Here, e_v is the eigenvalues and E_v is its DCT transformation; e′_v is the extracted eigenvalues and E′_v is its DCT transformation. By matching the similarity between e′_v and e_v, the similarity between the received and original audio can be evaluated. According to the features of the extracted IO, it can be found


Figure 5: The process of the reconstruction and detection of the eigenvalues. (a) is the reconstruction process and (b) is the detection process.

whether the audio has been cut or tampered with. More details can be found in the experimental section.

4.5. Integrity Evaluation. After the IO is extracted, the similarity of e_v and e′_v can be calculated by their cosine distance d, and the integrity of the host audio, I_au, can be evaluated on this basis, as shown in (11). It can be seen that when δ is 1, I_au is a decimal within (0, 1). The greater the value of I_au, the higher the similarity, and the better the audio integrity.

$$I_{au} = \delta \cdot d = \delta \cdot \frac{\sum_{i=0}^{N-1} e(i)\, e'(i)}{\sqrt{\sum_{i=0}^{N-1} e^{2}(i)} \cdot \sqrt{\sum_{i=0}^{N-1} e'^{2}(i)}} \quad (11)$$

where d is the cosine distance of e_v and e′_v, δ is a weight coefficient with default value 1, e(i) is the i-th element of e_v, and e′(i) is the i-th element of e′_v.

5. Experiments and Evaluations

5.1. Audio Quality Criterion. The signal to noise ratio (SNR) is the widely approved audio quality criterion for evaluating audio signal transformation, as shown in the following equation:

$$\mathrm{SNR} = 20 \log \frac{\sqrt{\sum_{n=0}^{N-1} s^{2}(n)}}{\sqrt{\sum_{n=0}^{N-1} \left[s(n) - s'(n)\right]^{2}}} \quad (12)$$

Figure 6: The black-and-white image of 100×100 pixels used as the IO in these experiments.

where s′(n) is the transformation of s(n).

$$\mathrm{SNR} = 20 \log \frac{\sqrt{\sum_{n=0}^{N-1} s^{2}(n)}}{\sqrt{\sum_{n=0}^{N-1} \left[s(n) - s'(n)\right]^{2}}} \xrightarrow{\text{Eq. (6), Eq. (7)}} 20 \log \frac{\sqrt{\sum_{n=0}^{N-1} s^{2}(n)}}{\sqrt{\sum_{n=0}^{N-1} \varepsilon^{2}(i, n)}} = 20 \log \sqrt{\frac{\sum_{n=0}^{N-1} s^{2}(n)}{\sum_{n=0}^{N-1} \varepsilon^{2}(i, n)}} \xrightarrow{\text{Eq. (9)}} 20 \log \sqrt{\sum_{n=0}^{N-1} \frac{1}{\rho^{2}(i)}} \quad (13)$$

Therefore, after embedding the IO on the i-th coefficient of s(n), the SNR can be calculated according to (6), (7), and (9), as shown in (13).

From (13) we can see that the SNR is only related to ρ(i); that is to say, the audio quality after embedding the eigenvalues is only related to the embedding position i of the DCT coefficient. This proves that the embedding position is crucial to the imperceptibility and robustness of the host audio in our RP-AIP model. Therefore, the embedding rule based on the relative phase of the audio signal itself is of extraordinary significance.
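As a quick numerical illustration of eq. (12) (ours, with an arbitrary test signal and distortion):

```python
import numpy as np

def snr_db(s, s_prime):
    # Eq. (12): SNR = 20*log10( sqrt(sum s^2) / sqrt(sum (s - s')^2) )
    return 20.0 * np.log10(np.sqrt(np.sum(s ** 2))
                           / np.sqrt(np.sum((s - s_prime) ** 2)))

s = np.sin(2 * np.pi * np.linspace(0.0, 1.0, 1000))  # one cycle of a test tone
distorted = s + 0.001                                # small constant distortion
snr = snr_db(s, distorted)
print(round(snr, 2))  # about 57 dB
```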

5.2. Embedding and Extraction of the IO. In this experiment, it is assumed that the IO is a black-and-white image, as shown in Figure 6, and the host audio is a set of audio clips of the left channel, as shown in Figure 7 and Table 1. Firstly, we uniformly embedded the IO in the host audio and then recovered it in the opposite way.

According to (12), the SNR of the different audio types is calculated as shown in Table 1, which is in line with the requirements of robustness and imperceptibility.

5.3. Integrity Protection Performance. In the transmission process, the audio may suffer from network congestion, accidental packet loss, and other accidents caused by network

8 Security and Communication Networks

Table 1: The SNR of different audio types after embedding the IO.

Audio clips    A1                           A2                            A3
Audio types    A segment of voice message   A prompt tone of Windows 10   A piece of music
SNR (dB)       30.96                        42.33                         46.40

Figure 7: The host audio clips in the experiment (normalized amplitude vs. time). The audio sampling frequency is 22.05 kHz and the length is 1.0 second. (A1) a segment of voice message; (A2) a prompt tone of Windows 10; (A3) a piece of music.

anomalies, and even from filtering, compression, and A/D conversion. Therefore, the first thing to consider is whether the model can effectively distinguish between such accidents and malicious attacks such as cutting and tampering. As stated above, the loss and damage of the IO caused by transmission anomalies are generally accidental and random, while those caused by malicious attacks are deterministic and directional. Thus, this issue can be judged from the recovered IO.

5.3.1. Audio Processing. In this experiment, we simulated five common audio processing methods: smoothing/low-pass filtering (with a 4 kHz cutoff frequency), band-pass filtering (with 200 Hz–2 kHz cutoff frequencies), MPEG-1 compression (with compression ratios of 10.5:1 and 12:1), and A/D conversion. The evaluation of audio signal distortion is indicated by the similarity between the original and recovered IO as well.
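The smoothing/low-pass step can be approximated with a simple moving average; the sketch below (numpy only, with an assumed 5-tap kernel, not the exact filters of the experiment) shows how such a distortion can be simulated on a host clip:

```python
import numpy as np

def smooth_lowpass(x, taps=5):
    """Crude smoothing/low-pass filter: a `taps`-sample moving average."""
    return np.convolve(x, np.ones(taps) / taps, mode="same")

fs = 22050                        # sampling frequency of the clips (Hz)
t = np.arange(fs) / fs            # 1 second of audio
rng = np.random.default_rng(0)
host = np.sin(2 * np.pi * 440 * t) + 0.3 * rng.standard_normal(fs)
filtered = smooth_lowpass(host)
# averaging removes part of the noisy high-frequency energy
print(np.sum(filtered ** 2) < np.sum(host ** 2))  # -> True
```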

Figure 8 shows the simulation results. Among them, (a) is the recovered IO after smoothing/low-pass filtering, (b) is the one after band-pass filtering, (c) is the one after MPEG-1 compression (with a compression ratio of 10.5:1), (d) is the one after MPEG-1 compression (with a compression ratio of 12:1), and (e) is the one after an A/D conversion. The similarity between the original and recovered IO, and the comparison with the other two common audio embedding methods (embedding the IO on the (0,0) or (n,n) coefficient, called 0-coefficient or n-coefficient embedding for short), can be found in Table 2 and are presented in Figure 9.

Figure 8: The simulation results of the recovered IO under the different audio processing methods: (a) smoothing/low-pass filtering; (b) band-pass filtering; (c) MPEG-1 compression (10.5:1); (d) MPEG-1 compression (12:1); (e) A/D conversion.

It can be seen from Figure 8 and Table 2 that the RP-AIP model adapts well to common audio processing; that is to say, general audio processing methods will not lead to the destruction and loss of the IO. It can be seen from Figure 9 that the RP-AIP model has robustness against common audio processing similar to that of the 0-coefficient embedding model. In addition, as far as the smoothing/low-pass and band-pass filtering methods are concerned, the band-pass filtering performance of RP-AIP is significantly superior to that of the other models, thanks to the method of random embedding in the mid-band coefficients. In general, RP-AIP exhibits sufficient robustness against general audio processing methods.
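The similarity values ρ reported in Table 2 follow the cosine measure of (11); a minimal sketch (hypothetical eigenvalue vectors, numpy only):

```python
import numpy as np

def similarity(ev, ev_rec, delta=1.0):
    """Weighted cosine similarity between the original IO eigenvalue
    vector ev and the recovered one ev_rec, per Eq. (11)."""
    ev, ev_rec = np.asarray(ev, float), np.asarray(ev_rec, float)
    d = np.dot(ev, ev_rec) / (np.linalg.norm(ev) * np.linalg.norm(ev_rec))
    return delta * d

original = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
recovered = np.array([1.0, 0.0, 1.0, 0.0, 0.0])  # one element damaged
# an undamaged IO scores 1, a damaged one strictly less
print(similarity(original, recovered) < similarity(original, original))
```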

5.3.2. Malicious Attack. In this experiment, we simulated three kinds of malicious attacks: cutting, tampering, and resampling. Moreover, tampering can be divided into self-media tampering and non-self-media tampering: self-media tampering refers to using the media itself for shifting, swapping, and so forth, while non-self-media tampering refers to using other media for replacement, coverage, and so forth. Resampling can also be divided into upsampling and downsampling: upsampling refers to


Table 2: The similarity ρ between the original and recovered IO in the audio processing experiments.

No.   Processing method               Random embedding (RP-AIP)   0-coefficient embedding   n-coefficient embedding
(a)   Original                        1.00                        1.00                      1.00
(b)   Without processing              0.97                        0.99                      0.95
(c)   Smoothing/low-pass filtering    0.88                        0.98                      0.18
(d)   Band-pass filtering             0.90                        0.85                      0.84
(e)   MPEG-1 (10.5:1 ratio)           0.79                        0.88                      0.68
(f)   MPEG-1 (12:1 ratio)             0.81                        0.85                      0.64
(g)   A/D conversion                  0.75                        0.74                      0.76

Figure 9: The comparison of the simulation results of the different audio processing and embedding methods. W: without processing; LP: smoothing/low-pass filtering; BP: band-pass filtering; MP(i): MPEG-1 compression (with a compression ratio of 10.5:1); MP(ii): MPEG-1 compression (with a compression ratio of 12:1); AD: A/D conversion.

interpolation expansion of the original audio, while downsampling refers to extraction (decimation) of the original audio. Cutting and tampering are undoubtedly malicious attacks on audio that attempt to change the speaker's intention. However, resampling attacks often only affect the length or quality of the audio, especially upsampling, and cannot change the speaker's intention.
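The two resampling attacks can be sketched in a few lines (numpy only; linear interpolation and plain decimation stand in for whatever resampler an attacker might actually use):

```python
import numpy as np

def upsample(x, factor=2):
    """Interpolation expansion: insert samples by linear interpolation."""
    n = len(x)
    old_t = np.arange(n)
    new_t = np.linspace(0, n - 1, factor * n - 1)
    return np.interp(new_t, old_t, x)

def downsample(x, factor=2):
    """Extraction: keep every `factor`-th sample, discarding the rest."""
    return x[::factor]

x = np.sin(2 * np.pi * np.arange(100) / 25)
up, down = upsample(x), downsample(x)
print(len(up), len(down))  # -> 199 50
```

Note that upsampling preserves every original sample (the interpolated grid passes through them), while downsampling irreversibly discards half of them, which matches the asymmetric effect on the IO reported below.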

Figure 10 shows the simulation results. Among them, (f) is the recovered IO after a cutting attack, (g) is the one after a self-media tampering attack, (h) is the one after a non-self-media tampering attack, (i) is the one after an upsampling attack, and (j) is the one after a downsampling attack. The similarity between the original and recovered IO, and the comparison with the other two common audio embedding methods mentioned above, can be found in Table 3 and are presented in Figure 11.

As can be seen from Figure 10 and Table 3, with the exception of the upsampling attack, the RP-AIP model can effectively

Figure 10: The simulation results of the recovered IO under the different malicious attacks (panels (f)–(j)).

Figure 11: The comparison of the simulation results of the different malicious attacks and embedding methods. W: without processing; CT: cutting attack; TP(i): self-media tampering attack; TP(ii): non-self-media tampering attack; RS(i): upsampling attack; RS(ii): downsampling attack.


Table 3: The similarity ρ between the original and recovered IO in the malicious attack experiments.

No.   Attack type                      Random embedding (RP-AIP)   0-coefficient embedding   n-coefficient embedding
(a)   Original                         1.00                        1.00                      1.00
(b)   Without attacks                  0.97                        0.99                      0.95
(h)   Cutting                          0.31                        0.44                      0.50
(i)   Tampering (self-media)           0.24                        0.46                      0.62
(j)   Tampering (non-self-media)       0.19                        0.33                      0.57
(k)   Resampling (upsampling)          0.97                        0.98                      0.92
(l)   Resampling (downsampling)        0.20                        0.42                      0.22

detect and reflect malicious attacks. As stated earlier, the IO is the tangible representation form of the audio integrity; that is, changes in the IO directly reflect changes in the audio media. Therefore, we can evaluate the type and degree of an attack on the audio by observing the IO. In the case of (f), since cutting directly leads to the loss of the IO, when the IO is reconstructed at 100×100 pixels it becomes an incomplete image; the missing part of the IO indicates that the corresponding audio part has been cut off. In the same way, the tampered part appears as random noise or disorder, as shown in (g) and (h), because the audio has suffered the corresponding tampering attack. However, for resampling attacks, RP-AIP shows different results, as shown in (i) and (j). This is because upsampling tends to interpolate the original audio signal without causing loss of the IO, while downsampling is an extraction of the original audio signal, which directly leads to the loss and destruction of the IO. As stated in the Introduction, we are more concerned about whether an attack has tampered with the speaker's intention, and resampling generally does not.

It can be seen from Figure 11 that the RP-AIP model performs distinctly better than the other two models. This is because attacks such as cutting and tampering are generally oriented to the audio signal's time domain rather than its frequency domain, so the damage to the low- and high-frequency parts is random. An embedding method that considers only the low or only the high frequencies can guarantee just one aspect, while the random embedding method in the RP-AIP model can synthesize these two frequency parts. This is also the innovation and highlight of our work.

6. Conclusion

Audio integrity protection for the network interaction environment is not only an effective way to protect audio information, but also an important link in building trusted communication. Due to the real-time constraints of interaction scenarios, this kind of audio protection needs to reconsider some new characteristics, such as the streaming characteristic, imperceptibility, self-detectability, and distortion tolerance, which are not covered by traditional programs. This poses new challenges to the existing audio protection solutions.

To this end, a method of audio integrity protection based on relative phase (RP-AIP) is proposed in this work. In

the RP-AIP model, we established the concept of the integrity object (IO) in order to propose and abstract the integrity of the audio signal and then transform it into a tangible representation form. In addition, we fully considered the characteristics of the audio signal in the DFT and DCT transform domains and used the frequency characteristics of the audio itself to guide the random embedding of the IO, thereby achieving blind detection and extraction of it. The tangible expression of the audio integrity and the relative phase based random embedding scheme are the highlights of our work.

The simulation experiments show that the RP-AIP model can guarantee the SNR of the audio signal, and the embedding of the IO will not cause signal distortion. In addition, it has good adaptability to some common audio processing operations, such as smoothing/low-pass filtering, band-pass filtering, MPEG-1 compression with different ratios, and A/D conversion. Moreover, against malicious attacks such as cutting, tampering, and resampling, the RP-AIP model shows obvious superiority over other similar protection solutions. Of course, the research on audio integrity protection is still in its infancy, and more work will be done to make it more complete.

Data Availability

Some of the data is uploaded to the public server: https://doi.org/10.6084/m9.figshare.6382709.v1. In the file, you can find the demo audio and its spectrum data, the IO data, and the audio data used in the experiments.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Natural Science Foundation of China [Nos. 61671141 and 61401081], the Liaoning Provincial Natural Science Foundation of China [No. 20180551007], and the Ministry of Education-China Mobile Scientific Research Funds [No. MCM20150103].


References

[1] R. F. Olanrewaju and O. Khalifa, "Digital audio watermarking; techniques and applications," in Proceedings of the 2012 International Conference on Computer and Communication Engineering (ICCCE 2012), pp. 830–835, Malaysia, July 2012.

[2] J. Seok, J. Hong, and J. Kim, "A novel audio watermarking algorithm for copyright protection of digital audio," ETRI Journal, vol. 24, no. 3, pp. 181–189, 2002.

[3] H. Yassine, B. Bachir, and K. Aziz, "A secure and high robust audio watermarking system for copyright protection," International Journal of Computer Applications, vol. 53, no. 17, pp. 33–39, 2012.

[4] J. Haitsma, M. van der Veen, T. Kalker, and F. Bruekers, "Audio watermarking for monitoring and copy protection," in Proceedings of the ACM Workshops on Multimedia, ACM, pp. 119–122, 2000.

[5] R. D. Shelke and M. U. Nemade, "Audio watermarking techniques for copyright protection: a review," in Proceedings of the 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication (ICGTSPICC 2016), pp. 634–640, December 2016.

[6] S. V. Dhavale, R. S. Deodhar, and L. M. Patnaik, "High capacity lossless semi-fragile audio watermarking in the time domain," in A Hierarchical CPN Model for Mobility Analysis in Zone Based MANET, pp. 843–852, 2012.

[7] S. V. Dhavale, R. S. Deodhar, D. Pradhan, and L. M. Patnaik, "State transition based embedding in cepstrum domain for audio copyright protection," IETE Journal of Research, vol. 61, no. 1, pp. 41–55, 2015.

[8] H. H. Tsai, J. S. Cheng, and P. T. Yu, "Audio watermarking based on HAS and neural networks in DCT domain," EURASIP Journal on Advances in Signal Processing, vol. 3, no. 2, pp. 1–12, 2003.

[9] Y. Xiang, I. Natgunanathan, Y. Rong, and S. Guo, "Spread spectrum-based high embedding capacity watermarking method for audio signals," IEEE Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2228–2237, 2015.

[10] X. Huang, A. Nishimura, and I. Echizen, "A reversible acoustic steganography for integrity verification," in Digital Watermarking, p. 305, Springer, Berlin, Heidelberg, Germany, 2011.

[11] H. Cao, Y. Wang, J. Li et al., "Audio data hiding using perceptual masking effect of HAS," in Proceedings of the International Symposium on Multispectral Image Processing and Pattern Recognition, International Society for Optics and Photonics, 2007.

[12] T. Painter and A. Spanias, "Perceptual coding of digital audio," Proceedings of the IEEE, vol. 88, no. 4, pp. 451–512, 2000.

[13] S. V. Dhavale, R. S. Deodhar, D. Pradhan et al., "Robust multiple stereo audio watermarking for copyright protection and integrity checking," in Proceedings of the International Conference on Computational Intelligence and Information Technology, IET, pp. 9–16, 2014.

[14] L. Boney, A. H. Tewfik, and K. N. Hamdy, "Digital watermarks for audio signals," in Proceedings of the IEEE International Conference on Multimedia Computing and Systems, pp. 473–480, IEEE, 2002.


How to design such a scheme that can accompany the media transmission and unlink the global association is the first essential issue to consider.

(ii) Imperceptibility. This characteristic is to ensure that the quality of experience (QoE) of users will not be degraded. Compared to the human visual system (HVS), the human auditory system (HAS) is more sensitive and more easily perceives subtle changes in an audio signal. This is also why audio signal processing is more difficult than image and video signal processing. Time-domain-oriented signal processing methods tend to have better synchronization, but at the same time they are rarely used because of their serious signal distortion. This characteristic contradicts the requirements of the previous one, which is another new difficulty we need to consider.

(iii) Self-Detectability. As streaming media, an audio signal cannot be retained: it disappears while it is being listened to. Therefore, the integrity detection should be self-fulfilled, without relying on the original signal. That is to say, the integrity protection and detection processes should be synchronized as well: while playing, while detecting.

(iv) Underdetermination. Due to the unpredictability of the transmission environment and noise effects, we do not emphasize the complete certainty of audio integrity protection. More specifically, we believe that audio-oriented integrity protection does not have to be completely determined. This is because our fundamental goal in audio integrity protection is to ensure the correctness and credibility of the audio information (e.g., the speaker's intention) rather than of the audio signal medium itself.

For these reasons, a new method of audio integrity protection based on relative phase (RP-AIP) is proposed in this paper, which is used to protect audio from being maliciously cut or tampered with and to ensure that the audio information is correctly conveyed. Taking the above features into account, we first divide the audio into uniform segments and sample them to get discrete audio signals. Then we transform them by DFT (Discrete Fourier Transform) and DCT (Discrete Cosine Transform), respectively. In the DFT domain, we can get the relative phase relation of the host segment and use this as a rule. In the DCT domain, we further embed a series of constructed eigenvalues, as the integrity object (IO), into each segment. By this method, the integrity of the audio can be abstracted as the completeness and accuracy of the IO in each segment. This model breaks the direct relationship between integrity and the audio media, so as to achieve the purpose of synchronous detection of streaming media.

The remainder of this paper is organized as follows. In Section 2, we introduce some related technologies concerning the audio masking effect and the noise model in the DCT domain. Then we illustrate the model and analysis of RP-AIP in Section 3. After that, we present the process and strategy of RP-AIP and describe their details in Section 4. In Section 5, we illustrate the feasibility of our model with simulation experiments. Finally, Section 6 concludes the paper and identifies future directions.

2. Related Technologies

2.1. Audio Masking Effect. The HAS can be seen as a set of frequency analysis systems containing about 26 band-pass filters, which can distinguish frequencies from about 20 Hz to 20 kHz. However, the HAS finds it difficult to distinguish between adjacent frequencies; that is, if a weak sound is distributed in the frequencies adjacent to a strong sound, the strong one will mask the weak one [11]. This is the audio masking effect, and the so-called adjacent frequency range is called the critical band, measured in "Bark" units. The audio masking effect can be described by the masking function (MF), which is related to the sound pressure (SP) of the audio and the distance (d) between the masked and the masker:

SP(k) = 10 \lg \left[ \frac{1}{N} \left\| \sum_{i=0}^{N-1} s(i)\, h(i) \exp\left(-j 2\pi \frac{ik}{N}\right) \right\|^{2} \right]    (1)

where SP(k) is the sound pressure, s(i) is a frame of the audio signal, N is the number of samples per frame, and h(i) is a weighted Hanning window, as shown in the following equation:

h(i) = \sqrt{\frac{8}{3}} \cdot \frac{1}{2} \left[ 1 - \cos\left(\frac{2\pi i}{N}\right) \right], \quad i \in [0, N-1]    (2)

Obviously, the closer the masked and the masker are, the greater the masking effect; otherwise, the masking effect weakens until the masked is outside the critical band of the masker. Furthermore, the masking function of the masker can be expressed as shown in (3). It can be seen that the masker has no effect on audio outside [−3, 8] Bark.

MF(i) =
\begin{cases}
0, & d \in (-\infty, -3) \\
17d - 0.4\,SP(i) + 11, & d \in [-3, -1) \\
[0.4\,SP(i) + 6]\, d, & d \in [-1, 0) \\
-17d, & d \in [0, 1) \\
-(d - 1)[17 - 0.15\,SP(i)], & d \in [1, 8] \\
0, & d \in (8, +\infty)
\end{cases}    (3)

The audio masking effect is the basis of audio embedding. Through it, data can be embedded without the human ear being aware of it. Therefore, the embedded data should depend on the original audio, and the embedding distribution must be determined by the masking characteristic of the original audio signal.
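As a rough numerical sketch of (1) and (2) (numpy only; a small epsilon is added before the logarithm to guard against log 0, which the paper's formula does not need):

```python
import numpy as np

def sound_pressure(frame):
    """Sound pressure SP(k) of one audio frame, per Eqs. (1) and (2)."""
    n = len(frame)
    i = np.arange(n)
    h = np.sqrt(8.0 / 3.0) * 0.5 * (1 - np.cos(2 * np.pi * i / n))  # Eq. (2)
    spectrum = np.fft.fft(frame * h)
    return 10 * np.log10((1.0 / n) * np.abs(spectrum) ** 2 + 1e-30)

frame = np.sin(2 * np.pi * 8 * np.arange(256) / 256)  # 8 cycles per frame
sp = sound_pressure(frame)
print(int(np.argmax(sp[:128])))  # -> 8: the tone dominates its own bin
```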

2.2. Noise Model in DCT. The DCT transform can convert the frequency content of the original audio signal to DCT coefficients, as shown in (4) [12]. The embedding of weak signals on strong signals can be regarded as changes in the DCT coefficients, so the


essence of the embedded data is equivalent to an additive noise:

S(k) = \delta(u) \sum_{n=0}^{N-1} s(n) \cos\left(\frac{(2n+1)k\pi}{2N}\right)
s(n) = \sum_{k=0}^{N-1} \delta(u)\, S(k) \cos\left(\frac{(2n+1)k\pi}{2N}\right)
n, k \in [0, N-1]    (4)

where s(n) is the discrete sequence of the audio signal in the time domain, S(k) is the corresponding DCT coefficient sequence, N is the number of samples, and δ(u) is a weight factor, as shown in the following equation:

\delta(u) = \begin{cases} \sqrt{\dfrac{1}{N}}, & u = 0 \\ \sqrt{\dfrac{2}{N}}, & \text{otherwise} \end{cases}    (5)
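The transform pair (4) with the weights (5) can be checked numerically; the sketch below builds the DCT matrix row by row (numpy only, not the authors' implementation) and verifies that the forward and inverse transforms of (4) invert each other:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix implementing Eqs. (4) and (5):
    row k holds delta(k) * cos((2m+1) k pi / 2N) for m = 0..N-1."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.cos((2 * m + 1) * k * np.pi / (2 * n))
    delta = np.full((n, 1), np.sqrt(2.0 / n))
    delta[0, 0] = np.sqrt(1.0 / n)
    return delta * c

n = 64
D = dct_matrix(n)
s = np.random.default_rng(1).standard_normal(n)
S = D @ s          # forward DCT, first line of Eq. (4)
s_back = D.T @ S   # inverse DCT, second line of Eq. (4)
print(np.allclose(s, s_back))  # -> True
```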

Assuming that the data embedded on the i-th coefficient of S(k) is E(i), the inverse-transformed signal s'(n) can be calculated as shown in (6). In this way, we can find the changes between the original and transformed signals.

s'(n) = \sum_{k=0}^{i-1} \delta(u) S(k) \cos\left(\frac{(2n+1)k\pi}{2N}\right) + \delta(i)[S(i) + E(i)] \cos\left(\frac{(2n+1)i\pi}{2N}\right) + \sum_{k=i+1}^{N-1} \delta(u) S(k) \cos\left(\frac{(2n+1)k\pi}{2N}\right)

Letting \Psi \triangleq \frac{(2n+1)\pi}{2N}:

= \left[ \sum_{k=0}^{i-1} \delta(u) S(k) \cos(k\Psi) + \delta(i) S(i) \cos(i\Psi) + \sum_{k=i+1}^{N-1} \delta(u) S(k) \cos(k\Psi) \right] + \delta(i) E(i) \cos(i\Psi)
= \sum_{k=0}^{N-1} \delta(u) S(k) \cos(k\Psi) + \delta(i) E(i) \cos(i\Psi)
= s(n) + \delta(i) E(i) \cos(i\Psi)    (6)

Let s'(n) - s(n) = \delta(i) E(i) \cos(i\Psi) \triangleq \varepsilon(i, n); then the noise effect in the time domain can be expressed as

\varepsilon(i, n) = \begin{cases} \sqrt{\dfrac{1}{N}}\, E(i), & i = 0 \\ \sqrt{\dfrac{2}{N}}\, E(i) \cos(i\Psi), & \text{otherwise} \end{cases}    (7)

where \Psi = (2n+1)\pi / 2N. It can be seen from (6) and (7) that, in the DCT domain, the noise effect is only related to the embedded coefficient.

Thus, we can evaluate the distortion of the original signal caused by the embedding:

\sum_{n=0}^{N-1} \frac{\varepsilon(i,n)}{s(n)} = \begin{cases} \sum_{n=0}^{N-1} \dfrac{\sqrt{1/N}\, E(i)}{s(n)}, & i = 0 \\ \sum_{n=0}^{N-1} \dfrac{\sqrt{2/N}\, E(i) \cos(i\Psi)}{s(n)}, & \text{otherwise} \end{cases}
= \sqrt{\frac{2}{N}} \cdot \sum_{n=0}^{N-1} \frac{E(i)}{s(n)} \cdot \begin{cases} \dfrac{1}{\sqrt{2}}, & i = 0 \\ \cos(i\Psi), & \text{otherwise} \end{cases}    (8)

In (8), it can be seen that \sum_{n=0}^{N-1} (\varepsilon(i,n) / s(n)) \propto E(i) / s(n), and the scaling factor can be further defined as \rho(i) in the following equation:

\rho(i) = \begin{cases} \dfrac{1}{\sqrt{2}}, & i = 0 \\ \sum_{n=0}^{N-1} \cos(i\Psi), & \text{otherwise} \end{cases}    (9)

where \Psi = (2n+1)\pi / 2N. So far, we have learned that \rho(i) is the final effect caused by the embedding on the audio signal and is also the sign of signal distortion.

3. Model and Analysis of RP-AIP

3.1. Mathematical Modeling. According to the audio noise model, we can establish the mathematical model shown in (10). We expect to find a binding relation that associates the IO with the audio signal; then the integrity of the host audio can be extracted and expressed accordingly.

\exists \otimes, \ \text{s.t.} \begin{cases} S' = S \otimes E \\ E \leftarrow \text{integrity of } S \end{cases}    (10)

where S refers to the original audio, E refers to the IO, and S' refers to the transformed audio. Here, ⊗ means the sort of binding relation we expect, which will be further introduced later. In addition, E should be imperceptible but robust, to ensure that S will not be distorted.

3.2. Model Structure. The overall structure of the RP-AIP model is shown in Figure 1, and it consists of four parts: (A) the signal processing part, (B) the DFT part, (C) the DCT part, and (D) the IO generation part. Here, we apply the random embedding approach as the binding relation. The audio signal is evenly divided into segments in Part A, and the segment is the basic unit for subsequent operations. Then, DFT and DCT transforms are performed on each audio segment in Parts B and C, respectively. The DFT transform is used to obtain the relative phase relation of the extreme points in its phase spectrum, and the DCT coefficient matrix is the final embedding target. The eigenvalues refer to the key encapsulated in the form of a fuzzy vault, which are generated in Part D.


Figure 1: The overall structure of the RP-AIP model. The eigenvalues generated by the key and fuzzy vault are embedded into the DCT coefficient matrix, controlled by the relative phase relation indicated by the DFT phase spectrum.

Since the segments are evenly divided, the IO can be equally distributed in the audio data, and the size of the fragments can be adjusted according to actual requirements. In this way, the integrity of the audio can be abstracted as the completeness and accuracy of the IO through uniform embedding.

In addition, it is necessary to explain the function of Part D. Considering issues such as network congestion and accidental packet loss caused by network anomalies, the received IO may not be exactly the same as the original. However, the loss and damage of the IO caused by transmission anomalies are generally accidental and random, while those caused by a malicious attack are generally deterministic and directional. Moreover, as stated earlier, the fundamental goal of audio integrity protection is to ensure the correctness and credibility of the audio information rather than of the audio signal medium itself. To this end, the use of a fuzzy vault resolves this tension, because the fuzzy vault can guarantee a certain similarity rather than complete certainty.

3.3. Embedding Analysis. Through the introduction and analysis of the audio integrity protection characteristics, we know that audio embedding should be transparent to the user. On the other hand, the distortion tolerance characteristic requires that the embedding process should have some robustness [13]. Interestingly, these two requirements are contradictory: ensuring transparency often means operating on the unimportant components of the audio signal, which are not sensitive to the human ear, such as the high-frequency regions. However, robustness is generally dependent on the important components of the audio signal, which are often sensitive to the human ear, such as the low-frequency regions [14].

After the audio signal is sampled, it becomes a nonperiodic discrete signal in the time domain, and then a set of coefficient matrices composed of DCT coefficients can be obtained by the DCT transformation. Here, the low-frequency signal energy is gathered in the upper left corner of the matrix, while the high-frequency energy distributed in the lower right corner region is almost zero. Therefore, the eigenvalues can be embedded in these two regions: the low-frequency region is robust because it carries a large amount of the main signal energy, while the high-frequency region has good invisibility because of the absence of perception. Here comes the question: what kind of embedding strategy is the fairest? To this end, we propose to use the relative


phase relation characteristic of the audio signal's frequency spectrum to control the embedding position.

4. Process and Strategy of RP-AIP

4.1. Audio Segmentation. According to the scheme design concepts, we need to embed the eigenvalues evenly in the audio signal, so the audio signal data should be divided evenly in advance.

The number of segments can be determined based on the number of elements in the IO and the size of the audio. In general, assuming that there are n elements in the IO, we divide the audio signal data into n segments as well. Thus, it can be ensured that the whole IO can be completely embedded and bound with the audio.
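A minimal sketch of this segmentation step (numpy; dropping any trailing samples that do not fill a segment is our simplification, not specified by the paper):

```python
import numpy as np

def segment_audio(signal, n_segments):
    """Divide the sampled audio evenly into n_segments pieces, one per
    IO element; trailing samples that do not fill a whole segment are
    dropped for simplicity."""
    seg_len = len(signal) // n_segments
    return [signal[j * seg_len:(j + 1) * seg_len] for j in range(n_segments)]

audio = np.arange(1000)            # stand-in for sampled audio data
segments = segment_audio(audio, 8)  # an IO with 8 elements
print(len(segments), len(segments[0]))  # -> 8 125
```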

4.2. Relative Phase Invariance. From the earlier sections, we learned that the human ear is very sensitive to relative phase changes of an audio signal but lacks the ability to perceive and distinguish the absolute phase. Therefore, for a certain audio signal, the phase component is more important than the amplitude component, and any change or damage to the phase component will cause unacceptable distortion of the audio quality, or even completely destroy the audio information. At the same time, communication theory also declares that phase modulation has strong robustness against noise signals.

More specifically, by DFT transform, the phase distribution of a segment of the audio signal is easy to obtain. There is only one maximum point and one minimum point in the phase spectrum, as shown in Figure 2, and their relative positions are fixed (if there are multiple extreme points, the first one prevails). As mentioned above, the relative phase maintains consistency regardless of any transformation. This characteristic offers the possibility of random embedding of the IO.
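The relative phase relation thus reduces to one Boolean per segment; a sketch (numpy FFT; np.argmax/np.argmin return the first extreme point, matching the "first one prevails" rule):

```python
import numpy as np

def relative_phase_bit(segment):
    """True if the first maximum of the DFT phase spectrum precedes
    the first minimum; this Boolean drives the embedding choice."""
    phase = np.angle(np.fft.fft(segment))
    return bool(np.argmax(phase) < np.argmin(phase))

seg = np.random.default_rng(3).standard_normal(120)
bit = relative_phase_bit(seg)
# the decision is deterministic for a given segment
print(bit == relative_phase_bit(seg))  # -> True
```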

4.3. Random Embedding Strategy. We propose a strategy that determines the embedding position according to the positions of the extreme points in the phase spectrum. Since the extreme points are randomly located in the phase spectrum, the choice of the embedding position controlled by them is also random. Therefore, this strategy guarantees the mutual balance between transparency and robustness, as previously described. Furthermore, the relative phase invariance characteristic of the frequency spectrum is unique to each segment, because only the audio segment itself is involved, which guarantees the self-detectability as well.

An 8×8 coefficient matrix D can be obtained from a DCT transformation of an audio signal s(n), as shown in Figure 3. The most robust embedding method is embedding at position (0,0), while the most transparent one is at position (7,7). Embedding at position (0,0) improves robustness but adds more perceptual distortion, while embedding at position (7,7) improves imperceptibility but offers less robustness.

To get a good compromise between these parameters, the mid-band coefficients, such as (2,2) (L) and (4,4) (H), are

Figure 2: The phase spectrum of a segment of the audio signal (amplitude and phase in radians vs. time). The audio sampling frequency is 44.1 kHz and the length is 0.5 seconds. The length of the intercepted fragment is 0.003 seconds, containing 120 sampling points.

Figure 3: An 8×8 coefficient matrix of DCT. Position L represents the low frequency region, and position H represents the high frequency region, which cannot be perceived by the human ear.

6 Security and Communication Networks

(1) receive the key vaults as kv(i)
(2) set the number of kv(i) as n
(3) divide the audio signal equally into n segments as s(j)
(4) sample the s(j) to c(j)
(5) for all c(j), initialize j = 1
(6) while not end do
(7)   if j ≤ len then
(8)     calculate the phase spectrum φ(ω) of s(j)
(9)     determine the positions of the maxima and minima in φ(ω)
(10)    calculate the coefficient matrix C(j) by 8×8 DCT of c(j)
(11)    if the maxima is in front of the minima then
(12)      embed the kv(j) at C(2,2)
(13)    else
(14)      embed the kv(j) at C(4,4)
(15)    end if
(16)  end if
(17) end while

Algorithm 1: The algorithm of the random embedding strategy based on relative phase.

Figure 4: The flow chart of the random embedding strategy. The choice of the embedding position between L and H is controlled by the relative phase relation of the host audio segment. (Flow: until all the elements of the I.O. have been embedded, calculate the phase spectrum of the audio segment by DFT, determine the relative position of the maxima and minima, and embed the element at L if the maxima is in front of the minima, otherwise at H.)

selected as the embedding position instead. Position L or H is the Boolean choice for random embedding. The so-called Boolean choice means that a choice must be made from these two positions, and the selection process is random. Based on the relative phase relation discussed above, a random embedding strategy is proposed as follows. The flow chart of this strategy is shown in Figure 4, and its algorithm is illustrated in Algorithm 1.

(i) If the maximum point of the audio segment appears before the minimum point, the corresponding eigenvalue will be embedded into the low frequency region (Position L).

(ii) Otherwise, the corresponding eigenvalue will be embedded into the high frequency region (Position H).
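Assuming numpy, the strategy above can be sketched as follows (an illustrative sketch of Algorithm 1, not the authors' code; the segment, sample block, and eigenvalue below are invented):

```python
import numpy as np

N = 8
_n = np.arange(N)
# 1-D orthonormal DCT-II matrix: T[k, n] = d(k) * cos((2n+1)k*pi/(2N)).
_d = np.full(N, np.sqrt(2.0 / N)); _d[0] = np.sqrt(1.0 / N)
T = _d[:, None] * np.cos((2 * _n[None, :] + 1) * _n[:, None] * np.pi / (2 * N))

def embed_position(segment):
    """Boolean choice between L=(2,2) and H=(4,4) from the phase spectrum."""
    phase = np.angle(np.fft.rfft(segment))
    return (2, 2) if np.argmax(phase) < np.argmin(phase) else (4, 4)

def embed(block_8x8, segment, value):
    """Embed one eigenvalue into the 8x8 DCT matrix of a sample block."""
    D = T @ block_8x8 @ T.T        # separable 2-D DCT
    pos = embed_position(segment)
    D[pos] += value                # additive embedding at L or H
    return T.T @ D @ T, pos        # back to the time domain

rng = np.random.default_rng(0)
seg = rng.normal(size=120)         # one audio segment
blk = rng.normal(size=(N, N))      # its sampled 8x8 block
out, pos = embed(blk, seg, 0.5)
print(pos)
```

Because the DCT matrix is orthonormal, taking the DCT of the returned block recovers the modified coefficient exactly, which is what the detection step exploits.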

After determining the embedding position, the next step is to construct a fuzzy vault containing the secret sequence. On the finite field F, the secret sequence key ∈ F. By an encrypting set P (P ∈ F), the key can be packaged into the vault of P to generate the I.O. of the sender.

After embedding the I.O. into the audio, a new DCT coefficient matrix D′ is generated. By making an inverse DCT transformation on D′, we can obtain a new audio signal s′(n) containing the I.O. as well. Finally, s′(n) is exactly the transformed audio signal that we expect.

4.4. Detection and Extraction. The detection and extraction of the I.O. can be achieved by the difference between D and D′. When the receiver receives the I.O., it is essentially a copy Q of the fuzzy vault P containing the secret sequence. On the finite field F, by matching Q (Q ∈ F) and P, the key can be parsed from the vault to obtain the I.O. of the receiver. After the I.O. is recovered, the integrity of the audio signal can further be determined according to its completeness and accuracy. Figure 5 shows the reconstruction and detection process.
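The coefficient-difference extraction can be sketched in miniature (hypothetical numbers; the receiver is assumed to know D and the position chosen by the relative phase):

```python
import numpy as np

D = np.random.default_rng(0).normal(size=(8, 8))  # original DCT matrix
pos = (2, 2)          # position dictated by the relative phase relation
value = 0.25          # one I.O. eigenvalue
Dp = D.copy()
Dp[pos] += value      # D' after embedding

extracted = (Dp - D)[pos]   # detection: the difference between D and D'
print(extracted)            # 0.25
```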

In Figure 5, (a) shows the reconstruction process of the host audio, and s′(n) is the reconstructed signal, which contains the I.O.; (b) is the detection and extraction process of the I.O. Here e_v denotes the eigenvalues and E_v its DCT transformation; e′_v denotes the extracted eigenvalues and E′_v its DCT transformation. By matching the similarity between e′_v and e_v, the similarity between the received and original audio can be evaluated. According to the features of the extracted I.O., it can be found


Figure 5: The process of the reconstruction and detection of the eigenvalues. (a) is the reconstruction process (sampling, DCT, embedding, and inverse DCT yielding s′(n)); (b) is the detection process.

whether the audio is cut or tampered. More details can be found in the experimental section.

4.5. Integrity Evaluation. After the I.O. is extracted, the similarity of e_v and e′_v can be calculated by their cosine distance d, and the integrity of the host audio I_au can be evaluated on this basis, as shown in (11). It can be seen that when δ takes 1, I_au is a decimal within (0, 1). The greater the value of I_au, the higher the similarity and the better the audio integrity.
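This evaluation reduces to a weighted cosine similarity; a minimal sketch (the vectors below are illustrative, not experimental data):

```python
import numpy as np

def integrity(e, e_prime, delta=1.0):
    """I_au = delta * cosine similarity of e_v and e'_v."""
    num = float(np.dot(e, e_prime))
    den = float(np.linalg.norm(e) * np.linalg.norm(e_prime))
    return delta * num / den

e  = np.array([1.0, 0.0, 1.0, 1.0])   # original eigenvalues
ep = np.array([1.0, 0.0, 1.0, 0.0])   # one element damaged in transit
print(round(integrity(e, ep), 3))     # 0.816
```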

$$I_{au} = \delta \cdot d = \delta \cdot \frac{\sum_{i=0}^{N-1} e(i)\, e'(i)}{\sqrt{\sum_{i=0}^{N-1} e^2(i)} \cdot \sqrt{\sum_{i=0}^{N-1} e'^2(i)}} \tag{11}$$

where d is the cosine distance of e_v and e′_v, δ is a weight coefficient whose default value is 1, e(i) is the i-th element of e_v, and e′(i) is the i-th element of e′_v.

5. Experiments and Evaluations

5.1. Audio Quality Criterion. The signal to noise ratio (SNR) is the widely approved and used audio quality criterion for the evaluation of audio signal transformation, as shown in the following equation:

$$\mathrm{SNR} = 20 \log \frac{\sqrt{\sum_{n=0}^{N-1} s^2(n)}}{\sqrt{\sum_{n=0}^{N-1} \left[s(n) - s'(n)\right]^2}} \tag{12}$$
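A minimal numpy sketch of the SNR criterion of Eq. (12) (not the authors' code; the signals below are invented stand-ins for audio clips):

```python
import numpy as np

def snr_db(s, s_prime):
    """SNR in dB of the transformed signal s' against the original s."""
    noise = s - s_prime
    return 20.0 * np.log10(np.linalg.norm(s) / np.linalg.norm(noise))

rng = np.random.default_rng(1)
s = rng.normal(size=1000)                    # stand-in for an audio clip
s_prime = s + 0.01 * rng.normal(size=1000)   # 1% additive embedding noise
print(snr_db(s, s_prime))                    # roughly 40 dB
```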

Figure 6: The black-and-white image of 100×100 pixels used as the I.O. in these experiments.

where s′(n) is the transformation of s(n). Applying (6) and (7) and then (9):

$$\mathrm{SNR} = 20 \log \frac{\sqrt{\sum_{n=0}^{N-1} s^2(n)}}{\sqrt{\sum_{n=0}^{N-1} \left[s(n) - s'(n)\right]^2}} \xrightarrow{\text{Eq. (6), (7)}} 20 \log \frac{\sqrt{\sum_{n=0}^{N-1} s^2(n)}}{\sqrt{\sum_{n=0}^{N-1} \varepsilon^2(i, n)}} = 20 \log \sqrt{\frac{\sum_{n=0}^{N-1} s^2(n)}{\sum_{n=0}^{N-1} \varepsilon^2(i, n)}} \xrightarrow{\text{Eq. (9)}} 20 \log \sqrt{\sum_{n=0}^{N-1} \frac{1}{\rho^2(i)}} \tag{13}$$

Therefore, after embedding the I.O. on the i-th coefficient of s(n), the SNR can be calculated according to (6), (7), and (9), as shown in (13).

From (13) we can see that the SNR is only related to ρ(i); that is to say, the audio quality after embedding the eigenvalues is only related to the embedding position i of the DCT coefficient. This proves that the embedding position is crucial to the imperceptibility and robustness of the host audio in our RP-AIP model. Therefore, the embedding rule based on the relative phase of the audio signal itself is of extraordinary significance.

5.2. Embedding and Extraction of I.O. In this experiment, it is assumed that the I.O. is a black-and-white image, as shown in Figure 6, and the host audio is a set of audio clips of the left channel, as shown in Figure 7 and Table 1. Firstly, we uniformly embedded the I.O. in the host audio and then recovered it in the opposite way.

According to (12), the SNR of the different audio types is calculated as shown in Table 1, which is in line with the requirements of robustness and imperceptibility.

5.3. Integrity Protection Performance. In the transmission process, the audio may suffer from network congestion, accidental packet loss, and other accidents caused by network


Table 1: The SNR of different audio types after embedding the I.O.

  Audio clips   Audio types                     SNR (dB)
  A1            A segment of voice message      30.96
  A2            A prompt tone of Windows 10     42.33
  A3            A piece of music                46.40

Figure 7: The host audio clips in the experiment. The audio sampling frequency is 22.05 kHz and the length is 1.0 seconds. (A1) a segment of voice message; (A2) a prompt tone of Windows 10; (A3) a piece of music. (Each panel shows normalized amplitude against time.)

anomalies, or even from filtering, compression, and A/D conversion. Therefore, the first thing to consider is whether the model can effectively distinguish between such accidents and malicious attacks such as cutting and tampering. As stated above, the loss and damage of the I.O. caused by transmission anomalies are generally accidental and random, while those caused by malicious attacks are deterministic and directional. Thus, this issue can be judged from the recovered I.O.

5.3.1. Audio Processing. In this experiment, we simulated five common audio processing methods: smoothing/low-pass filtering (with 4 kHz cutoff frequency), band-pass filtering (with 200 Hz–2 kHz cutoff frequencies), MPEG-1 compression (with compression ratios of 10.5:1 and 12:1), and A/D conversion. The evaluation of audio signal distortion is likewise indicated by the similarity between the original and recovered I.O.

Figure 8 shows the simulation results. Among them, (a) is the recovered I.O. after smoothing/low-pass filtering; (b) is the one after band-pass filtering; (c) is the one after MPEG-1 compression (with ratio of 10.5:1); (d)

Figure 8: The simulation results of the recovered I.O. of the different audio processing methods (panels (a)–(e)).

is the one after MPEG-1 compression (with ratio of 12:1); and (e) is the one after A/D conversion. The similarity between the original and recovered I.O., together with the comparison against two other common audio embedding methods (embedding the I.O. on the (0,0) or (n,n) coefficient, called 0-coefficient or n-coefficient embedding for short), can be found in Table 2 and is presented in Figure 9.

It can be seen from Figure 8 and Table 2 that the RP-AIP model can adapt well to common audio processing; that is to say, general audio processing methods will not lead to the destruction and loss of the I.O. It can be seen from Figure 9 that the RP-AIP model has robustness against common audio processing similar to the 0-coefficient embedding model. In addition, as far as the smoothing/low-pass and band-pass filtering methods are concerned, the band-pass filtering performance of RP-AIP is significantly superior to the other models, thanks to the method of random embedding in the mid-band coefficients. In general, RP-AIP exhibits sufficient robustness against general audio processing methods.

5.3.2. Malicious Attack. In this experiment, we simulated three kinds of malicious attacks: cutting, tampering, and resampling. Moreover, tampering can be divided into self-media tampering and non-self-media tampering: self-media tampering refers to using the media itself for shifting, swapping, and so forth, while non-self-media tampering refers to using other media for replacement, coverage, and so forth. Resampling can also be divided into upsampling and downsampling: upsampling refers to


Table 2: The similarity ρ between the original and recovered I.O. in the audio processing experiments.

  No.  Processing methods              RP-AIP (random)  0-coefficient  n-coefficient
  (a)  Original                        1.00             1.00           1.00
  (b)  Without processing              0.97             0.99           0.95
  (c)  Smoothing/low-pass filtering    0.88             0.98           0.18
  (d)  Band-pass filtering             0.90             0.85           0.84
  (e)  MPEG-1 (10.5:1 ratio)           0.79             0.88           0.68
  (f)  MPEG-1 (12:1 ratio)             0.81             0.85           0.64
  (g)  A/D conversion                  0.75             0.74           0.76

Figure 9: The comparison of the simulation results of the different audio processing and embedding methods. W: without processing; LP: smoothing/low-pass filtering; BP: band-pass filtering; MP(i): MPEG-1 compression (10.5:1); MP(ii): MPEG-1 compression (12:1); AD: A/D conversion.

interpolation expansion of the original audio, while downsampling refers to extraction from the original audio. Cutting and tampering are undoubtedly malicious attacks on audio that attempt to change the speaker's intention. However, resampling attacks often only affect the length or quality of the audio, especially upsampling, and cannot change the speaker's intention.
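The distinction can be made concrete with a toy sketch (illustrative, not the authors' code): interpolation keeps every original sample, while decimation discards most of them.

```python
import numpy as np

s = np.arange(8, dtype=float)   # toy audio segment

# Upsampling: interpolation expansion of the original audio.
up = np.interp(np.arange(0, 8, 0.5), np.arange(8), s)
# Downsampling: extraction of the original audio.
down = s[::2]

print(len(up), len(down))       # 16 4
print(np.allclose(up[::2], s))  # True: upsampling preserved every sample
```

This is why upsampling tends not to destroy the embedded I.O., while downsampling does.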

Figure 10 shows the simulation results. Among them, (f) is the recovered I.O. after a cutting attack; (g) is the one after a self-media tampering attack; (h) is the one after a non-self-media tampering attack; (i) is the one after an upsampling attack; and (j) is the one after a downsampling attack. The similarity between the original and recovered I.O., together with the comparison against the other two common audio embedding methods mentioned above, can be found in Table 3 and is presented in Figure 11.

As can be seen from Figure 10 and Table 3, except for the upsampling attack, the RP-AIP model can effectively

Figure 10: The simulation results of the recovered I.O. of the different malicious attacks (panels (f)–(j)).

Figure 11: The comparison of the simulation results of the different malicious attacks and embedding methods. W: without processing; CT: cutting attack; TP(i): self-media tampering attack; TP(ii): non-self-media tampering attack; RS(i): upsampling attack; RS(ii): downsampling attack.


Table 3: The similarity ρ between the original and recovered I.O. in the malicious attack experiments.

  No.  Attack types                    RP-AIP (random)  0-coefficient  n-coefficient
  (a)  Original                        1.00             1.00           1.00
  (b)  Without attacks                 0.97             0.99           0.95
  (h)  Cutting                         0.31             0.44           0.50
  (i)  Tampering (self-media)          0.24             0.46           0.62
  (j)  Tampering (non-self-media)      0.19             0.33           0.57
  (k)  Resampling (upsampling)         0.97             0.98           0.92
  (l)  Resampling (downsampling)       0.20             0.42           0.22

detect and reflect malicious attacks. As stated earlier, the I.O. is the tangible representation form of the audio integrity; that is, changes in the I.O. directly reflect changes in the audio media. Therefore, we can evaluate the type and degree of the attack on the audio by observing the I.O. In the case of (f), since cutting directly leads to the loss of I.O., when the I.O. is reconstructed by 100×100 pixels it becomes an incomplete image; the missing part of the I.O. indicates that the corresponding audio part has been cut off. In the same way, the tampered part appears as random noise or disorder, as shown in (g) and (h), because the audio has suffered the corresponding tampering attack. However, for resampling attacks, RP-AIP shows different results, as shown in (i) and (j). This is because upsampling tends to interpolate the original audio signal without causing loss of the I.O., while downsampling is the extraction of the original audio signal, which directly leads to the loss and destruction of the I.O. As stated in the Introduction, we are more concerned about whether the attack has tampered with the speaker's intention, and resampling generally has not.

It can be seen from Figure 11 that the RP-AIP model has clearly better performance than the other two models. This is because attacks such as cutting and tampering are generally oriented to the audio signal's time domain rather than the frequency domain, so the damage to the low and high frequency parts is random. An embedding method that considers only the low or only the high frequency can guarantee just one aspect, while the random embedding method in the RP-AIP model synthesizes these two frequency parts. This is also the innovation and highlight of our work.

6. Conclusion

Audio integrity protection for the network interaction environment is not only an effective way to protect audio information but also an important link in building trusted communication. Due to the real-time constraints of interaction scenarios, this kind of audio protection needs to reconsider some new characteristics, such as the streaming characteristic, imperceptibility, self-detectability, and distortion tolerance, which are not covered by traditional programs. This poses new challenges to the existing audio protection solutions.

To this end, a method of audio integrity protection based on relative phase (RP-AIP) is proposed in this work. In the RP-AIP model, we established the concept of the integrity object (I.O.) in order to propose and abstract the integrity of the audio signal and then transform it into a tangible representation form. In addition, we fully considered the characteristics of the audio signal in the DFT and DCT transform domains and used the frequency characteristics of the audio itself to guide the random embedding of the I.O., thereby achieving blind detection and extraction of it. The tangible expression of the audio integrity and the relative phase based random embedding scheme are the highlights of our work.

The simulation experiments show that the RP-AIP model can guarantee the SNR of the audio signal, and the embedding of the I.O. will not cause signal distortion. In addition, it has good adaptability to some common audio processing, such as smoothing/low-pass filtering, band-pass filtering, MPEG-1 compression with different ratios, and A/D conversion. Moreover, against malicious attacks such as cutting, tampering, and resampling, the RP-AIP model shows obvious superiority to other similar protection solutions. Of course, research on audio integrity protection is still in its infancy, and more work will be done to make it more complete.

Data Availability

Some of the data is uploaded to a public server: https://doi.org/10.6084/m9.figshare.6382709.v1. In the file, you can find the demo audio and its spectrum data, the I.O. data, and the audio data used in the experiment.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Natural Science Foundation of China [Nos. 61671141 and 61401081], the Liaoning Provincial Natural Science Foundation of China [No. 20180551007], and the Ministry of Education-China Mobile Scientific Research Funds [No. MCM20150103].


References

[1] R. F. Olanrewaju and O. Khalifa, "Digital audio watermarking: techniques and applications," in Proceedings of the 2012 International Conference on Computer and Communication Engineering, ICCCE 2012, pp. 830–835, Malaysia, July 2012.

[2] J. Seok, J. Hong, and J. Kim, "A novel audio watermarking algorithm for copyright protection of digital audio," ETRI Journal, vol. 24, no. 3, pp. 181–189, 2002.

[3] H. Yassine, B. Bachir, and K. Aziz, "A secure and high robust audio watermarking system for copyright protection," International Journal of Computer Applications, vol. 53, no. 17, pp. 33–39, 2012.

[4] J. Haitsma, M. van der Veen, T. Kalker, and F. Bruekers, "Audio watermarking for monitoring and copy protection," in Proceedings of the ACM Workshops on Multimedia, ACM, pp. 119–122, 2000.

[5] R. D. Shelke and M. U. Nemade, "Audio watermarking techniques for copyright protection: a review," in Proceedings of the 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication, ICGTSPICC 2016, pp. 634–640, December 2016.

[6] S. V. Dhavale, R. S. Deodhar, and L. M. Patnaik, "High capacity lossless semi-fragile audio watermarking in the time domain," pp. 843–852, 2012.

[7] S. V. Dhavale, R. S. Deodhar, D. Pradhan, and L. M. Patnaik, "State transition based embedding in cepstrum domain for audio copyright protection," IETE Journal of Research, vol. 61, no. 1, pp. 41–55, 2015.

[8] H. H. Tsai, J. S. Cheng, and P. T. Yu, "Audio watermarking based on HAS and neural networks in DCT domain," EURASIP Journal on Advances in Signal Processing, vol. 3, no. 2, pp. 1–12, 2003.

[9] Y. Xiang, I. Natgunanathan, Y. Rong, and S. Guo, "Spread spectrum-based high embedding capacity watermarking method for audio signals," IEEE Transactions on Audio, Speech and Language Processing, vol. 23, no. 12, pp. 2228–2237, 2015.

[10] X. Huang, A. Nishimura, and I. Echizen, "A reversible acoustic steganography for integrity verification," in Digital Watermarking, p. 305, Springer, Berlin, Heidelberg, Germany, 2011.

[11] H. Cao, Y. Wang, J. Li et al., "Audio data hiding using perceptual masking effect of HAS," in Proceedings of the International Symposium on Multispectral Image Processing and Pattern Recognition, International Society for Optics and Photonics, 2007.

[12] T. Painter and A. Spanias, "Perceptual coding of digital audio," Proceedings of the IEEE, vol. 88, no. 4, pp. 451–512, 2000.

[13] S. V. Dhavale, R. S. Deodhar, D. Pradhan et al., "Robust multiple stereo audio watermarking for copyright protection and integrity checking," in Proceedings of the International Conference on Computational Intelligence and Information Technology, IET, pp. 9–16, 2014.

[14] L. Boney, A. H. Tewfik, and K. N. Hamdy, "Digital watermarks for audio signals," in Proceedings of the IEEE International Conference on Multimedia Computing and Systems, pp. 473–480, IEEE, 2002.



essence of the embedding data is equivalent to an additive noise.

$$S(k) = \delta(u) \sum_{n=0}^{N-1} s(n) \cos\left(\frac{(2n+1)k\pi}{2N}\right)$$
$$s(n) = \sum_{k=0}^{N-1} \delta(u) S(k) \cos\left(\frac{(2n+1)k\pi}{2N}\right), \quad n, k \in [0, N-1] \tag{4}$$

where s(n) is the discrete sequence of the audio signal in the time domain, S(k) is the corresponding DCT coefficient sequence, N is the number of samples, and δ(u) is a weight factor as shown in the following equation:

$$\delta(u) = \begin{cases} \sqrt{1/N}, & u = 0 \\ \sqrt{2/N}, & \text{otherwise} \end{cases} \tag{5}$$
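A minimal pure-numpy sketch of the transform pair of Eqs. (4) and (5) (illustrative, not the authors' code):

```python
import numpy as np

def dct(s):
    """Forward DCT of Eq. (4) with the weight delta(u) of Eq. (5)."""
    N = len(s); n = np.arange(N); out = np.empty(N)
    for k in range(N):
        d = np.sqrt(1.0 / N) if k == 0 else np.sqrt(2.0 / N)
        out[k] = d * np.sum(s * np.cos((2 * n + 1) * k * np.pi / (2 * N)))
    return out

def idct(S):
    """Inverse DCT of Eq. (4)."""
    N = len(S); out = np.empty(N)
    for i in range(N):
        acc = 0.0
        for k in range(N):
            d = np.sqrt(1.0 / N) if k == 0 else np.sqrt(2.0 / N)
            acc += d * S[k] * np.cos((2 * i + 1) * k * np.pi / (2 * N))
        out[i] = acc
    return out

s = np.sin(2 * np.pi * np.arange(16) / 16)
print(np.allclose(idct(dct(s)), s))   # True: the pair reconstructs perfectly
```

The weights δ(u) make the transform orthonormal, which is why the pair is perfectly invertible.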

Assuming that the data embedded on the i-th coefficient of S(k) is E(i), the inverse transform signal s′(n) can be calculated as shown in (6). In this way, we can find the changes between the original and transformed signals. Letting Ψ ≜ (2n+1)π/2N:

$$\begin{aligned} s'(n) &= \sum_{k=0}^{i-1} \delta(u) S(k) \cos(k\Psi) + \delta(i)\left[S(i) + E(i)\right]\cos(i\Psi) + \sum_{k=i+1}^{N-1} \delta(u) S(k) \cos(k\Psi) \\ &= \left[\sum_{k=0}^{i-1} \delta(u) S(k) \cos(k\Psi) + \delta(i) S(i) \cos(i\Psi) + \sum_{k=i+1}^{N-1} \delta(u) S(k) \cos(k\Psi)\right] + \delta(i) E(i) \cos(i\Psi) \\ &= \sum_{k=0}^{N-1} \delta(u) S(k) \cos(k\Psi) + \delta(i) E(i) \cos(i\Psi) \\ &= s(n) + \delta(i) E(i) \cos(i\Psi) \end{aligned} \tag{6}$$

Let s′(n) − s(n) = δ(i)E(i)cos(iΨ) ≜ ε(i, n); then the noise effect in the time domain can be expressed as

$$\varepsilon(i, n) = \begin{cases} \sqrt{1/N}\, E(i), & i = 0 \\ \sqrt{2/N}\, E(i) \cos(i\Psi), & \text{otherwise} \end{cases} \tag{7}$$

where Ψ = (2n+1)π/2N.

It can be seen from (6) and (7) that, in the DCT domain, the noise effect is only related to the embedded coefficient. Thus we can evaluate the distortion of the original signal caused by the embedding operation:

$$\sum_{n=0}^{N-1} \frac{\varepsilon(i, n)}{s(n)} = \begin{cases} \displaystyle\sum_{n=0}^{N-1} \frac{\sqrt{1/N}\, E(i)}{s(n)}, & i = 0 \\ \displaystyle\sum_{n=0}^{N-1} \frac{\sqrt{2/N}\, E(i) \cos(i\Psi)}{s(n)}, & \text{otherwise} \end{cases} = \sqrt{\frac{2}{N}} \cdot \sum_{n=0}^{N-1} \frac{E(i)}{s(n)} \cdot \begin{cases} 1/\sqrt{2}, & i = 0 \\ \cos(i\Psi), & \text{otherwise} \end{cases} \tag{8}$$

In (8), it can be seen that Σ_{n=0}^{N-1}(ε(i,n)/s(n)) ∝ E(i)/s(n), and the scaling factor can be further defined as ρ(i) in the following equation:

$$\rho(i) = \begin{cases} 1/\sqrt{2}, & i = 0 \\ \displaystyle\sum_{n=0}^{N-1} \cos(i\Psi), & \text{otherwise} \end{cases} \tag{9}$$

where Ψ = (2n+1)π/2N. So far, we learn that ρ(i) is the final effect caused by embedding on the audio signal and is also the sign of signal distortion.

3. Model and Analysis of RP-AIP

3.1. Mathematical Modeling. According to the audio noise model, we can establish the mathematical model shown in (10). We expect to find a binding relation that associates the I.O. with the audio signal; then the integrity of the host audio can be extracted and expressed accordingly.

$$\exists \otimes \quad \text{s.t.} \quad S' = S \otimes E, \qquad E \leftarrow \text{integrity of } S \tag{10}$$

where S refers to the original audio, E refers to the I.O., and S′ refers to the transformed audio; ⊗ here means the sort of binding relation we expect, which will be further introduced later. In addition, E should be unperceivable but robust, to ensure that S will not be distorted.

3.2. Model Structure. The overall structure of the RP-AIP model is shown in Figure 1, and it consists of four parts: A, the signal processing part; B, the DFT part; C, the DCT part; and D, the I.O. generation part. Here we apply the random embedding approach as the binding relation. The audio signal is evenly divided into segments in Part A, and the segment is the basic unit for subsequent operations. Then DFT and DCT transforms are performed on each audio segment in Parts B and C, respectively. The DFT transform is used to obtain the relative phase relation of the extreme points in the phase spectrum, and the DCT coefficient matrix is the final embedding target. The eigenvalues refer to the key encapsulated in the form of a fuzzy vault, which is generated in Part D.


Figure 1: The overall structure of the RP-AIP model. The eigenvalues generated by the key and fuzzy vault are embedded into the DCT coefficient matrix, controlled by the relative phase relation indicated by the DFT phase spectrum. (Parts: A, signal processing; B, DFT; C, DCT; D, I.O. generation.)

Since the segments are evenly divided, the I.O. can be equally distributed in the audio data, and the size of the fragments can be adjusted by actual requirements. In this way, the integrity of the audio can be abstracted as the completeness and accuracy of the I.O. by uniform embedding.

In addition, it is necessary to explain the function of Part D. Considering network congestion, accidental packet loss, and other issues caused by network anomalies, the received I.O. may not be exactly the same as the original. However, the loss and damage of the I.O. caused by transmission anomalies are generally accidental and random, while those caused by malicious attacks are generally deterministic and directional. Moreover, as stated earlier, the fundamental goal of audio integrity protection is to ensure the correctness and credibility of the audio information rather than the audio signal medium itself. To this end, the use of a fuzzy vault resolves this tension, because the fuzzy vault can guarantee a certain similarity rather than complete certainty.

3.3. Embedding Analysis. Through the introduction and analysis of audio integrity protection characteristics, we know that audio embedding should be transparent to the user. On the other hand, the distortion tolerance characteristic requires that the embedding process should have some robustness [13]. Interestingly, these two requirements are contradictory: ensuring transparency usually means operating on the unimportant components of the audio signal, which are not sensitive to the human ear, such as the high frequency regions; however, robustness generally depends on the important components of the audio signal, which are often sensitive to the human ear, such as the low frequency regions [14].

After the audio signal is sampled, it becomes a nonperiodic discrete signal in the time domain, and then a set of coefficient matrices composed of DCT coefficients can be obtained by the DCT transformation. Here, the low frequency signal energy is gathered in the upper left corner of the matrix, while the high frequency energy distributed in the lower right corner region is almost zero. Therefore, the eigenvalues can be embedded in these two regions: the low frequency region is robust because it carries a large amount of the main signal energy, while the high frequency region has good invisibility because of the absence of perception. Here comes the question: what kind of embedding strategy is the fairest? To this end, we propose to use the relative phase relation characteristic of the audio signal's frequency spectrum to control the embedding position.

4. Process and Strategy of RP-AIP

4.1. Audio Segmentation. According to the scheme design concepts, we need to embed the eigenvalues evenly in the audio signal, so the audio signal data should be divided evenly in advance.

The number of segments can be determined based on the elements of the I.O. and the size of the audio. In general, assuming that there are n elements in the I.O., we will divide the audio signal data into n segments as well. Thus it can be ensured that the whole I.O. can be completely embedded and bound with the audio.
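A trivial sketch of this division (names are illustrative, not the authors' code):

```python
def segment_audio(samples, n):
    """Divide the audio evenly into n segments, one per I.O. element."""
    size = len(samples) // n
    return [samples[i * size:(i + 1) * size] for i in range(n)]

parts = segment_audio(list(range(100)), 10)   # n = 10 I.O. elements
print(len(parts), len(parts[0]))              # 10 10
```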

42 Relative Phase Invariance Through the early part welearnt that the human ear is very sensitive to relative phasechanges of audio signal but lacks the ability to perceiveand distinguish the absolute phase Therefore for a certainaudio signal the phase component is more important thanthe amplitude component and any change or damage tothe phase component will cause unacceptable distortionof the audio quality or even completely destroy the audioinformation At the same time the communication theoryalso declares that the phasemodulation has strong robustnessto the noise signal

More specifically by DFT transform the phase distri-bution of a segment of the audio signal is easy to obtainThere is only one maximum point and one minimum pointin the phase spectrum as shown in Figure 2 and their relativepositions are fixed (if there are multiple extreme points thefirst one prevails) Asmentioned above the relative phase willmaintain consistency regardless of any transformation Thischaracteristic offers the possibility to random embedding ofthe IO

4.3. Random Embedding Strategy. We propose a strategy that determines the embedding position according to the positions of the extreme points in the phase spectrum. Since the extreme points fall at random positions in the phase spectrum, the choice of the embedding position controlled by them is also random. Therefore, this strategy guarantees the mutual balance between transparency and robustness, as previously described. Furthermore, the relative phase invariance characteristic of the frequency spectrum is unique, because only the audio segment itself is involved, which guarantees self-detectability as well.

An 8×8 coefficient matrix D can be obtained from a DCT transformation of an audio signal s(n), as shown in Figure 3. The most robust embedding method recommended is embedding at Position (0,0), while the most transparent one is at Position (7,7). Embedding at Position (0,0) improves robustness but adds more perceptual distortion, while embedding at Position (7,7) improves imperceptibility but offers less robustness.

To get a good compromise between these parameters, the mid-band coefficients, such as (2,2) (L) and (4,4) (H), are


Figure 2. The phase spectrum of a segment of the audio signal. The audio sampling frequency is 44.1 kHz and the length is 0.5 seconds. The length of the intercepted fragment is 0.003 seconds, containing 120 sampling points.


Figure 3. An 8×8 coefficient matrix of the DCT. Position L represents the low frequency region, and position H represents the high frequency region, which cannot be perceived by the human ear.


(1) receive the key vaults as kv(i)
(2) set the number of kv(i) as n
(3) divide the audio signal equally into n segments as s(j)
(4) sample the s(j) to c(j)
(5) ∀ c(j), initialize j = 1
(6) while not end do
(7)     if j ≤ len then
(8)         calculate the phase spectrum φ(ω) of s(j)
(9)         determine the positions of the maxima and minima in φ(ω)
(10)        calculate the coefficient matrix C(j) by 8×8 DCT of c(j)
(11)        if the maxima is in front of the minima then
(12)            embed the kv(j) at C(2,2)
(13)        else
(14)            embed the kv(j) at C(4,4)
(15)        end if
(16)    end if
(17) end while

Algorithm 1: The algorithm of the random embedding strategy based on relative phase.


Figure 4. The flow chart of the random embedding strategy. The choice of the embedding position between L and H is controlled by the relative phase relation of the host audio segment.

selected as the embedding position instead. Position L or H is the Boolean choice for random embedding. The so-called Boolean choice means that a choice must be made from these two positions, and the selection process is random. Based on the relative phase relation discussed above, a random embedding strategy is proposed as follows. The flow chart of this strategy is shown in Figure 4, and its algorithm is illustrated in Algorithm 1.

(i) If the maximum point of the audio segment appears before the minimum point, the corresponding eigenvalue will be embedded into the low frequency region (Position L).

(ii) Otherwise, the corresponding eigenvalue will be embedded into the high frequency region (Position H).
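The Boolean choice described by rules (i) and (ii) can be sketched as follows, with the mid-band positions L = (2,2) and H = (4,4) taken from the paper (the function name is our assumption):

```python
import numpy as np

L_POS, H_POS = (2, 2), (4, 4)  # mid-band DCT positions from the paper

def embedding_position(segment):
    """Choose the embedding position by the relative phase relation:
    L if the phase maximum appears before the minimum, otherwise H."""
    phase = np.angle(np.fft.rfft(segment))
    return L_POS if np.argmax(phase) < np.argmin(phase) else H_POS

pos = embedding_position(np.sin(2 * np.pi * np.arange(64) / 64))
```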

After determining the embedding position, the next step is to construct a fuzzy vault containing the secret sequence. On the finite field F, the secret sequence key ∈ F. By an encrypting set P (P ∈ F), the key can be packaged into the vault of P to generate the IO of the sender.
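The vault construction is only named here, so the following is a deliberately simplified sketch under our own assumptions: a small prime field stands in for F, the key is carried by polynomial coefficients, the points of P lie on the polynomial, and chaff points lie off it. A production scheme would follow the original Juels–Sudan fuzzy vault with error-correcting decoding.

```python
import random

P_FIELD = 257  # a small prime; stands in for the finite field F

def poly_eval(coeffs, x):
    """Evaluate the secret polynomial over F at x."""
    return sum(c * pow(x, i, P_FIELD) for i, c in enumerate(coeffs)) % P_FIELD

def lock(secret_coeffs, genuine_xs, n_chaff=8):
    """Package the key (polynomial coefficients) into a vault: points of
    the encrypting set P lie on the polynomial, chaff points do not."""
    vault = [(x, poly_eval(secret_coeffs, x)) for x in genuine_xs]
    used = set(genuine_xs)
    while len(vault) < len(genuine_xs) + n_chaff:
        x = random.randrange(1, P_FIELD)
        if x in used:
            continue
        used.add(x)
        # shift y off the curve so a chaff point never lies on it
        y = (poly_eval(secret_coeffs, x) + random.randrange(1, P_FIELD)) % P_FIELD
        vault.append((x, y))
    random.shuffle(vault)
    return vault

def unlock(vault, query_xs, degree):
    """Recover the coefficients by Lagrange interpolation over the vault
    points whose x-values match the receiver's copy Q."""
    pts = [(x, y) for x, y in vault if x in query_xs][:degree + 1]
    coeffs = [0] * (degree + 1)
    for i, (xi, yi) in enumerate(pts):
        basis, denom = [1], 1              # expand the i-th Lagrange basis
        for j, (xj, _) in enumerate(pts):
            if i == j:
                continue
            new = [0] * (len(basis) + 1)
            for k, b in enumerate(basis):  # multiply basis by (x - xj)
                new[k + 1] = (new[k + 1] + b) % P_FIELD
                new[k] = (new[k] - b * xj) % P_FIELD
            basis = new
            denom = (denom * (xi - xj)) % P_FIELD
        inv = pow(denom, P_FIELD - 2, P_FIELD)  # modular inverse (Fermat)
        for k, b in enumerate(basis):
            coeffs[k] = (coeffs[k] + yi * b * inv) % P_FIELD
    return coeffs

vault = lock([7, 3, 2], [5, 10, 20, 30])       # key = 7 + 3x + 2x^2
recovered = unlock(vault, {5, 10, 20, 30}, degree=2)
```

A matching query set recovers the key exactly; a mostly non-matching set selects chaff points and interpolation yields garbage, which is the "certain similarity rather than complete certainty" property the paper relies on.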

After embedding the IO into the audio, a new DCT coefficient matrix D′ is generated. By performing an inverse DCT transformation on D′, we can obtain a new audio signal s′(n) containing the IO as well. Finally, s′(n) is exactly the transformed audio signal that we expect.
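As an illustrative sketch of this reconstruction step (the orthonormal DCT-II matrix and the additive embedding gain α are our assumptions; the paper does not give the embedding formula), 64 samples arranged as an 8×8 block can be processed as:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix C, so that D = C @ S @ C.T."""
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def embed_and_reconstruct(block, value, pos, alpha=0.1):
    """Embed one eigenvalue additively at `pos` of the DCT matrix D,
    then invert the transform to obtain the marked samples s'(n)."""
    C = dct_matrix(block.shape[0])
    D = C @ block @ C.T                # forward 2-D DCT
    D_marked = D.copy()
    D_marked[pos] += alpha * value     # D' = D plus the embedded value
    s_prime = C.T @ D_marked @ C       # inverse 2-D DCT gives s'(n)
    return D, D_marked, s_prime

samples = np.random.randn(8, 8)        # 64 audio samples as an 8x8 block
D, D_marked, s_prime = embed_and_reconstruct(samples, value=1.0, pos=(2, 2))
```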

4.4. Detection and Extraction. The detection and extraction of the IO can be achieved through the difference between D and D′. When the receiver receives the IO, it is essentially a copy Q of the fuzzy vault P containing the secret sequence. On the finite field F, by matching Q (Q ∈ F) and P, the key can be parsed from the vault to obtain the IO of the receiver. After the IO is recovered, the integrity of the audio signal can be further determined according to its completeness and accuracy. Figure 5 shows the reconstruction and detection process.

In Figure 5, (a) shows the reconstruction process of the host audio, and s′(n) is the reconstructed signal, which contains the IO; (b) is the detection and extraction process of the IO. e_v is the eigenvalues and E_v is its DCT transformation; e′_v is the extracted eigenvalues and E′_v is its DCT transformation. By matching the similarity between e′_v and e_v, the similarity between the received and original audio can be evaluated. According to the features of the extracted IO, it can be found



Figure 5. The process of the reconstruction and detection of the eigenvalues. (a) is the reconstruction process, and (b) is the detection process.

whether the audio is cut or tampered. More details can be found in the experimental section.
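The extraction step (via the difference D′ − D) and the subsequent similarity matching between e′_v and e_v can be sketched as follows; the additive gain α and the function names are our assumptions:

```python
import numpy as np

def extract_eigenvalues(D, D_marked, positions, alpha=0.1):
    """Recover embedded eigenvalues from the coefficient difference
    D' - D at the known embedding positions (alpha is the assumed
    additive embedding gain, not a value given in the paper)."""
    return np.array([(D_marked[p] - D[p]) / alpha for p in positions])

def similarity(e, e_prime):
    """Cosine similarity between original and extracted eigenvalues."""
    return float(np.dot(e, e_prime) /
                 (np.linalg.norm(e) * np.linalg.norm(e_prime)))

# Toy check: one eigenvalue of 0.5 embedded at (2,2) with alpha = 0.1
D = np.zeros((8, 8))
D_marked = D.copy()
D_marked[2, 2] = 0.05
extracted = extract_eigenvalues(D, D_marked, [(2, 2)])
```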

4.5. Integrity Evaluation. After the IO is extracted, the similarity of e_v and e′_v can be calculated by their cosine distance d, and the integrity of the host audio I_au can be evaluated based on this, as shown in (11). It can be seen that when δ takes 1, I_au is a decimal within (0, 1). The greater the value of I_au, the higher the similarity, and the better the audio integrity.

$$I_{au} = \delta \cdot d = \delta \cdot \frac{\sum_{i=0}^{N-1} e(i)\, e'(i)}{\sqrt{\sum_{i=0}^{N-1} e^{2}(i)} \cdot \sqrt{\sum_{i=0}^{N-1} e'^{2}(i)}} \qquad (11)$$

where d is the cosine distance of e_v and e′_v; δ is a weight coefficient whose default value is 1; e(i) is the i-th element of e_v, and e′(i) is the i-th element of e′_v.

5. Experiments and Evaluations

5.1. Audio Quality Criterion. The signal-to-noise ratio (SNR) is the widely approved and used audio quality criterion for the evaluation of audio signal transformation, as shown in the following equation:

$$\mathrm{SNR} = 20 \log \frac{\sqrt{\sum_{n=0}^{N-1} s^{2}(n)}}{\sqrt{\sum_{n=0}^{N-1} \left[ s(n) - s'(n) \right]^{2}}} \qquad (12)$$
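In code, Eq. (12) reads as follows (a NumPy sketch; the function name is ours):

```python
import numpy as np

def snr_db(s, s_prime):
    """SNR (dB) between the host signal s(n) and its transformed
    version s'(n), following Eq. (12)."""
    noise = np.asarray(s) - np.asarray(s_prime)
    return 20.0 * np.log10(np.linalg.norm(s) / np.linalg.norm(noise))

# A distortion of 1% of the signal amplitude yields about 40 dB
s = np.ones(1000)
print(snr_db(s, s + 0.01))
```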

Figure 6. The black-and-white image of 100×100 pixels used as the IO in these experiments.

where s′(n) is the transformation of s(n). Therefore, after embedding the IO on the i-th coefficient of s(n), the SNR can be calculated according to (6), (7), and (9), as shown in (13):

$$\begin{aligned} \mathrm{SNR} &= 20 \log \frac{\sqrt{\sum_{n=0}^{N-1} s^{2}(n)}}{\sqrt{\sum_{n=0}^{N-1} \left[ s(n) - s'(n) \right]^{2}}} \xrightarrow{\text{Eqs. (6), (7)}} 20 \log \left[ \frac{\sqrt{\sum_{n=0}^{N-1} s^{2}(n)}}{\sqrt{\sum_{n=0}^{N-1} \varepsilon^{2}(i,n)}} \right] \\ &= 20 \log \sqrt{\frac{\sum_{n=0}^{N-1} s^{2}(n)}{\sum_{n=0}^{N-1} \varepsilon^{2}(i,n)}} \xrightarrow{\text{Eq. (9)}} 20 \log \sqrt{\sum_{n=0}^{N-1} \frac{1}{\rho^{2}(i)}} \qquad (13) \end{aligned}$$

From (13), we can see that the SNR is only related to ρ(i); that is to say, the audio quality after embedding the eigenvalues is only related to the embedding position i of its DCT coefficient. This proves that the embedding position is crucial to the imperceptibility and robustness of the host audio in our RP-AIP model. Therefore, the embedding rule based on the relative phase of the audio signal itself is of extraordinary significance.

5.2. Embedding and Extraction of the IO. In this experiment, it is assumed that the IO is a black-and-white image, as shown in Figure 6, and the host audio is a set of audio clips of the left channel, as shown in Figure 7 and Table 1. Firstly, we uniformly embedded the IO in the host audio and then recovered it in the opposite way.

According to (12), the SNR of the different audio types is calculated as shown in Table 1, which is in line with the requirements of robustness and imperceptibility.

5.3. Integrity Protection Performance. In the transmission process, the audio may suffer from network congestion, accidental packet loss, and other accidents caused by network


Table 1. The SNR of different audio types after embedding the IO.

Audio clips   Audio types                     SNR (dB)
A1            A segment of voice message      30.96
A2            A prompt tone of Windows 10     42.33
A3            A piece of music                46.40


Figure 7. The host audio clips in the experiment. The audio sampling frequency is 22.05 kHz and the length is 1.0 second. (A1) A segment of voice message; (A2) a prompt tone of Windows 10; (A3) a piece of music.

anomalies, and even filtering, compression, and A/D conversion. Therefore, the first thing to consider is whether the model can effectively distinguish between such accidents and malicious attacks such as cutting and tampering. As stated above, the loss and damage of the IO caused by transmission anomalies are generally accidental and random, whereas those caused by malicious attacks are deterministic and directional. Thus, this issue can be judged from the recovered IO.

5.3.1. Audio Processing. In this experiment, we simulated five common audio processing methods: smoothing/low-pass filtering (with a 4 kHz cutoff frequency), band-pass filtering (with a 200 Hz–2 kHz passband), MPEG-1 compression (with compression ratios of 10.5:1 and 12:1), and A/D conversion. The evaluation of audio signal distortion is indicated by the similarity between the original and recovered IO as well.
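The two filtering conditions can be reproduced, for instance, with SciPy (the Butterworth design and the filter order are our assumptions; the paper specifies only the cutoff frequencies):

```python
import numpy as np
from scipy.signal import butter, lfilter

def lowpass(x, fs, cutoff=4000.0, order=4):
    """Smoothing/low-pass filtering with a 4 kHz cutoff."""
    b, a = butter(order, cutoff, btype='low', fs=fs)
    return lfilter(b, a, x)

def bandpass(x, fs, low=200.0, high=2000.0, order=4):
    """Band-pass filtering with a 200 Hz - 2 kHz passband."""
    b, a = butter(order, [low, high], btype='band', fs=fs)
    return lfilter(b, a, x)

fs = 22050
x = np.random.randn(fs)  # one second of test audio
y = lowpass(x, fs)
z = bandpass(x, fs)
```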

Figure 8 shows the simulation results. Among them, (a) is the recovered IO after smoothing/low-pass filtering; (b) is the one after band-pass filtering; (c) is the one after MPEG-1 compression (with a compression ratio of 10.5:1); (d)


Figure 8. The simulation results of the recovered IO under different audio processing methods.

is the one after MPEG-1 compression (with a compression ratio of 12:1); and (e) is the one after A/D conversion. The similarity between the original and recovered IO, together with the comparison with two other common audio embedding methods (embedding the IO on the (0,0) or (n,n) coefficient, called 0-coefficient or n-coefficient embedding for short), can be found in Table 2 and is presented in Figure 9.

It can be seen from Figure 8 and Table 2 that the RP-AIP model adapts well to common audio processing; that is to say, general audio processing methods will not lead to the destruction and loss of the IO. It can be seen from Figure 9 that the RP-AIP model has robustness against common audio processing similar to that of the 0-coefficient embedding model. In addition, as far as the smoothing/low-pass and band-pass filtering methods are concerned, the band-pass filtering performance of RP-AIP is significantly superior to that of the other models, thanks to the method of random embedding in the mid-band coefficients. In general, RP-AIP exhibits sufficient robustness against general audio processing methods.

5.3.2. Malicious Attack. In this experiment, we simulated three kinds of malicious attacks: cutting, tampering, and resampling. Moreover, tampering can be divided into self-media tampering and non-self-media tampering: self-media tampering refers to using the media itself for shifting, swapping, and so forth, while non-self-media tampering refers to using other media for replacement, coverage, and so forth. Resampling can also be divided into upsampling and downsampling: upsampling refers to


Table 2. The similarity ρ between the original and recovered IO in the audio processing experiments.

No.   Processing methods              RP-AIP   0-coefficient   n-coefficient
(a)   Original                        1.00     1.00            1.00
(b)   Without processing              0.97     0.99            0.95
(c)   Smoothing/low-pass filtering    0.88     0.98            0.18
(d)   Band-pass filtering             0.90     0.85            0.84
(e)   MPEG-1 (10.5:1 ratio)           0.79     0.88            0.68
(f)   MPEG-1 (12:1 ratio)             0.81     0.85            0.64
(g)   A/D conversion                  0.75     0.74            0.76


Figure 9. The comparison of the simulation results of different audio processing and embedding methods. W: without processing; LP: smoothing/low-pass filtering; BP: band-pass filtering; MP(i): MPEG-1 compression (10.5:1 ratio); MP(ii): MPEG-1 compression (12:1 ratio); AD: A/D conversion.

interpolation expansion of the original audio, while downsampling refers to extraction from the original audio. Cutting and tampering are undoubtedly malicious attacks on audio that attempt to change the speaker's intention. However, resampling attacks often only affect the length or quality of the audio, especially upsampling, and cannot change the speaker's intention.

Figure 10 shows the simulation results. Among them, (f) is the recovered IO after a cutting attack; (g) is the one after a self-media tampering attack; (h) is the one after a non-self-media tampering attack; (i) is the one after an upsampling attack; and (j) is the one after a downsampling attack. The similarity between the original and recovered IO, together with the comparison with the two other common audio embedding methods mentioned above, can be found in Table 3 and is presented in Figure 11.

As can be seen from Figure 10 and Table 3, apart from the upsampling attack, the RP-AIP model can effectively


Figure 10. The simulation results of the recovered IO under different malicious attacks.


Figure 11. The comparison of the simulation results of different malicious attacks and embedding methods. W: without processing; CT: cutting attack; TP(i): self-media tampering attack; TP(ii): non-self-media tampering attack; RS(i): upsampling attack; RS(ii): downsampling attack.


Table 3. The similarity ρ between the original and recovered IO in the malicious attack experiments.

No.   Attack types                    RP-AIP   0-coefficient   n-coefficient
(a)   Original                        1.00     1.00            1.00
(b)   Without attacks                 0.97     0.99            0.95
(h)   Cutting                         0.31     0.44            0.50
(i)   Tampering (self-media)          0.24     0.46            0.62
(j)   Tampering (non-self-media)      0.19     0.33            0.57
(k)   Resampling (upsampling)         0.97     0.98            0.92
(l)   Resampling (downsampling)       0.20     0.42            0.22

detect and reflect malicious attacks. As stated earlier, the IO is the tangible representation form of the audio integrity; that is, changes in the IO directly reflect changes in the audio media. Therefore, we can evaluate the type and degree of the attack on the audio by observing the IO. In the case of (f), since cutting directly leads to the loss of the IO, when the IO is reconstructed at 100×100 pixels, it becomes an incomplete image. The missing part of the IO indicates that the corresponding audio part has been cut off. In the same way, the tampered part appears as random noise or disorder, as shown in (g) and (h), because the audio has suffered the corresponding tampering attack. However, for resampling attacks, RP-AIP shows different results, as shown in (i) and (j). This is because upsampling tends to interpolate the original audio signal without causing loss of the IO, while downsampling extracts from the original audio signal, which directly leads to the loss and destruction of the IO. As stated in the Introduction, we are more concerned about whether an attack has tampered with the speaker's intention, and resampling generally does not.

It can be seen from Figure 11 that the RP-AIP model has definitely better performance than the other two models. This is because attacks such as cutting and tampering are generally oriented to the audio signal's time domain rather than the frequency domain, so the damage to the low and high frequency parts is random. An embedding method that considers only the low or only the high frequency can guarantee one aspect alone, while the random embedding method in the RP-AIP model synthesizes these two frequency parts. This is also the innovation and highlight of our work.

6. Conclusion

Audio integrity protection for the network interaction environment is not only an effective way to protect audio information but also an important link in building trusted communication. Due to the real-time constraints of interaction scenarios, this kind of audio protection needs to reconsider some new characteristics, such as the streaming characteristic, imperceptibility, self-detectability, and distortion tolerance, which are not covered by traditional programs. This poses new challenges to existing audio protection solutions.

To this end, a method of audio integrity protection based on relative phase (RP-AIP) is proposed in this work. In

the RP-AIP model, we established the concept of the integrity object (IO) in order to propose and abstract the integrity of the audio signal and then transform it into a tangible representation form. In addition, we fully considered the characteristics of the audio signal in the DFT and DCT transform domains and used the frequency characteristics of the audio itself to guide the random embedding of the IO, thereby achieving blind detection and extraction of it. The tangible expression of the audio integrity and the relative phase based random embedding scheme are the highlights of our work.

The simulation experiments show that the RP-AIP model can guarantee the SNR of the audio signal and that the embedding of the IO will not cause signal distortion. In addition, it has good adaptability to some common audio processing, such as smoothing/low-pass filtering, band-pass filtering, MPEG-1 compression with different ratios, and A/D conversion. Moreover, against malicious attacks such as cutting, tampering, and resampling, the RP-AIP model shows obvious superiority over other similar protection solutions. Of course, research on audio integrity protection is still in its infancy, and more work will be done to make it more complete.

Data Availability

Some of the data is uploaded to the public server: https://doi.org/10.6084/m9.figshare.6382709.v1. In the file, you can find the demo audio and its spectrum data, the IO data, and the audio data used in the experiments.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Natural Science Foundation of China [Nos. 61671141 and 61401081], the Liaoning Provincial Natural Science Foundation of China [No. 20180551007], and the Ministry of Education–China Mobile Scientific Research Funds [No. MCM20150103].

Security and Communication Networks 11

References

[1] R. F. Olanrewaju and O. Khalifa, "Digital audio watermarking: techniques and applications," in Proceedings of the 2012 International Conference on Computer and Communication Engineering, ICCCE 2012, pp. 830–835, Malaysia, July 2012.

[2] J. Seok, J. Hong, and J. Kim, "A novel audio watermarking algorithm for copyright protection of digital audio," ETRI Journal, vol. 24, no. 3, pp. 181–189, 2002.

[3] H. Yassine, B. Bachir, and K. Aziz, "A secure and high robust audio watermarking system for copyright protection," International Journal of Computer Applications, vol. 53, no. 17, pp. 33–39, 2012.

[4] J. Haitsma, M. van der Veen, T. Kalker, and F. Bruekers, "Audio watermarking for monitoring and copy protection," in Proceedings of the ACM Workshops on Multimedia, ACM, pp. 119–122, 2000.

[5] R. D. Shelke and M. U. Nemade, "Audio watermarking techniques for copyright protection: a review," in Proceedings of the 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication, ICGTSPICC 2016, pp. 634–640, December 2016.

[6] S. V. Dhavale, R. S. Deodhar, and L. M. Patnaik, "High capacity lossless semi-fragile audio watermarking in the time domain," in A Hierarchical CPN Model for Mobility Analysis in Zone Based MANET, pp. 843–852, 2012.

[7] S. V. Dhavale, R. S. Deodhar, D. Pradhan, and L. M. Patnaik, "State transition based embedding in cepstrum domain for audio copyright protection," IETE Journal of Research, vol. 61, no. 1, pp. 41–55, 2015.

[8] H. H. Tsai, J. S. Cheng, and P. T. Yu, "Audio watermarking based on HAS and neural networks in DCT domain," EURASIP Journal on Advances in Signal Processing, vol. 3, no. 2, pp. 1–12, 2003.

[9] Y. Xiang, I. Natgunanathan, Y. Rong, and S. Guo, "Spread spectrum-based high embedding capacity watermarking method for audio signals," IEEE Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2228–2237, 2015.

[10] X. Huang, A. Nishimura, and I. Echizen, "A reversible acoustic steganography for integrity verification," in Digital Watermarking, p. 305, Springer, Berlin, Heidelberg, Germany, 2011.

[11] H. Cao, Y. Wang, J. Li et al., "Audio data hiding using perceptual masking effect of HAS," in Proceedings of the International Symposium on Multispectral Image Processing and Pattern Recognition, International Society for Optics and Photonics, 2007.

[12] T. Painter and A. Spanias, "Perceptual coding of digital audio," Proceedings of the IEEE, vol. 88, no. 4, pp. 451–512, 2000.

[13] S. V. Dhavale, R. S. Deodhar, D. Pradhan et al., "Robust multiple stereo audio watermarking for copyright protection and integrity checking," in Proceedings of the International Conference on Computational Intelligence and Information Technology, IET, pp. 9–16, 2014.

[14] L. Boney, A. H. Tewfik, and K. N. Hamdy, "Digital watermarks for audio signals," in Proceedings of the IEEE International Conference on Multimedia Computing and Systems, pp. 473–480, IEEE, 2002.


Page 4: A Relative Phase Based Audio Integrity Protection Method: Model …downloads.hindawi.com/journals/scn/2018/7139391.pdf · 2019-07-30 · ResearchArticle A Relative Phase Based Audio

4 Security and Communication Networks

Key Fuzzy vault

D IO generation

DFT transformation

Relative phase relation

Divide into segments

Coefficient matrix

DCT transformation

C DCTB DFT

A Signal processing

Eigenvalues

Integrity object (IO)

Sampling

Figure 1 The overall structure of the RP-AIP model The eigenvalues generated by the key and fuzzy vault are embedded into the DCTcoefficient matrix controlled by the relative phase relation indicated from DFT phase spectrum

Since the segments are evenly divided the IO can beequally distributed in the audio data and the size of the frag-ments can be adjusted by actual requirements In this way theintegrity of the audio can be abstracted as the completenessand accuracy of the IO by uniform embedding

In addition it is necessary to explain the function of PartD Considering the issues of network congestion accidentalpacket loss and so forth caused by network anomaliesthe received IO may not be exactly the same with theoriginal However the loss and damage of IO caused bythe transmission anomalies are generally accidental andrandom but when it comes to malicious attack it is generallydeterministic and directional Moreover as stated earlier thefundamental goal of audio integrity protection is to ensure thecorrectness and credibility of the audio information ratherthan the audio signal medium itself To this end the use offuzzy vault can solve this embarrassment because the fuzzyvault can guarantee certain similarity rather than completecertainty

33 Embedding Analysis Through the introduction and anal-ysis of audio integrity protection characteristics we know thataudio embedding should be transparent to the user On the

other hand the distortion tolerance characteristic requiresthat the embedding process should have some robustness[13] Interestingly these two requirements are contradictoryto ensure transparency it often means that the operation isusually based on the unimportant components of the audiosignal which are not sensitive to human ear such as thehigh frequency regions However robustness is generallydependent on the important components of the audio signalwhich are often sensitive to human ear such as the lowfrequency regions [14]

After the audio signal sampled it is transformed into anonperiodic discrete signal in the time domain and thena set of coefficient matrices composed of DCT coefficientscan be obtained by the DCT transformation Here the lowfrequency signal energy is gathered in the upper left cornerof the matrix and the high frequency energy distributed inthe lower right corner region is almost zero Therefore theeigenvalues can be embedded in these two regions the lowfrequency region is robust because it carries a large amountof the main signal energy while the high frequency regionhas good invisibility because of the absence of perceptionHere comes the question what kind of embedding strategyis the fairest To this end we propose to use the relative

Security and Communication Networks 5

phase relation characteristic of the audio signalrsquos frequencyspectrum to control the embedding position

4 Process and Strategy of RP-AIP

41 Audio Segmentation According to the scheme designconcepts we need to embed the eigenvalues evenly in theaudio signal so this requires that the audio signal data shouldbe divided evenly in advance

The number of segments can be determined based on theelements of IO and the size of the audio In general assumingthat there are n elements in the IO then we will divide theaudio signal data into n segments as well Thus it can beensured that the whole IO can be completely embedded andbound with the audio

42 Relative Phase Invariance Through the early part welearnt that the human ear is very sensitive to relative phasechanges of audio signal but lacks the ability to perceiveand distinguish the absolute phase Therefore for a certainaudio signal the phase component is more important thanthe amplitude component and any change or damage tothe phase component will cause unacceptable distortionof the audio quality or even completely destroy the audioinformation At the same time the communication theoryalso declares that the phasemodulation has strong robustnessto the noise signal

More specifically by DFT transform the phase distri-bution of a segment of the audio signal is easy to obtainThere is only one maximum point and one minimum pointin the phase spectrum as shown in Figure 2 and their relativepositions are fixed (if there are multiple extreme points thefirst one prevails) Asmentioned above the relative phase willmaintain consistency regardless of any transformation Thischaracteristic offers the possibility to random embedding ofthe IO

43 Random Embedding Strategy We will propose a strat-egy to determine the embedding position according to theposition of the extreme points in the phase spectrum Sincethe extreme points are random in the phase spectrumthe choice of the embedding position controlled by themis also random Therefore this strategy will guarantee themutual balance between the transparency and robustnessas previously described Furthermore the characteristic ofrelative phase invariance in frequency spectrum is uniquebecause only the audio segment is involved which guaranteesthe self- detectability as well

An 8times8 coefficient matrix D can be obtained from aDCT transformation of an audio signal s(n) as shown inFigure 3Themost robust embedding method recommendedis embedding at Position (00) while the most transparentone is at Position (77) Embedding at Position (00) canimprove robustness but adds more perceptual distortionwhile embedding at Position (77) can improve impercepti-bility but offers less robustness

To get a good compromise between these parametersthe mid-band coefficients such as (22) (L) and (44) (H) are

0 00003 00006 00009 00012 00015 00018 00021 00024 00027 00030time (s)

0 00003 00006 00009 00012 00015 00018 00021 00024 00027 00030time (s)

radi

an (r

ad)

maxima phase spectrum

minima

37500

30000

22500

15000

7500

0

minus7500

minus15000

minus22500

minus30000

minus37500

ampl

itude

(kH

z)

4

3

2

1

0

minus1

minus2

minus3

minus4

Figure 2The phase spectrum of a segment of the audio signal Theaudio sampling frequency is 44100kHz and the length is 05 secondsThe length of the intercepted fragment is 0003 seconds containing120 sampling points

L

H

0 1 2 3 4 5 6 7

0

1

2

3

4

5

6

7

Figure 3 An 8times8 coefficient matrix of DCT Position L representsthe low frequency region and position H represents the highfrequency region which cannot be perceived by human ear

6 Security and Communication Networks

(1) receiving the key vaults as kv (i)(2) set the number of kv (i) as n(3) divide the audio signal equally into n segments as s (j)(4) sampling the s (j) to c (j)(5) forallc(j) initialize j=1(6) while end do(7) if jlen then(8) calculate the phase spectrum 120593(120596) of s(j)(9) determine the position of the maxima and minima in 120593(120596)(10) calculate the coefficient matrix C (j) by 8 times 8 DCT of c(j)(11) if the maxima is in front of the minima then(12) embedding the kv (j) at C (22)(13) else(14) embedding the kv (j) at C (44)(15) end if(16) end if(17) end while

Algorithm 1 The algorithm of random embedding strategy based on relative phase

START

Have all the elements ofIO been embedded

Calculate the phase spectrum ofthe audio segment by DFT

Determine the relative positionof the maxima and minima

Is the maxima in frontof the minimum

Embed the elementat L

Embed the elementat H

Y N

N

Y

END

Figure 4 The flow chart of the random embedding strategy Thechoice of the embedding position between L and H is controlled bythe relative phase relation of the host audio segment

selected as the embedding position instead Position L or His the Boolean choice for random embedding The so-calledBoolean choice means that a choice must be made from thesetwo positions and the selection process is random Basedon the relative phase relation discussed above a randomembedding strategy is proposed as follows The flow chartof this strategy is shown in Figure 4 and its algorithm isillustrated in Algorithm 1

(i) If the maximum point of the audio segment appears before the minimum point, the corresponding eigenvalue will be embedded into the low-frequency region (Position L).

(ii) Otherwise, the corresponding eigenvalue will be embedded into the high-frequency region (Position H).

After determining the embedding position, the next step is to construct a fuzzy vault containing the secret sequence. On the finite field F, the secret sequence key ∈ F. By an encrypting set P (P ∈ F), the key can be packaged into the vault of P to generate the I.O. of the sender.
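The vault packaging can be illustrated with a deliberately tiny sketch. Everything here is a toy of our own making: a small prime field, a secret encoded as polynomial coefficients, genuine points that lie on the polynomial, and random chaff that does not. Production fuzzy vaults use much larger fields and error-correcting (Reed–Solomon) decoding.

```python
import random

P_MOD = 257  # tiny prime field, for illustration only

def poly_eval(coeffs, x):
    """Evaluate the secret polynomial at x over GF(P_MOD)."""
    return sum(c * pow(x, k, P_MOD) for k, c in enumerate(coeffs)) % P_MOD

def lock(secret_coeffs, genuine_xs, n_chaff=8, seed=1):
    """Package the secret polynomial into a vault of genuine plus chaff points."""
    rng = random.Random(seed)
    vault = [(x, poly_eval(secret_coeffs, x)) for x in genuine_xs]
    used = set(genuine_xs)
    while len(vault) < len(genuine_xs) + n_chaff:
        x = rng.randrange(1, P_MOD)
        if x in used:
            continue
        used.add(x)
        # shift y off the curve so chaff never satisfies the polynomial
        vault.append((x, (poly_eval(secret_coeffs, x) + rng.randrange(1, P_MOD)) % P_MOD))
    rng.shuffle(vault)
    return vault

def unlock(vault, query_xs, degree):
    """Recover the secret by Lagrange interpolation over the matching points."""
    pts = [(x, y) for x, y in vault if x in set(query_xs)][: degree + 1]
    coeffs = [0] * (degree + 1)
    for i, (xi, yi) in enumerate(pts):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(pts):
            if i == j:
                continue
            new = [0] * (len(basis) + 1)
            for k, b in enumerate(basis):      # multiply basis by (x - xj)
                new[k + 1] = (new[k + 1] + b) % P_MOD
                new[k] = (new[k] - b * xj) % P_MOD
            basis = new
            denom = denom * (xi - xj) % P_MOD
        scale = yi * pow(denom, P_MOD - 2, P_MOD) % P_MOD
        for k, b in enumerate(basis):
            coeffs[k] = (coeffs[k] + scale * b) % P_MOD
    return coeffs
```

A receiver holding a query set that overlaps the genuine points in at least degree + 1 places can interpolate the polynomial and read the key back out; chaff points keep an attacker from doing the same.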

After embedding the I.O. into the audio, a new DCT coefficient matrix D′ is generated. By applying the inverse DCT transformation to D′, we can obtain a new audio signal s′(n) containing the I.O. as well. Finally, s′(n) is exactly the transformed audio signal that we expect.

4.4. Detection and Extraction. The detection and extraction of the I.O. can be achieved through the difference between D and D′. When the receiver receives the I.O., it is essentially a copy Q of the fuzzy vault P containing the secret sequence. On the finite field F, by matching Q (Q ∈ F) against P, the key can be parsed from the vault to obtain the I.O. of the receiver. After the I.O. is recovered, the integrity of the audio signal can further be determined according to its completeness and accuracy. Figure 5 shows the reconstruction and detection process.
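The difference-based read-out can be sketched as follows. This is a non-blind illustration of ours (it assumes access to the original coefficients D), and the `strength` parameter mirrors the hypothetical embedding strength used earlier; it is not an API the paper defines.

```python
import numpy as np

def dct_basis(n=8):
    """Orthonormal DCT-II basis matrix (inverse = transpose)."""
    m = np.array([[np.cos(np.pi * (2 * j + 1) * i / (2 * n)) for j in range(n)]
                  for i in range(n)])
    m[0] /= np.sqrt(n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

def extract_element(orig_seg, recv_seg, strength=0.1):
    """Read one I.O. element out of the coefficient difference D' - D."""
    m = dct_basis()
    diff = (m @ recv_seg.reshape(8, 8) @ m.T) - (m @ orig_seg.reshape(8, 8) @ m.T)
    # The element sits at L = (2, 2) or H = (4, 4); pick the larger deviation.
    pos = (2, 2) if abs(diff[2, 2]) >= abs(diff[4, 4]) else (4, 4)
    return pos, diff[pos] / strength
```

In the blind setting of the paper, the embedding position is instead re-derived from the received segment's own phase spectrum, which is what makes self-detection possible.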

In Figure 5, (a) shows the reconstruction process of the host audio, and s′(n) is the reconstructed signal which contains the I.O.; (b) is the detection and extraction process of the I.O. Here e_v denotes the eigenvalues and E_v its DCT transformation; e′_v denotes the extracted eigenvalues and E′_v its DCT transformation. By matching the similarity between e′_v and e_v, the similarity between the received and original audio can be evaluated. According to the features of the extracted I.O., it can be found



Figure 5: The process of the reconstruction and detection of the eigenvalues. (a) is the reconstruction process and (b) is the detection process.

whether the audio has been cut or tampered with. More details can be found in the experimental section.

4.5. Integrity Evaluation. After the I.O. is extracted, the similarity of e_v and e′_v can be calculated by their cosine distance d, and the integrity of the host audio I_au can be evaluated on this basis, as shown in (11). It can be seen that when δ takes 1, I_au is a value within (0, 1]. The greater the value of I_au, the higher the similarity, and the better the audio integrity.
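The evaluation in (11) amounts to a weighted cosine similarity between the two eigenvalue vectors; a minimal sketch:

```python
import numpy as np

def integrity_score(e, e_rec, delta=1.0):
    """I_au = delta * cosine similarity of original and recovered eigenvalues."""
    num = float(np.dot(e, e_rec))
    den = float(np.linalg.norm(e) * np.linalg.norm(e_rec))
    return delta * num / den
```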

$$I_{au} = \delta \cdot d = \delta \cdot \frac{\sum_{i=0}^{N-1} e(i)\, e'(i)}{\sqrt{\sum_{i=0}^{N-1} e^{2}(i)} \cdot \sqrt{\sum_{i=0}^{N-1} e'^{2}(i)}} \quad (11)$$

where d is the cosine distance of e_v and e′_v, δ is a weight coefficient whose default value is 1, e(i) is the i-th element of e_v, and e′(i) is the i-th element of e′_v.

5. Experiments and Evaluations

5.1. Audio Quality Criterion. The signal-to-noise ratio (SNR) is the widely approved and used audio quality criterion for the evaluation of audio signal transformation, as shown in the following equation:

$$\mathrm{SNR} = 20\log\frac{\sqrt{\sum_{n=0}^{N-1} s^{2}(n)}}{\sqrt{\sum_{n=0}^{N-1}\left[s(n)-s'(n)\right]^{2}}} \quad (12)$$
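Equation (12) translates directly into code. Taking the logarithm as base 10 is our assumption: the paper writes only "log", but SNR in decibels conventionally uses log10.

```python
import numpy as np

def snr_db(s, s_prime):
    """SNR of Eq. (12): 20*log10 of the ratio of signal norm to error norm."""
    return 20.0 * np.log10(np.linalg.norm(s) / np.linalg.norm(s - s_prime))
```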

Figure 6: The black-and-white image of 100×100 pixels used as the I.O. in these experiments.

where s′(n) is the transformation of s(n).

$$\begin{aligned}
\mathrm{SNR} &= 20\log\frac{\sqrt{\sum_{n=0}^{N-1} s^{2}(n)}}{\sqrt{\sum_{n=0}^{N-1}\left[s(n)-s'(n)\right]^{2}}}
\xrightarrow{\text{Eq. (6), Eq. (7)}}
20\log\left[\frac{\sqrt{\sum_{n=0}^{N-1} s^{2}(n)}}{\sqrt{\sum_{n=0}^{N-1}\varepsilon^{2}(i,n)}}\right] \\
&= 20\log\sqrt{\frac{\sum_{n=0}^{N-1} s^{2}(n)}{\sum_{n=0}^{N-1}\varepsilon^{2}(i,n)}}
= 20\log\sqrt{\sum_{n=0}^{N-1}\frac{s^{2}(n)}{\varepsilon^{2}(i,n)}}
\xrightarrow{\text{Eq. (9)}}
20\log\sqrt{\sum_{n=0}^{N-1}\frac{1}{\rho^{2}(i)}}
\end{aligned} \quad (13)$$

Therefore, after embedding the I.O. on the i-th coefficient of s(n), the SNR can be calculated according to (6), (7), and (9), as shown in (13).

From (13) we can see that the SNR is only related to ρ(i); that is to say, the audio quality after embedding the eigenvalues is only related to the embedding position i of the DCT coefficient. This proves that the embedding position is crucial to the imperceptibility and robustness of the host audio in our RP-AIP model. Therefore, the embedding rule based on the relative phase of the audio signal itself is of extraordinary significance.

5.2. Embedding and Extraction of the I.O. In this experiment, it is assumed that the I.O. is a black-and-white image, as shown in Figure 6, and the host audio is a set of audio clips of the left channel, as shown in Figure 7 and Table 1. Firstly, we uniformly embedded the I.O. in the host audio and then recovered it in the opposite way.

According to (12), the SNR of the different audio types is calculated as shown in Table 1, which is in line with the requirements of robustness and imperceptibility.

5.3. Integrity Protection Performance. In the transmission process, the audio may suffer from network congestion, accidental packet loss, and other accidents caused by network


Table 1: The SNR of different audio types after embedding the I.O.

Audio clips    A1                           A2                            A3
Audio types    A segment of voice message   A prompt tone of Windows 10   A piece of music
SNR (dB)       30.96                        42.33                         46.40


Figure 7: The host audio clips in the experiment. The audio sampling frequency is 22.05 kHz and the length is 1.0 second. (A1) A segment of voice message; (A2) a prompt tone of Windows 10; (A3) a piece of music.

anomalies, and even from filtering, compression, and A/D conversion. Therefore, the first thing to consider is whether the model can effectively distinguish between such accidents and malicious attacks such as cutting and tampering. As stated above, the loss and damage of the I.O. caused by transmission anomalies are generally accidental and random, whereas those caused by malicious attacks are deterministic and directional. Thus, this issue can be judged from the recovered I.O.

5.3.1. Audio Processing. In this experiment, we simulated five common audio processing methods: smoothing/low-pass filtering (with 4 kHz cutoff frequency), band-pass filtering (with 200 Hz–2 kHz cutoff frequencies), MPEG-1 compression (with compression ratios of 10.5:1 and 12:1), and A/D conversion. The evaluation of audio signal distortion is indicated by the similarity between the original and recovered I.O. as well.
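The smoothing experiment can be imitated with a moving-average filter. This is a crude stand-in of ours for the paper's 4 kHz low-pass filter (the kernel width is an arbitrary choice), paired with the cosine similarity ρ used in the tables:

```python
import numpy as np

def smooth(signal, width=5):
    """Moving-average smoothing, a simple low-pass surrogate."""
    return np.convolve(signal, np.ones(width) / width, mode="same")

def similarity(a, b):
    """Cosine similarity, the same rho reported in Tables 2 and 3."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

As expected, smoothing damages high-frequency content (white noise) far more than slowly varying content, which is why embedding only in high-frequency coefficients fares poorly under this processing.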

Figure 8 shows the simulation results. Among them, (a) is the recovered I.O. after smoothing/low-pass filtering; (b) is the one after band-pass filtering; (c) is the one after MPEG-1 compression (with compression ratio of 10.5:1); (d)


Figure 8: The simulation results of the recovered I.O. under different audio processing methods.

is the one after MPEG-1 compression (with compression ratio of 12:1); and (e) is the one after A/D conversion. The similarity between the original and recovered I.O., and the comparison with the other two common audio embedding methods (embedding the I.O. on the (0, 0) or (n, n) coefficient, called 0-coefficient or n-coefficient embedding for short), can be found in Table 2 and are presented in Figure 9.

It can be seen from Figure 8 and Table 2 that the RP-AIP model can adapt well to common audio processing; that is to say, general audio processing methods will not lead to the destruction and loss of the I.O. It can be seen from Figure 9 that the RP-AIP model has similar robustness against common audio processing compared with the 0-coefficient embedding model. In addition, as far as the smoothing/low-pass and band-pass filtering methods are concerned, the band-pass filtering performance of RP-AIP is significantly superior to the other models, thanks to the method of random embedding in the mid-band coefficients. In general, RP-AIP exhibits sufficient robustness against general audio processing methods.

5.3.2. Malicious Attack. In this experiment, we simulated three kinds of malicious attacks: cutting, tampering, and resampling. Moreover, tampering can be divided into self-media tampering and non-self-media tampering: self-media tampering refers to using the media itself for shifting, swapping, and so forth, while non-self-media tampering refers to using other media for replacement, coverage, and so forth. Resampling can also be divided into upsampling and downsampling.


Table 2: The similarity ρ between the original and recovered I.O. in the audio processing experiments.

No.   Processing methods             RP-AIP (random embedding)   0-coefficient embedding   n-coefficient embedding
(a)   Original                       1.00                        1.00                      1.00
(b)   Without processing             0.97                        0.99                      0.95
(c)   Smoothing/low-pass filtering   0.88                        0.98                      0.18
(d)   Band-pass filtering            0.90                        0.85                      0.84
(e)   MPEG-1 (10.5:1 ratio)          0.79                        0.88                      0.68
(f)   MPEG-1 (12:1 ratio)            0.81                        0.85                      0.64
(g)   A/D conversion                 0.75                        0.74                      0.76


Figure 9: The comparison of the simulation results of different audio processing and embedding methods. W: without processing; LP: smoothing/low-pass filtering; BP: band-pass filtering; MP(i): MPEG-1 compression (with compression ratio of 10.5:1); MP(ii): MPEG-1 compression (with compression ratio of 12:1); AD: A/D conversion.

Upsampling refers to the interpolation expansion of the original audio, while downsampling refers to the decimation (extraction) of the original audio. Cutting and tampering are undoubtedly malicious attacks on audio that attempt to change the speaker's intention. However, resampling attacks often only affect the length or quality of the audio, especially upsampling, and cannot change the speaker's intention.
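The attack types just defined can be simulated with a few array operations. This is our sketch; the offsets, lengths, and linear interpolation for upsampling are arbitrary illustrative choices.

```python
import numpy as np

def cut(audio, start, length):
    """Cutting attack: remove a span and close the gap (audio gets shorter)."""
    return np.concatenate([audio[:start], audio[start + length:]])

def tamper_self(audio, src, dst, length):
    """Self-media tampering: overwrite one span with another from the same audio."""
    out = audio.copy()
    out[dst:dst + length] = audio[src:src + length]
    return out

def tamper_foreign(audio, dst, foreign):
    """Non-self-media tampering: overwrite a span with other media."""
    out = audio.copy()
    out[dst:dst + len(foreign)] = foreign
    return out

def upsample(audio, factor):
    """Upsampling attack: linear-interpolation expansion of the original audio."""
    n = len(audio)
    new_t = np.linspace(0.0, n - 1.0, factor * (n - 1) + 1)
    return np.interp(new_t, np.arange(n, dtype=float), audio)

def downsample(audio, factor):
    """Downsampling attack: keep every factor-th sample (decimation)."""
    return audio[::factor]
```

Note the asymmetry that drives the experimental results below: upsampling keeps every original sample (and the I.O. with them), while cutting, tampering, and downsampling all destroy samples and therefore parts of the embedded I.O.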

Figure 10 shows the simulation results. Among them, (f) is the recovered I.O. after a cutting attack; (g) is the one after a self-media tampering attack; (h) is the one after a non-self-media tampering attack; (i) is the one after an upsampling attack; and (j) is the one after a downsampling attack. The similarity between the original and recovered I.O., and the comparison with the other two common audio embedding methods mentioned above, can be found in Table 3 and are presented in Figure 11.

As can be seen from Figure 10 and Table 3, with the exception of the upsampling attack, the RP-AIP model can effectively


Figure 10: The simulation results of the recovered I.O. under different malicious attacks.


Figure 11: The comparison of the simulation results of different malicious attacks and embedding methods. W: without processing; CT: cutting attack; TP(i): self-media tampering attack; TP(ii): non-self-media tampering attack; RS(i): upsampling attack; RS(ii): downsampling attack.


Table 3: The similarity ρ between the original and recovered I.O. in the malicious attack experiments.

No.   Attack types                 RP-AIP (random embedding)   0-coefficient embedding   n-coefficient embedding
(a)   Original                     1.00                        1.00                      1.00
(b)   Without attacks              0.97                        0.99                      0.95
(h)   Cutting                      0.31                        0.44                      0.50
(i)   Tampering (self-media)       0.24                        0.46                      0.62
(j)   Tampering (non-self-media)   0.19                        0.33                      0.57
(k)   Resampling (upsampling)      0.97                        0.98                      0.92
(l)   Resampling (downsampling)    0.20                        0.42                      0.22

detect and reflect malicious attacks. As stated earlier, the I.O. is the tangible representation of the audio integrity; that is, changes in the I.O. directly reflect changes in the audio media. Therefore, we can evaluate the type and degree of the attack on the audio by observing the I.O. In the case of (f), since cutting directly leads to the loss of the I.O., when the I.O. is reconstructed as 100×100 pixels it becomes an incomplete image. The missing part of the I.O. indicates that the corresponding audio part has been cut off. In the same way, the tampered part appears as random noise or disorder, as shown in (g) and (h), because the audio has suffered the corresponding tampering attack. However, for the resampling attacks, RP-AIP shows different results, as shown in (i) and (j). This is because upsampling tends to interpolate the original audio signal without causing loss of the I.O., while downsampling is the decimation of the original audio signal, which directly leads to the loss and destruction of the I.O. As stated in the Introduction, we are more concerned about whether an attack has tampered with the speaker's intention, and resampling generally does not.

It can be seen from Figure 11 that the RP-AIP model performs definitively better than the other two models. This is because attacks such as cutting and tampering are generally oriented to the audio signal's time domain rather than the frequency domain, so the damage to the low- and high-frequency parts is random. An embedding method that considers only the low or only the high frequencies can guarantee only one aspect, while the random embedding method in the RP-AIP model can synthesize these two frequency parts. This is also the innovation and highlight of our work.

6. Conclusion

Audio integrity protection for the network interaction environment is not only an effective way to protect audio information but also an important link in building trusted communication. Due to the real-time constraints of interaction scenarios, this kind of audio protection needs to reconsider some new characteristics, such as the streaming characteristic, imperceptibility, self-detectability, and distortion tolerance, which are not covered by traditional programs. This poses new challenges to the existing audio protection solutions.

To this end, a method of audio integrity protection based on relative phase (RP-AIP) is proposed in this work. In the RP-AIP model, we established the concept of the integrity object (I.O.) in order to propose and abstract the integrity of the audio signal and then transform it into a tangible representation form. In addition, we fully considered the characteristics of the audio signal in the DFT and DCT transform domains and used the frequency characteristics of the audio itself to guide the random embedding of the I.O., thereby achieving blind detection and extraction of it. The tangible expression of the audio integrity and the relative phase based random embedding scheme are the highlights of our work.

The simulation experiments show that the RP-AIP model can guarantee the SNR of the audio signal, and the embedding of the I.O. will not cause signal distortion. In addition, it has good adaptability to some common audio processing, such as smoothing/low-pass filtering, band-pass filtering, MPEG-1 compression with different ratios, and A/D conversion. Moreover, against malicious attacks such as cutting, tampering, and resampling, the RP-AIP model shows obvious superiority over other similar protection solutions. Of course, research on audio integrity protection is still in its infancy, and more work will be done to make it more complete.

Data Availability

Some of the data is uploaded to the public server: https://doi.org/10.6084/m9.figshare.6382709.v1. In the file, you can find the demo audio and its spectrum data, the I.O. data, and the audio data used in the experiments.

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This work is supported by the National Natural Science Foundation of China [Nos. 61671141, 61401081], the Liaoning Provincial Natural Science Foundation of China [No. 20180551007], and the Ministry of Education-China Mobile Scientific Research Funds [No. MCM20150103].


References

[1] R. F. Olanrewaju and O. Khalifa, "Digital audio watermarking: techniques and applications," in Proceedings of the 2012 International Conference on Computer and Communication Engineering (ICCCE 2012), pp. 830–835, Malaysia, July 2012.

[2] J. Seok, J. Hong, and J. Kim, "A novel audio watermarking algorithm for copyright protection of digital audio," ETRI Journal, vol. 24, no. 3, pp. 181–189, 2002.

[3] H. Yassine, B. Bachir, and K. Aziz, "A secure and high robust audio watermarking system for copyright protection," International Journal of Computer Applications, vol. 53, no. 17, pp. 33–39, 2012.

[4] J. Haitsma, M. van der Veen, T. Kalker, and F. Bruekers, "Audio watermarking for monitoring and copy protection," in Proceedings of the ACM Workshops on Multimedia, pp. 119–122, ACM, 2000.

[5] R. D. Shelke and M. U. Nemade, "Audio watermarking techniques for copyright protection: a review," in Proceedings of the 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication (ICGTSPICC 2016), pp. 634–640, December 2016.

[6] S. V. Dhavale, R. S. Deodhar, and L. M. Patnaik, "High capacity lossless semi-fragile audio watermarking in the time domain," in A Hierarchical CPN Model for Mobility Analysis in Zone Based MANET, pp. 843–852, 2012.

[7] S. V. Dhavale, R. S. Deodhar, D. Pradhan, and L. M. Patnaik, "State transition based embedding in cepstrum domain for audio copyright protection," IETE Journal of Research, vol. 61, no. 1, pp. 41–55, 2015.

[8] H. H. Tsai, J. S. Cheng, and P. T. Yu, "Audio watermarking based on HAS and neural networks in DCT domain," EURASIP Journal on Advances in Signal Processing, vol. 3, no. 2, pp. 1–12, 2003.

[9] Y. Xiang, I. Natgunanathan, Y. Rong, and S. Guo, "Spread spectrum-based high embedding capacity watermarking method for audio signals," IEEE Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2228–2237, 2015.

[10] X. Huang, A. Nishimura, and I. Echizen, "A reversible acoustic steganography for integrity verification," in Digital Watermarking, p. 305, Springer, Berlin, Heidelberg, Germany, 2011.

[11] H. Cao, Y. Wang, J. Li et al., "Audio data hiding using perceptual masking effect of HAS," in Proceedings of the International Symposium on Multispectral Image Processing and Pattern Recognition, International Society for Optics and Photonics, 2007.

[12] T. Painter and A. Spanias, "Perceptual coding of digital audio," Proceedings of the IEEE, vol. 88, no. 4, pp. 451–512, 2000.

[13] S. V. Dhavale, R. S. Deodhar, D. Pradhan et al., "Robust multiple stereo audio watermarking for copyright protection and integrity checking," in Proceedings of the International Conference on Computational Intelligence and Information Technology, pp. 9–16, IET, 2014.

[14] L. Boney, A. H. Tewfik, and K. N. Hamdy, "Digital watermarks for audio signals," in Proceedings of the IEEE International Conference on Multimedia Computing and Systems, pp. 473–480, IEEE, 2002.




phase relation characteristic of the audio signal's frequency spectrum to control the embedding position.

4. Process and Strategy of RP-AIP

4.1. Audio Segmentation. According to the scheme design concepts, we need to embed the eigenvalues evenly in the audio signal, so the audio signal data should be divided evenly in advance.

The number of segments can be determined based on the elements of the I.O. and the size of the audio. In general, assuming that there are n elements in the I.O., we will divide the audio signal data into n segments as well. Thus, it can be ensured that the whole I.O. can be completely embedded and bound with the audio.
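The even split can be sketched in a few lines; dropping any remainder samples so all segments have equal length is our simplification, not something the paper specifies.

```python
import numpy as np

def segment_audio(audio, n):
    """Divide the audio into n equal segments, one per I.O. element."""
    usable = len(audio) - len(audio) % n   # drop the tail so segments are equal
    return np.split(audio[:usable], n)
```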

4.2. Relative Phase Invariance. From the earlier sections, we learnt that the human ear is very sensitive to relative phase changes of an audio signal but lacks the ability to perceive and distinguish the absolute phase. Therefore, for a certain audio signal, the phase component is more important than the amplitude component, and any change or damage to the phase component will cause unacceptable distortion of the audio quality or even completely destroy the audio information. At the same time, communication theory also declares that phase modulation has strong robustness against noise.

More specifically, by the DFT transform, the phase distribution of a segment of the audio signal is easy to obtain. There is only one maximum point and one minimum point in the phase spectrum, as shown in Figure 2, and their relative positions are fixed (if there are multiple extreme points, the first one prevails). As mentioned above, the relative phase maintains consistency regardless of any transformation. This characteristic makes random embedding of the I.O. possible.
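The relative-phase feature follows directly from the DFT. A sketch: `np.angle` of the FFT gives the phase spectrum, and `argmax`/`argmin` return the first occurrence on ties, matching the "first one prevails" rule above.

```python
import numpy as np

def maxima_first(segment):
    """True if the phase-spectrum maximum precedes the minimum (steers L vs H)."""
    phase = np.angle(np.fft.fft(segment))
    return int(np.argmax(phase)) < int(np.argmin(phase))
```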

4.3. Random Embedding Strategy. We will propose a strategy to determine the embedding position according to the position of the extreme points in the phase spectrum. Since the extreme points are random in the phase spectrum, the choice of the embedding position controlled by them is also random. Therefore, this strategy will guarantee the mutual balance between transparency and robustness, as previously described. Furthermore, the characteristic of relative phase invariance in the frequency spectrum is unique, because only the audio segment itself is involved, which guarantees the self-detectability as well.

An 8×8 coefficient matrix D can be obtained from a DCT transformation of an audio signal s(n), as shown in Figure 3. The most robust embedding method is embedding at Position (0, 0), while the most transparent one is at Position (7, 7). Embedding at Position (0, 0) can improve robustness but adds more perceptual distortion, while embedding at Position (7, 7) can improve imperceptibility but offers less robustness.

To get a good compromise between these parameters, the mid-band coefficients, such as (2, 2) (Position L) and (4, 4) (Position H), are selected as the embedding positions instead.



It can be seen from Figure 8 and Table 2 that the RP-AIPmodel can well adapt to the common audio processing thatis to say the general audio processing method will not lead tothe destruction and loss of IO It can be seen from Figure 9that the RP-AIP model has similar robustness against thecommon audio processing compared with the 0-coefficientembedding model In addition as far as the smoothinglow-pass and band-pass filtering methods are concerned theperformance of band-pass filtering of RP-AIP is superior toother models significantly thanks to the method of randomembedding in the mid-band coefficient In general RP-AIPexhibits sufficient robustness for general audio processingmethods

532 Malicious Attack In this experiment we simulatedthree kinds of malicious attacks as cutting tampering andresampling Moreover in the type of tampering it canbe divided into self-media tampering and non-self-mediatampering self-media tampering refers to using the mediaitself for shifting swapping and so forth while non-self-media tampering refers to using other media for replacementcoverage and so forth Resampling can also be dividedinto upsampling and downsampling upsampling refers to

Security and Communication Networks 9

Table 2 The similarity 120588 between the original and recovered IO in audio processing experiments

No Processing methodsSimilarity120588

Random embedding (RP-AIP) 0-coefficient embedding n-coefficient embedding(a) Original 100 100 100(b) Without processing 097 099 095(c) Smoothinglow-pass filtering 088 098 018(d) Band-pass filtering 090 085 084(e) MPEG-1 (with 1051 ratio) 079 088 068(f) MPEG-1 (with 121 ratio) 081 085 064(g) AD conversion 075 074 076

Processing methods

0

05

1

15

2

25

3

RP-AIP0-coefficientn-coefficient

095

099

097

018

098

088 090

085

084068

088

079 081

085

064 076

074

075

W LP BP MP(i) MP(ii) AD

Figure 9 The comparison of the simulation results of differentaudio processing and embedding methods W without processingLP smoothinglow-pass filtering BP band-pass filtering MP (i)MPEG-1 compression (with compression ratio of 1051) MP (ii)MPEG-1 compression (with compression ratio of 121) AD ADconversion

interpolation expansion of the original audio while down-sampling refers to the extraction of the original audio Cuttingand tampering are undoubtedly malicious attacks on audioand attempt to change the speakerrsquos intention Howeverresampling attacks often only affect the length or quality ofthe audio especially the upsampling but cannot change thespeakerrsquos intention

Figure 10 shows the simulation results Among them (f)is the recovered IO after a cutting attack (g) is the one aftera self-media tampering attack (h) is the one after a non-self-media tampering attack (i) is the one after an upsamplingattack and (j) is the one after a downsampling attack Thesimilarity between the original and recovered IO and thecomparison with the other two common audio embeddingmethods as mentioned above can be found in Table 3 andpresented by Figure 11

As can be seen from Figure 10 and Table 3 in additionto the upsampling attack the RP-AIP model can effectively

(f)(g)(f) (h)

(i) (j)

Figure 10 The simulation results of the recovered IO of differentmalicious attacks

Attack types

0

05

1

15

2

25

3

097

099

095

050

044

031

062

046

024

057

033019

092

098

079022

042

020

RP-AIP0-coefficientn-coefficient

W CT TP(i) TP(ii) RS(i) RS(ii)

Figure 11 The comparison of the simulation results of differentmalicious attacks and embedding methods W without processingCT cutting attack TP (i) self-media tampering attack TP (ii)non-self-media tampering attack RS (i) upsampling attack RS (ii)downsampling attack

10 Security and Communication Networks

Table 3 The similarity 120588 between the original and recovered IO in malicious attack experiments

No Attack typesSimilarity120588

Random embedding (RP-AIP) 0-coefficient embedding n-coefficient embedding(a) Original 100 100 100(b) Without attacks 097 099 095(h) Cutting 031 044 050(i) Tampering (self-media) 024 046 062(j) Tampering (non-self-media) 019 033 057(k) Resampling (upsampling) 097 098 092(l) Resampling (downsampling) 020 042 022

detect and reflect malicious attacks As stated earlier the IOis the tangible representation form of the audio integritythat is changes in IO directly reflect changes in audiomedia Therefore we can evaluate the type and degree ofthe attack on the audio by observing IO In the case of (f)since cutting will directly lead to the loss of IO when theIO is reconstructed by 100times100 pixels it will become anincomplete image The missing part of IO indicates that thecorresponding audio part has been cut off In the same waythe tampered part appears as random noise or disorderedas shown in (g) and (h) because the audio has suffered thecorresponding tampering attack However for resamplingattacks the RP-AIP shows different results as shown in (i)and (j) This is because the upsampling tends to interpolatethe original audio signal without causing loss of IO whiledownsampling is the extraction of the original audio signalwhich directly leads to the loss and destruction of IO Asstated in the Introduction we are more concerned aboutwhether the attack has tampered with the speakerrsquos intentionbut resampling is generally not

It can be seen from Figure 11 that the RP-AIP modelhas definite better performance than the other two modelsThis is because attacks such as cutting and tampering aregenerally oriented to the audio signalrsquos time domain ratherthan the frequency domain so the damage to the low andhigh frequency parts is random The embedding methodwith only consideration of low or high frequency can onlyguarantee one aspect while the random embedding methodin RP-AIP model can synthesize these two frequency partsThis is also the innovation and highlight of our work

6 Conclusion

Audio integrity protection for network interaction envi-ronment is not only an effective way to protect audioinformation but also an important link in building trustedcommunication Due to real-time constraints of interactionscenarios this kind of audio protection needs to reconsidersome new characteristics such as streaming characteristicimperceptibility self-detectability and distortion tolerancewhich are not covered by the traditional programsThis posesnew challenges to the existing audio protection solutions

To this end a method of audio integrity protection basedon relative phase (RP-AIP) is proposed in this work In

the RP-AIP model we established the concept of integrityobject (IO) in order to propose and abstract the integrityof the audio signal and then transform it into a tangiblerepresentation form In addition we also fully consideredthe characteristics of the audio signal in the DFT and DCTtransform domain and used the frequency characteristicsof the audio itself to guide the random embedding of IOthereby achieving blind detection and extraction of it Thetangible expression of the audio integrity and the relativephase based random embedding scheme are the highlight ofour work

The simulation experiments show that the RP-AIPmodelcan guarantee the SNRof the audio signal and the embeddingof IO will not cause signal distortion In addition it hasgood adaptability to some common audio processing suchas smoothinglow-pass filtering band-pass filtering MPEG-1 compression with different ratios and AD conversionHowever aiming at malicious attacks such as cutting tam-pering and resampling the RP-AIP model shows obvioussuperiority to other similar protection solutions Of coursethe research about audio integrity protection is still in itsinfancy and more work will be done to make it morecomplete

Data Availability

Some of the data is uploaded to the public server httpsdoiorg106084m9figshare6382709v1 In the file you canfind the demo audio and its spectrum data the IO data andthe audio data used in the experiment

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China [No 61671141 No 61401081] the Liaon-ing Provincial Natural Science Foundation of China [No20180551007] and the Ministry of Education-China MobileScientific Research Funds [No MCM20150103]

Security and Communication Networks 11

References

[1] R F Olanrewaju and O Khalifa ldquoDigital audio watermark-ing techniques and applicationsrdquo in Proceedings of the 2012International Conference on Computer and CommunicationEngineering ICCCE 2012 pp 830ndash835 Malaysia July 2012

[2] J Seok J Hong and J Kim ldquoA novel audio watermarkingalgorithm for copyright protection of digital audiordquo ETRIJournal vol 24 no 3 pp 181ndash189 2002

[3] H Yassine B Bachir and K Aziz ldquoA Secure and HighRobust AudioWatermarking System for Copyright ProtectionrdquoInternational Journal of Computer Applications vol 53 no 17pp 33ndash39 2012

[4] J Haitsma M van der Veen T Kalker and F BruekersldquoAudio watermarking for monitoring and copy protectionrdquo inProceedings of the ACM Workshops on Multimedia ACM pp119ndash122 2000

[5] R D Shelke and M U Nemade ldquoAudio watermarking tech-niques for copyright protection A reviewrdquo in Proceedings ofthe 2016 International Conference on Global Trends in SignalProcessing Information Computing and Communication ICGT-SPICC 2016 pp 634ndash640 December 2016

[6] S V Dhavale R S Deodhar and L M Patnaik ldquoHighCapacity Lossless Semi-fragile Audio Watermarking in theTimeDomainrdquo inAhierarchical CPNmodel formobility analysisin zone based MANET pp 843ndash852 2012

[7] S V Dhavale R S Deodhar D Pradhan and L M PatnaikldquoState Transition Based Embedding in Cepstrum Domain forAudio Copyright Protectionrdquo IETE Journal of Research vol 61no 1 pp 41ndash55 2015

[8] HH Tsai J S Cheng and P T Yu ldquoAudioWatermarkingBasedonHAS andNeuralNetworks inDCTDomainrdquoEurasip Journalon Advances in Signal Processing vol 3 no 2 pp 1ndash12 2003

[9] Y Xiang I Natgunanathan Y Rong and S Guo ldquoSpreadspectrum-based high embedding capacity watermarkingmethod for audio signalsrdquo IEEE Transactions on Audio Speechand Language Processing vol 23 no 12 pp 2228ndash2237 2015

[10] X Huang A Nishimura and I Echizen ldquoA Reversible AcousticSteganography for Integrity Verificationrdquo inDigital Watermark-ing p 305 Springer Berlin Heidelberg Germany 2011

[11] H Cao Y Wang J Li et al ldquoAudio data hiding using perceptualmasking effect of HASrdquo in Proceedings of the InternationalSymposium on Multispectral Image Processing and PatternRecognition International Society for Optics and Photonics 2007

[12] T Painter and A Spanias ldquoPerceptual coding of digital audiordquoProceedings of the IEEE vol 88 no 4 pp 451ndash512 2000

[13] S V Dhavale R S Deodhar D Pradhan et al ldquoRobustMultiple Stereo Audio Watermarking for Copyright Protectionand Integrity Checkingrdquo in Proceedings of the InternationalConference on Computational Intelligence and Information Tech-nology IET pp 9ndash16 2014

[14] L Boney A H Tewfik and K N Hamdy ldquoDigital watermarksfor audio signalsrdquo in Proceedings of the IEEE InternationalConference onMultimedia Computing and Systems pp 473ndash480IEEE 2002


6 Security and Communication Networks

(1) receive the key vaults as kv(i)
(2) set the number of kv(i) as n
(3) divide the audio signal equally into n segments as s(j)
(4) sample each s(j) to c(j)
(5) for all c(j), initialize j = 1
(6) while not end do
(7)   if j ≤ len then
(8)     calculate the phase spectrum φ(ω) of s(j)
(9)     determine the positions of the maxima and minima in φ(ω)
(10)    calculate the coefficient matrix C(j) by 8×8 DCT of c(j)
(11)    if the maxima is in front of the minima then
(12)      embed kv(j) at C(2,2)
(13)    else
(14)      embed kv(j) at C(4,4)
(15)    end if
(16)  end if
(17) end while

Algorithm 1: The algorithm of the random embedding strategy based on relative phase.
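For concreteness, the decision rule at the heart of Algorithm 1 can be sketched in pure Python. This is our own illustration (a naive DFT over one segment, with hypothetical function names), not the authors' implementation; a real system would apply the choice to the 8×8 DCT blocks of the sampled segment.

```python
import cmath
import math

def phase_spectrum(segment):
    # naive DFT: the phase (argument) of each frequency bin of the segment
    N = len(segment)
    return [cmath.phase(sum(segment[n] * cmath.exp(-2j * math.pi * k * n / N)
                            for n in range(N)))
            for k in range(N)]

def embedding_position(segment):
    # Algorithm 1's Boolean choice: if the maximum of the phase spectrum
    # appears before the minimum, embed at the low-frequency coefficient
    # C(2,2) (Position L); otherwise at C(4,4) (Position H)
    phi = phase_spectrum(segment)
    return (2, 2) if phi.index(max(phi)) < phi.index(min(phi)) else (4, 4)
```

Because the rule depends only on the host segment itself, the receiver can recompute the same position without side information, which is what makes blind extraction possible.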

Figure 4: The flow chart of the random embedding strategy. The choice of the embedding position between L and H is controlled by the relative phase relation of the host audio segment.

selected as the embedding position instead. Position L or H is the Boolean choice for random embedding. The so-called Boolean choice means that a choice must be made between these two positions, and the selection process is random. Based on the relative phase relation discussed above, a random embedding strategy is proposed as follows. The flow chart of this strategy is shown in Figure 4, and its algorithm is illustrated in Algorithm 1.

(i) If the maximum point of the audio segment appears before the minimum point, the corresponding eigenvalue will be embedded into the low frequency region (Position L).

(ii) Otherwise, the corresponding eigenvalue will be embedded into the high frequency region (Position H).

After determining the embedding position, the next step is to construct a fuzzy vault containing the secret sequence. On the finite field F, the secret sequence key ∈ F. By an encrypting set P (P ∈ F), the key can be packaged into the vault of P to generate the IO of the sender.

After embedding the IO into the audio, a new DCT coefficient matrix D' is generated. By making an inverse DCT transformation on D', we can obtain a new audio signal s'(n) containing the IO as well. Finally, s'(n) is exactly the transformed audio signal that we expect.

4.4. Detection and Extraction. The detection and extraction of the IO can be achieved by the difference between D and D'. When the receiver receives the IO, it is essentially a copy Q of the fuzzy vault P containing the secret sequence. On the finite field F, by matching Q (Q ∈ F) and P, the key can be parsed from the vault to obtain the IO of the receiver. After the IO is recovered, the integrity of the audio signal can further be determined according to its completeness and accuracy. Figure 5 shows the reconstruction and detection process.

In Figure 5, (a) shows the reconstruction process of the host audio, and s'(n) is the reconstructed signal, which contains the IO. (b) shows the detection and extraction process of the IO: e_v denotes the eigenvalues and E_v their DCT transformation; e'_v denotes the extracted eigenvalues and E'_v their DCT transformation. By matching the similarity between e'_v and e_v, the similarity between the received and original audio can be evaluated. According to the features of the extracted IO, it can be found


Figure 5: The process of the reconstruction and detection of the eigenvalues: (a) is the reconstruction process and (b) is the detection process.

whether the audio is cut or tampered. More details can be found in the experimental section.

4.5. Integrity Evaluation. After the IO is extracted, the similarity of e_v and e'_v can be calculated by their cosine distance d, and the integrity of the host audio I_au can be evaluated based on this, as shown in (11). It can be seen that when δ takes 1, I_au is a decimal within (0, 1). The greater the value of I_au, the higher the similarity, and the better the audio integrity.

$$
I_{au} = \delta \cdot d
= \delta \cdot \frac{\sum_{i=0}^{N-1} e(i)\, e'(i)}{\sqrt{\sum_{i=0}^{N-1} e^{2}(i)} \cdot \sqrt{\sum_{i=0}^{N-1} e'^{2}(i)}}
\tag{11}
$$

where d is the cosine distance of e_v and e'_v, δ is a weight coefficient whose default value is 1, e(i) is the i-th element of e_v, and e'(i) is the i-th element of e'_v.

5. Experiments and Evaluations
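Before turning to the experiments, note that the score in (11) is simply a weighted cosine similarity between the original and extracted eigenvalue sequences. A minimal Python sketch (our own illustration, with a hypothetical function name, not the authors' code):

```python
import math

def integrity_score(e, e_prime, delta=1.0):
    # Eq. (11): I_au = delta * (cosine similarity of e_v and e'_v)
    num = sum(a * b for a, b in zip(e, e_prime))
    den = (math.sqrt(sum(a * a for a in e)) *
           math.sqrt(sum(b * b for b in e_prime)))
    return delta * num / den
```

Identical sequences score approximately 1 and orthogonal ones 0, matching the claim that a larger I_au means better audio integrity.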

5.1. Audio Quality Criterion. The signal-to-noise ratio (SNR) is the widely approved audio quality criterion for the evaluation of audio signal transformation, as shown in the following equation:

$$
\mathrm{SNR} = 20 \log \frac{\sqrt{\sum_{n=0}^{N-1} s^{2}(n)}}{\sqrt{\sum_{n=0}^{N-1} \left[ s(n) - s'(n) \right]^{2}}}
\tag{12}
$$
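Equation (12) can be checked with a few lines of Python. The sketch below is ours, not part of the paper, and treats the signals as plain sample lists:

```python
import math

def snr_db(original, transformed):
    # Eq. (12): ratio of the host-signal energy to the energy of the
    # embedding error s(n) - s'(n), expressed in dB
    signal = sum(s * s for s in original)
    noise = sum((s - t) ** 2 for s, t in zip(original, transformed))
    return 20 * math.log10(math.sqrt(signal / noise))

# a 10% amplitude error yields an SNR of about 20 dB
snr_db([1.0] * 10, [1.1] * 10)
```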

Figure 6: The black-and-white image of 100×100 pixels used as the IO in these experiments.

where s'(n) is the transformation of s(n).

$$
\begin{aligned}
\mathrm{SNR} &= 20 \log \frac{\sqrt{\sum_{n=0}^{N-1} s^{2}(n)}}{\sqrt{\sum_{n=0}^{N-1} \left[ s(n) - s'(n) \right]^{2}}}
\xrightarrow{\text{Eqs. (6), (7)}}
20 \log \left[ \frac{\sqrt{\sum_{n=0}^{N-1} s^{2}(n)}}{\sqrt{\sum_{n=0}^{N-1} \varepsilon^{2}(i,n)}} \right] \\
&= 20 \log \sqrt{\frac{\sum_{n=0}^{N-1} s^{2}(n)}{\sum_{n=0}^{N-1} \varepsilon^{2}(i,n)}}
= 20 \log \sqrt{\sum_{n=0}^{N-1} \frac{s^{2}(n)}{\varepsilon^{2}(i,n)}}
\xrightarrow{\text{Eq. (9)}}
20 \log \sqrt{\sum_{n=0}^{N-1} \frac{1}{\rho^{2}(i)}}
\end{aligned}
\tag{13}
$$

Therefore, after embedding the IO on the i-th coefficient of s(n), the SNR can be calculated according to (6), (7), and (9), as shown in (13).

From (13), we can see that the SNR is only related to ρ(i); that is to say, the audio quality after embedding the eigenvalues is only related to the embedding position i of its DCT coefficient. This proves that the embedding position is crucial to the imperceptibility and robustness of the host audio in our RP-AIP model. Therefore, the embedding rule based on the relative phase of the audio signal itself is of extraordinary significance.

5.2. Embedding and Extraction of IO. In this experiment, the IO is assumed to be a black-and-white image, as shown in Figure 6, and the host audio is a set of left-channel audio clips, as shown in Figure 7 and Table 1. Firstly, we uniformly embedded the IO in the host audio and then recovered it in the opposite way.

According to (12), the SNR of the different audio types is calculated as shown in Table 1, which is in line with the requirements of robustness and imperceptibility.

5.3. Integrity Protection Performance. In the transmission process, the audio may suffer from network congestion, accidental packet loss, and other accidents caused by network


Table 1: The SNR (dB) of the different audio types after embedding the IO.

Audio clips   A1                           A2                            A3
Audio types   A segment of voice message   A prompt tone of Windows 10   A piece of music
SNR           30.96                        42.33                         46.40

Figure 7: The host audio clips in the experiment (normalized amplitude versus time). The audio sampling frequency is 22,050 Hz and the length is 1.0 s. (A1) a segment of voice message; (A2) a prompt tone of Windows 10; (A3) a piece of music.

anomalies, and even filtering, compression, and A/D conversion. Therefore, the first thing to consider is whether the model can effectively distinguish between such accidents and malicious attacks such as cutting and tampering. As stated above, the loss of and damage to the IO caused by transmission anomalies are generally accidental and random, but deterministic and directional when caused by malicious attacks. Thus, this issue can be judged from the recovered IO.
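This accidental-versus-directional distinction can be illustrated with a toy heuristic. The code below is purely our own sketch, not the paper's detector: it flags damage as directional (malicious) when the damaged elements of the recovered IO form a long contiguous run, and as accidental otherwise; the function name and threshold are assumptions.

```python
def damage_is_directional(error_map, run_threshold=20):
    # error_map: True where a recovered IO element is damaged/missing.
    # Accidental losses scatter damage randomly; cutting/tampering tends
    # to wipe out long contiguous runs (hypothetical threshold of 20).
    longest = run = 0
    for damaged in error_map:
        run = run + 1 if damaged else 0
        longest = max(longest, run)
    return longest >= run_threshold

print(damage_is_directional([True, False] * 30))         # scattered -> False
print(damage_is_directional([True] * 25 + [False] * 5))  # one long run -> True
```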

5.3.1. Audio Processing. In this experiment, we simulated five common audio processing methods: smoothing/low-pass filtering (with a 4 kHz cutoff frequency), band-pass filtering (with 200 Hz–2 kHz cutoff frequencies), MPEG-1 compression (with compression ratios of 10.5:1 and 12:1), and A/D conversion. The evaluation of audio signal distortion is likewise indicated by the similarity between the original and recovered IO.
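As a stand-in for the first of these operations, a simple moving-average smoother can serve as a crude low-pass filter. This sketch is our own and is not the 4 kHz-cutoff filter used in the paper's experiments:

```python
def smooth(x, width=5):
    # moving-average smoothing: each output sample is the mean of a
    # window of `width` input samples (the window shrinks at the edges)
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out
```

Averaging attenuates rapid sample-to-sample variation while leaving slowly varying (low-frequency) content nearly unchanged, which is why such processing can damage high-band embeddings but not a well-placed mid-band one.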

Figure 8 shows the simulation results. Among them, (a) is the recovered IO after smoothing/low-pass filtering, (b) is the one after band-pass filtering, (c) is the one after MPEG-1 compression (with a compression ratio of 10.5:1), (d) is the one after MPEG-1 compression (with a compression ratio of 12:1), and (e) is the one after an A/D conversion. The similarity between the original and recovered IO, and the comparison with the other two common audio embedding methods (embedding the IO on the 0×0 or n×n coefficient, called 0-coefficient or n-coefficient embedding for short), can be found in Table 2 and are presented in Figure 9.

Figure 8: The simulation results of the recovered IO under the different audio processing methods.

It can be seen from Figure 8 and Table 2 that the RP-AIP model adapts well to common audio processing; that is to say, general audio processing methods will not lead to the destruction and loss of the IO. It can be seen from Figure 9 that the RP-AIP model has robustness against common audio processing similar to that of the 0-coefficient embedding model. In addition, as far as the smoothing/low-pass and band-pass filtering methods are concerned, the band-pass filtering performance of RP-AIP is significantly superior to that of the other models, thanks to the method of random embedding in the mid-band coefficients. In general, RP-AIP exhibits sufficient robustness to general audio processing methods.

5.3.2. Malicious Attack. In this experiment, we simulated three kinds of malicious attacks: cutting, tampering, and resampling. Moreover, tampering can be divided into self-media tampering and non-self-media tampering: self-media tampering refers to using the media itself for shifting, swapping, and so forth, while non-self-media tampering refers to using other media for replacement, coverage, and so forth. Resampling can also be divided into upsampling and downsampling: upsampling refers to


Table 2: The similarity ρ between the original and recovered IO in the audio processing experiments.

No.   Processing method              Random embedding (RP-AIP)   0-coefficient embedding   n-coefficient embedding
(a)   Original                       1.00                        1.00                      1.00
(b)   Without processing             0.97                        0.99                      0.95
(c)   Smoothing/low-pass filtering   0.88                        0.98                      0.18
(d)   Band-pass filtering            0.90                        0.85                      0.84
(e)   MPEG-1 (10.5:1 ratio)          0.79                        0.88                      0.68
(f)   MPEG-1 (12:1 ratio)            0.81                        0.85                      0.64
(g)   A/D conversion                 0.75                        0.74                      0.76

Figure 9: The comparison of the simulation results of the different audio processing and embedding methods. W: without processing; LP: smoothing/low-pass filtering; BP: band-pass filtering; MP(i): MPEG-1 compression (10.5:1 ratio); MP(ii): MPEG-1 compression (12:1 ratio); AD: A/D conversion.

interpolation expansion of the original audio, while downsampling refers to decimation of the original audio. Cutting and tampering are undoubtedly malicious attacks on audio that attempt to change the speaker's intention. However, resampling attacks often only affect the length or quality of the audio, especially upsampling, and cannot change the speaker's intention.
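The two resampling operations can be sketched as follows. This is our own illustration (linear-interpolation upsampling and plain decimation); real resamplers use proper interpolation and anti-aliasing filters.

```python
def upsample(x, factor):
    # interpolation expansion: insert factor-1 linearly interpolated
    # samples between every pair of original samples
    out = []
    for a, b in zip(x, x[1:]):
        out.append(a)
        out.extend(a + (b - a) * k / factor for k in range(1, factor))
    out.append(x[-1])
    return out

def downsample(x, factor):
    # decimation: keep only every factor-th sample of the original audio
    return x[::factor]

print(upsample([0.0, 2.0], 2))            # [0.0, 1.0, 2.0]
print(downsample([0, 1, 2, 3, 4, 5], 2))  # [0, 2, 4]
```

The asymmetry is visible directly: upsampling keeps every original sample (so the embedded IO can survive), whereas downsampling discards samples outright.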

Figure 10 shows the simulation results. Among them, (f) is the recovered IO after a cutting attack, (g) is the one after a self-media tampering attack, (h) is the one after a non-self-media tampering attack, (i) is the one after an upsampling attack, and (j) is the one after a downsampling attack. The similarity between the original and recovered IO, and the comparison with the other two common audio embedding methods mentioned above, can be found in Table 3 and are presented in Figure 11.

As can be seen from Figure 10 and Table 3, except for the upsampling attack, the RP-AIP model can effectively

Figure 10: The simulation results of the recovered IO under the different malicious attacks.

Figure 11: The comparison of the simulation results of the different malicious attacks and embedding methods. W: without processing; CT: cutting attack; TP(i): self-media tampering attack; TP(ii): non-self-media tampering attack; RS(i): upsampling attack; RS(ii): downsampling attack.


Table 3: The similarity ρ between the original and recovered IO in the malicious attack experiments.

No.   Attack type                   Random embedding (RP-AIP)   0-coefficient embedding   n-coefficient embedding
(a)   Original                      1.00                        1.00                      1.00
(b)   Without attacks               0.97                        0.99                      0.95
(h)   Cutting                       0.31                        0.44                      0.50
(i)   Tampering (self-media)        0.24                        0.46                      0.62
(j)   Tampering (non-self-media)    0.19                        0.33                      0.57
(k)   Resampling (upsampling)       0.97                        0.98                      0.92
(l)   Resampling (downsampling)     0.20                        0.42                      0.22

detect and reflect malicious attacks. As stated earlier, the IO is the tangible representation form of the audio integrity; that is, changes in the IO directly reflect changes in the audio media. Therefore, we can evaluate the type and degree of the attack on the audio by observing the IO. In the case of (f), since cutting directly leads to the loss of the IO, when the IO is reconstructed at 100×100 pixels it becomes an incomplete image. The missing part of the IO indicates that the corresponding audio part has been cut off. In the same way, the tampered part appears as random noise or disorder, as shown in (g) and (h), because the audio has suffered the corresponding tampering attack. However, for resampling attacks, the RP-AIP shows different results, as shown in (i) and (j). This is because upsampling tends to interpolate the original audio signal without causing loss of the IO, while downsampling decimates the original audio signal, which directly leads to the loss and destruction of the IO. As stated in the Introduction, we are more concerned about whether the attack has tampered with the speaker's intention, and resampling generally does not.

It can be seen from Figure 11 that the RP-AIP model performs definitely better than the other two models. This is because attacks such as cutting and tampering are generally oriented to the audio signal's time domain rather than the frequency domain, so the damage to the low- and high-frequency parts is random. An embedding method that considers only the low or only the high frequencies can guarantee only one aspect, while the random embedding method in the RP-AIP model can synthesize these two frequency parts. This is also the innovation and highlight of our work.

6. Conclusion

Audio integrity protection for the network interaction environment is not only an effective way to protect audio information but also an important link in building trusted communication. Due to the real-time constraints of interaction scenarios, this kind of audio protection needs to reconsider some new characteristics, such as the streaming characteristic, imperceptibility, self-detectability, and distortion tolerance, which are not covered by traditional programs. This poses new challenges to the existing audio protection solutions.

To this end, a method of audio integrity protection based on relative phase (RP-AIP) is proposed in this work. In the RP-AIP model, we established the concept of the integrity object (IO) in order to propose and abstract the integrity of the audio signal and then transform it into a tangible representation form. In addition, we also fully considered the characteristics of the audio signal in the DFT and DCT transform domains and used the frequency characteristics of the audio itself to guide the random embedding of the IO, thereby achieving blind detection and extraction of it. The tangible expression of the audio integrity and the relative phase based random embedding scheme are the highlights of our work.

The simulation experiments show that the RP-AIP model can guarantee the SNR of the audio signal and that the embedding of the IO will not cause signal distortion. In addition, it has good adaptability to some common audio processing, such as smoothing/low-pass filtering, band-pass filtering, MPEG-1 compression with different ratios, and A/D conversion. Moreover, against malicious attacks such as cutting, tampering, and resampling, the RP-AIP model shows obvious superiority over other similar protection solutions. Of course, the research on audio integrity protection is still in its infancy, and more work will be done to make it more complete.

Data Availability

Some of the data is uploaded to the public server at https://doi.org/10.6084/m9.figshare.6382709.v1. In the file, you can find the demo audio and its spectrum data, the IO data, and the audio data used in the experiments.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Natural Science Foundation of China [No. 61671141, No. 61401081], the Liaoning Provincial Natural Science Foundation of China [No. 20180551007], and the Ministry of Education-China Mobile Scientific Research Funds [No. MCM20150103].

Security and Communication Networks 11

References

[1] R F Olanrewaju and O Khalifa ldquoDigital audio watermark-ing techniques and applicationsrdquo in Proceedings of the 2012International Conference on Computer and CommunicationEngineering ICCCE 2012 pp 830ndash835 Malaysia July 2012

[2] J Seok J Hong and J Kim ldquoA novel audio watermarkingalgorithm for copyright protection of digital audiordquo ETRIJournal vol 24 no 3 pp 181ndash189 2002

[3] H Yassine B Bachir and K Aziz ldquoA Secure and HighRobust AudioWatermarking System for Copyright ProtectionrdquoInternational Journal of Computer Applications vol 53 no 17pp 33ndash39 2012

[4] J Haitsma M van der Veen T Kalker and F BruekersldquoAudio watermarking for monitoring and copy protectionrdquo inProceedings of the ACM Workshops on Multimedia ACM pp119ndash122 2000

[5] R D Shelke and M U Nemade ldquoAudio watermarking tech-niques for copyright protection A reviewrdquo in Proceedings ofthe 2016 International Conference on Global Trends in SignalProcessing Information Computing and Communication ICGT-SPICC 2016 pp 634ndash640 December 2016

[6] S V Dhavale R S Deodhar and L M Patnaik ldquoHighCapacity Lossless Semi-fragile Audio Watermarking in theTimeDomainrdquo inAhierarchical CPNmodel formobility analysisin zone based MANET pp 843ndash852 2012

[7] S V Dhavale R S Deodhar D Pradhan and L M PatnaikldquoState Transition Based Embedding in Cepstrum Domain forAudio Copyright Protectionrdquo IETE Journal of Research vol 61no 1 pp 41ndash55 2015

[8] HH Tsai J S Cheng and P T Yu ldquoAudioWatermarkingBasedonHAS andNeuralNetworks inDCTDomainrdquoEurasip Journalon Advances in Signal Processing vol 3 no 2 pp 1ndash12 2003

[9] Y Xiang I Natgunanathan Y Rong and S Guo ldquoSpreadspectrum-based high embedding capacity watermarkingmethod for audio signalsrdquo IEEE Transactions on Audio Speechand Language Processing vol 23 no 12 pp 2228ndash2237 2015

[10] X Huang A Nishimura and I Echizen ldquoA Reversible AcousticSteganography for Integrity Verificationrdquo inDigital Watermark-ing p 305 Springer Berlin Heidelberg Germany 2011

[11] H Cao Y Wang J Li et al ldquoAudio data hiding using perceptualmasking effect of HASrdquo in Proceedings of the InternationalSymposium on Multispectral Image Processing and PatternRecognition International Society for Optics and Photonics 2007

[12] T Painter and A Spanias ldquoPerceptual coding of digital audiordquoProceedings of the IEEE vol 88 no 4 pp 451ndash512 2000

[13] S V Dhavale R S Deodhar D Pradhan et al ldquoRobustMultiple Stereo Audio Watermarking for Copyright Protectionand Integrity Checkingrdquo in Proceedings of the InternationalConference on Computational Intelligence and Information Tech-nology IET pp 9ndash16 2014

[14] L Boney A H Tewfik and K N Hamdy ldquoDigital watermarksfor audio signalsrdquo in Proceedings of the IEEE InternationalConference onMultimedia Computing and Systems pp 473ndash480IEEE 2002

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 7: A Relative Phase Based Audio Integrity Protection Method: Model …downloads.hindawi.com/journals/scn/2018/7139391.pdf · 2019-07-30 · ResearchArticle A Relative Phase Based Audio

Security and Communication Networks 7

Figure 5: The process of the reconstruction and detection of the eigenvalues. (a) is the reconstruction process and (b) is the detection process.

whether the audio is cut or tampered with. More details can be found in the experimental section.

4.5. Integrity Evaluation. After the I.O. is extracted, the similarity of $e_v$ and $e'_v$ can be calculated by their cosine distance $d$, and the integrity of the host audio $I_{au}$ can be evaluated on this basis, as shown in (11). It can be seen that when $\delta$ takes 1, $I_{au}$ is a decimal within (0, 1). The greater the value of $I_{au}$, the higher the similarity and the better the audio integrity.

$$I_{au} = \delta \cdot d = \delta \cdot \frac{\sum_{i=0}^{N-1} e(i)\, e'(i)}{\sqrt{\sum_{i=0}^{N-1} e^{2}(i)} \cdot \sqrt{\sum_{i=0}^{N-1} e'^{2}(i)}} \qquad (11)$$
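Equation (11) is an ordinary cosine similarity scaled by the weight $\delta$. As a minimal illustration (function and variable names are ours, not from the paper), it can be computed with NumPy:

```python
import numpy as np

def integrity_score(e, e_prime, delta=1.0):
    """Audio integrity I_au of Eq. (11): the weighted cosine distance
    between the reconstructed eigenvalue vector e_v and the extracted
    one e'_v."""
    e = np.asarray(e, dtype=float)
    e_prime = np.asarray(e_prime, dtype=float)
    d = np.dot(e, e_prime) / (np.linalg.norm(e) * np.linalg.norm(e_prime))
    return delta * d

# Identical vectors give the maximum score.
print(round(integrity_score([1, 0, 1, 1], [1, 0, 1, 1]), 6))  # -> 1.0
```

A score near 1 indicates an intact I.O. and hence good audio integrity; damage to the extracted vector pulls the score toward 0.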

where $d$ is the cosine distance of $e_v$ and $e'_v$, $\delta$ is a weight coefficient with a default value of 1, $e(i)$ is the $i$-th element of $e_v$, and $e'(i)$ is the $i$-th element of $e'_v$.

5. Experiments and Evaluations

5.1. Audio Quality Criterion. The signal-to-noise ratio (SNR) is the widely approved and used audio quality criterion for evaluating audio signal transformations, as shown in the following equation:

$$\mathrm{SNR} = 20 \log \frac{\sqrt{\sum_{n=0}^{N-1} s^{2}(n)}}{\sqrt{\sum_{n=0}^{N-1} \left[ s(n) - s'(n) \right]^{2}}} \qquad (12)$$

Figure 6: The black-and-white image of 100×100 pixels used as the I.O. in these experiments.

where $s'(n)$ is the transformation of $s(n)$. Substituting (6) and (7) into (12) and then applying (9) gives

$$\mathrm{SNR} = 20 \log \frac{\sqrt{\sum_{n=0}^{N-1} s^{2}(n)}}{\sqrt{\sum_{n=0}^{N-1} \varepsilon^{2}(i,n)}} = 20 \log \sqrt{\frac{\sum_{n=0}^{N-1} s^{2}(n)}{\sum_{n=0}^{N-1} \varepsilon^{2}(i,n)}} = 20 \log \sqrt{\sum_{n=0}^{N-1} \frac{1}{\rho^{2}(i)}} \qquad (13)$$

Therefore, after embedding the I.O. on the i-th coefficient of s(n), the SNR can be calculated according to (6), (7), and (9), as shown in (13).

From (13) we can see that the SNR is related only to $\rho(i)$; that is to say, the audio quality after embedding the eigenvalues depends only on the embedding position $i$ of the DCT coefficient. This proves that the embedding position is crucial to the imperceptibility and robustness of the host audio in our RP-AIP model. Therefore, the embedding rule based on the relative phase of the audio signal itself is of extraordinary significance.

5.2. Embedding and Extraction of I.O. In this experiment, it is assumed that the I.O. is a black-and-white image, as shown in Figure 6, and the host audio is a set of left-channel audio clips, as shown in Figure 7 and Table 1. Firstly, we uniformly embedded the I.O. in the host audio and then recovered it in the opposite way.
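This chunk does not spell out the embedding algorithm itself, but the uniform placement can be sketched: the 100×100 binary image is flattened to 10,000 bits and each bit is assigned to one of 10,000 equally sized frames of the 10-second, 22.05 kHz clip (the one-bit-per-frame layout is our illustrative assumption, not the paper's relative-phase rule):

```python
import numpy as np

# 100x100 binary I.O., flattened to a bit sequence.
io_bits = np.random.randint(0, 2, size=(100, 100)).ravel()

# Host clip: 10 s at 22.05 kHz, as in the experiments.
num_samples = 10 * 22050

# Uniform embedding: one bit per equally sized frame,
# so the I.O. is spread over the whole signal.
frame_len = num_samples // io_bits.size
frame_starts = np.arange(io_bits.size) * frame_len

print(io_bits.size, frame_len)  # -> 10000 22
```

Spreading the bits uniformly is what later lets a cut in the audio show up as a localized missing region in the recovered image.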

According to (12), the SNR of the different audio types is calculated as shown in Table 1, which is in line with the requirements of robustness and imperceptibility.
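The SNR computation of (12) is straightforward; a short sketch (base-10 logarithm assumed, as is usual for dB values):

```python
import numpy as np

def snr_db(s, s_prime):
    """SNR of Eq. (12): 20*log10(||s|| / ||s - s'||)."""
    s = np.asarray(s, dtype=float)
    s_prime = np.asarray(s_prime, dtype=float)
    return 20.0 * np.log10(np.linalg.norm(s) / np.linalg.norm(s - s_prime))

# A perturbation 100x smaller than the signal yields 40 dB.
s = np.ones(1000)
print(round(snr_db(s, s + 0.01), 6))  # -> 40.0
```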

5.3. Integrity Protection Performance. In the transmission process, the audio may suffer from network congestion, accidental packet loss, and other accidents caused by network


Table 1: The SNR of different audio types after embedding the I.O.

  Audio clip   Audio type                      SNR (dB)
  A1           A segment of voice message      30.96
  A2           A prompt tone of Windows 10     42.33
  A3           A piece of music                46.40

Figure 7: The host audio clips in the experiment. The audio sampling frequency is 22.05 kHz and the length is 10 seconds. (A1) A segment of voice message; (A2) a prompt tone of Windows 10; (A3) a piece of music.

anomalies, and even filtering, compression, and A/D conversion. Therefore, the first thing to consider is whether the model can effectively distinguish between such accidents and malicious attacks such as cutting and tampering. As stated above, the loss and damage of the I.O. caused by transmission anomalies are generally accidental and random, but deterministic and directional when caused by malicious attacks. Thus, this issue can be judged from the recovered I.O.
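One illustrative way to operationalize that judgment (our heuristic, not the paper's procedure) is to compare the longest contiguous run of missing I.O. pixels: accidental loss scatters, while a cut removes one solid block:

```python
import numpy as np

def longest_missing_run(missing):
    """Length of the longest contiguous run of True (missing) entries."""
    best = run = 0
    for m in missing:
        run = run + 1 if m else 0
        best = max(best, run)
    return best

rng = np.random.default_rng(0)
n = 10000
accidental = rng.random(n) < 0.05            # 5% scattered, random loss
cut = np.zeros(n, dtype=bool)
cut[2000:2500] = True                        # one contiguous block lost

print(longest_missing_run(accidental) < longest_missing_run(cut))  # -> True
```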

5.3.1. Audio Processing. In this experiment, we simulated five common audio processing methods: smoothing/low-pass filtering (with a 4 kHz cutoff frequency), band-pass filtering (with 200 Hz–2 kHz cutoff frequencies), MPEG-1 compression (with compression ratios of 10.5:1 and 12:1), and A/D conversion. The evaluation of audio signal distortion is likewise indicated by the similarity between the original and recovered I.O.
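The paper's exact filters are not reproduced here; as a stand-in for the smoothing/low-pass case, a moving-average filter shows the intended effect (attenuating high-frequency content while passing low-frequency content):

```python
import numpy as np

def smooth(x, width=5):
    """Moving-average smoothing, a simple low-pass approximation."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

hf = np.array([1.0, -1.0] * 50)   # rapidly alternating (high-frequency)
dc = np.ones(100)                 # constant (low-frequency)

# Away from the edges, the alternation is attenuated to 1/width,
# while the constant signal passes through unchanged.
print(round(float(np.max(np.abs(smooth(hf)[10:-10]))), 3))  # -> 0.2
print(round(float(smooth(dc)[50]), 3))                      # -> 1.0
```

An embedded I.O. survives such processing only if its carrier coefficients are not confined to the frequencies the filter removes, which is what Table 2 measures.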

Figure 8 shows the simulation results. Among them, (a) is the recovered I.O. after smoothing/low-pass filtering, (b) is the one after band-pass filtering, (c) is the one after MPEG-1 compression (with a compression ratio of 10.5:1), (d)

Figure 8: The simulation results of the recovered I.O. under different audio processing methods.

is the one after MPEG-1 compression (with a compression ratio of 12:1), and (e) is the one after A/D conversion. The similarity between the original and recovered I.O., and the comparison with two other common audio embedding methods (embedding the I.O. on the 0×0 or n×n coefficient, called 0-coefficient or n-coefficient embedding for short), can be found in Table 2 and are presented in Figure 9.

It can be seen from Figure 8 and Table 2 that the RP-AIP model adapts well to common audio processing; that is to say, general audio processing methods will not lead to the destruction or loss of the I.O. It can be seen from Figure 9 that the RP-AIP model has robustness against common audio processing similar to that of the 0-coefficient embedding model. Moreover, as far as the smoothing/low-pass and band-pass filtering methods are concerned, the band-pass filtering performance of RP-AIP is significantly superior to that of the other models, thanks to the random embedding in the mid-band coefficients. In general, RP-AIP exhibits sufficient robustness to general audio processing methods.

5.3.2. Malicious Attack. In this experiment, we simulated three kinds of malicious attacks: cutting, tampering, and resampling. Moreover, tampering can be divided into self-media tampering and non-self-media tampering: self-media tampering refers to using the media itself for shifting, swapping, and so forth, while non-self-media tampering refers to using other media for replacement, coverage, and so forth. Resampling can also be divided into upsampling and downsampling: upsampling refers to


Table 2: The similarity ρ between the original and recovered I.O. in the audio processing experiments.

  No.   Processing method               RP-AIP (random)   0-coefficient   n-coefficient
  (a)   Original                        1.00              1.00            1.00
  (b)   Without processing              0.97              0.99            0.95
  (c)   Smoothing/low-pass filtering    0.88              0.98            0.18
  (d)   Band-pass filtering             0.90              0.85            0.84
  (e)   MPEG-1 (10.5:1 ratio)           0.79              0.88            0.68
  (f)   MPEG-1 (12:1 ratio)             0.81              0.85            0.64
  (g)   A/D conversion                  0.75              0.74            0.76

Figure 9: The comparison of the simulation results of different audio processing and embedding methods. W: without processing; LP: smoothing/low-pass filtering; BP: band-pass filtering; MP (i): MPEG-1 compression (with compression ratio of 10.5:1); MP (ii): MPEG-1 compression (with compression ratio of 12:1); AD: A/D conversion.

interpolation expansion of the original audio, while downsampling refers to decimation of the original audio. Cutting and tampering are undoubtedly malicious attacks on audio that attempt to change the speaker's intention. However, resampling attacks often only affect the length or quality of the audio, especially upsampling, and cannot change the speaker's intention.
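The asymmetry can be seen with a toy signal: 2x linear-interpolation upsampling inserts new samples between the old ones and keeps every original sample, while 2x downsampling keeps only every other sample and discards the rest (the factor of 2 is our choice for illustration):

```python
import numpy as np

x = np.arange(10.0)  # original samples

# Upsampling: interpolate at half-integer positions; every
# original sample is still present at the even indices.
up = np.interp(np.arange(0, 10, 0.5), np.arange(10), x)
print(np.array_equal(up[::2], x))  # -> True

# Downsampling: decimation drops half the samples outright.
down = x[::2]
print(down.size)  # -> 5
```

This is why an embedded I.O. tends to survive upsampling but is largely destroyed by downsampling, as Table 3 shows.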

Figure 10 shows the simulation results. Among them, (f) is the recovered I.O. after a cutting attack, (g) is the one after a self-media tampering attack, (h) is the one after a non-self-media tampering attack, (i) is the one after an upsampling attack, and (j) is the one after a downsampling attack. The similarity between the original and recovered I.O., and the comparison with the two other common audio embedding methods mentioned above, can be found in Table 3 and are presented in Figure 11.

As can be seen from Figure 10 and Table 3, apart from the upsampling attack, the RP-AIP model can effectively

Figure 10: The simulation results of the recovered I.O. under different malicious attacks.

Figure 11: The comparison of the simulation results of different malicious attacks and embedding methods. W: without processing; CT: cutting attack; TP (i): self-media tampering attack; TP (ii): non-self-media tampering attack; RS (i): upsampling attack; RS (ii): downsampling attack.


Table 3: The similarity ρ between the original and recovered I.O. in the malicious attack experiments.

  No.   Attack type                    RP-AIP (random)   0-coefficient   n-coefficient
  (a)   Original                       1.00              1.00            1.00
  (b)   Without attacks                0.97              0.99            0.95
  (h)   Cutting                        0.31              0.44            0.50
  (i)   Tampering (self-media)         0.24              0.46            0.62
  (j)   Tampering (non-self-media)     0.19              0.33            0.57
  (k)   Resampling (upsampling)        0.97              0.98            0.92
  (l)   Resampling (downsampling)      0.20              0.42            0.22

detect and reflect malicious attacks. As stated earlier, the I.O. is the tangible representation form of the audio integrity; that is, changes in the I.O. directly reflect changes in the audio media. Therefore, we can evaluate the type and degree of an attack on the audio by observing the I.O. In the case of (f), since cutting directly leads to the loss of the I.O., when the I.O. is reconstructed as a 100×100-pixel image it becomes incomplete. The missing part of the I.O. indicates that the corresponding audio part has been cut off. In the same way, the tampered part appears as random noise or disorder, as shown in (g) and (h), because the audio has suffered the corresponding tampering attack. However, for resampling attacks, RP-AIP shows different results, as shown in (i) and (j). This is because upsampling tends to interpolate the original audio signal without causing loss of the I.O., while downsampling is a decimation of the original audio signal, which directly leads to the loss and destruction of the I.O. As stated in the Introduction, we are more concerned about whether an attack has tampered with the speaker's intention, and resampling generally does not.

It can be seen from Figure 11 that the RP-AIP model performs distinctly better than the other two models. This is because attacks such as cutting and tampering are generally oriented to the audio signal's time domain rather than its frequency domain, so the damage to the low- and high-frequency parts is random. An embedding method that considers only the low or only the high frequencies can protect only one aspect, while the random embedding method in the RP-AIP model covers both frequency parts. This is also the innovation and highlight of our work.

6. Conclusion

Audio integrity protection for the network interaction environment is not only an effective way to protect audio information but also an important link in building trusted communication. Due to the real-time constraints of interaction scenarios, this kind of audio protection needs to reconsider some new characteristics, such as the streaming characteristic, imperceptibility, self-detectability, and distortion tolerance, which are not covered by traditional schemes. This poses new challenges to existing audio protection solutions.

To this end, a method of audio integrity protection based on relative phase (RP-AIP) is proposed in this work. In the RP-AIP model, we established the concept of the integrity object (I.O.) in order to propose and abstract the integrity of the audio signal and then transform it into a tangible representation form. In addition, we fully considered the characteristics of the audio signal in the DFT and DCT transform domains and used the frequency characteristics of the audio itself to guide the random embedding of the I.O., thereby achieving its blind detection and extraction. The tangible expression of audio integrity and the relative-phase-based random embedding scheme are the highlights of our work.

The simulation experiments show that the RP-AIP model can guarantee the SNR of the audio signal, and the embedding of the I.O. does not cause signal distortion. In addition, it adapts well to common audio processing, such as smoothing/low-pass filtering, band-pass filtering, MPEG-1 compression with different ratios, and A/D conversion. Meanwhile, against malicious attacks such as cutting, tampering, and resampling, the RP-AIP model shows obvious superiority over other similar protection solutions. Of course, research on audio integrity protection is still in its infancy, and more work will be done to make it more complete.

Data Availability

Some of the data is uploaded to a public server: https://doi.org/10.6084/m9.figshare.6382709.v1. In the file, you can find the demo audio and its spectrum data, the I.O. data, and the audio data used in the experiment.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Natural Science Foundation of China [No. 61671141, No. 61401081], the Liaoning Provincial Natural Science Foundation of China [No. 20180551007], and the Ministry of Education-China Mobile Scientific Research Funds [No. MCM20150103].


References

[1] R. F. Olanrewaju and O. Khalifa, "Digital audio watermarking: techniques and applications," in Proceedings of the 2012 International Conference on Computer and Communication Engineering (ICCCE 2012), pp. 830–835, Malaysia, July 2012.

[2] J. Seok, J. Hong, and J. Kim, "A novel audio watermarking algorithm for copyright protection of digital audio," ETRI Journal, vol. 24, no. 3, pp. 181–189, 2002.

[3] H. Yassine, B. Bachir, and K. Aziz, "A secure and high robust audio watermarking system for copyright protection," International Journal of Computer Applications, vol. 53, no. 17, pp. 33–39, 2012.

[4] J. Haitsma, M. van der Veen, T. Kalker, and F. Bruekers, "Audio watermarking for monitoring and copy protection," in Proceedings of the ACM Workshops on Multimedia, pp. 119–122, ACM, 2000.

[5] R. D. Shelke and M. U. Nemade, "Audio watermarking techniques for copyright protection: a review," in Proceedings of the 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication (ICGTSPICC 2016), pp. 634–640, December 2016.

[6] S. V. Dhavale, R. S. Deodhar, and L. M. Patnaik, "High capacity lossless semi-fragile audio watermarking in the time domain," in A Hierarchical CPN Model for Mobility Analysis in Zone Based MANET, pp. 843–852, 2012.

[7] S. V. Dhavale, R. S. Deodhar, D. Pradhan, and L. M. Patnaik, "State transition based embedding in cepstrum domain for audio copyright protection," IETE Journal of Research, vol. 61, no. 1, pp. 41–55, 2015.

[8] H. H. Tsai, J. S. Cheng, and P. T. Yu, "Audio watermarking based on HAS and neural networks in DCT domain," EURASIP Journal on Advances in Signal Processing, vol. 3, no. 2, pp. 1–12, 2003.

[9] Y. Xiang, I. Natgunanathan, Y. Rong, and S. Guo, "Spread spectrum-based high embedding capacity watermarking method for audio signals," IEEE Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2228–2237, 2015.

[10] X. Huang, A. Nishimura, and I. Echizen, "A reversible acoustic steganography for integrity verification," in Digital Watermarking, p. 305, Springer, Berlin, Heidelberg, Germany, 2011.

[11] H. Cao, Y. Wang, J. Li et al., "Audio data hiding using perceptual masking effect of HAS," in Proceedings of the International Symposium on Multispectral Image Processing and Pattern Recognition, International Society for Optics and Photonics, 2007.

[12] T. Painter and A. Spanias, "Perceptual coding of digital audio," Proceedings of the IEEE, vol. 88, no. 4, pp. 451–512, 2000.

[13] S. V. Dhavale, R. S. Deodhar, D. Pradhan et al., "Robust multiple stereo audio watermarking for copyright protection and integrity checking," in Proceedings of the International Conference on Computational Intelligence and Information Technology, pp. 9–16, IET, 2014.

[14] L. Boney, A. H. Tewfik, and K. N. Hamdy, "Digital watermarks for audio signals," in Proceedings of the IEEE International Conference on Multimedia Computing and Systems, pp. 473–480, IEEE, 2002.


[2] J Seok J Hong and J Kim ldquoA novel audio watermarkingalgorithm for copyright protection of digital audiordquo ETRIJournal vol 24 no 3 pp 181ndash189 2002

[3] H Yassine B Bachir and K Aziz ldquoA Secure and HighRobust AudioWatermarking System for Copyright ProtectionrdquoInternational Journal of Computer Applications vol 53 no 17pp 33ndash39 2012

[4] J Haitsma M van der Veen T Kalker and F BruekersldquoAudio watermarking for monitoring and copy protectionrdquo inProceedings of the ACM Workshops on Multimedia ACM pp119ndash122 2000

[5] R D Shelke and M U Nemade ldquoAudio watermarking tech-niques for copyright protection A reviewrdquo in Proceedings ofthe 2016 International Conference on Global Trends in SignalProcessing Information Computing and Communication ICGT-SPICC 2016 pp 634ndash640 December 2016

[6] S V Dhavale R S Deodhar and L M Patnaik ldquoHighCapacity Lossless Semi-fragile Audio Watermarking in theTimeDomainrdquo inAhierarchical CPNmodel formobility analysisin zone based MANET pp 843ndash852 2012

[7] S V Dhavale R S Deodhar D Pradhan and L M PatnaikldquoState Transition Based Embedding in Cepstrum Domain forAudio Copyright Protectionrdquo IETE Journal of Research vol 61no 1 pp 41ndash55 2015

[8] HH Tsai J S Cheng and P T Yu ldquoAudioWatermarkingBasedonHAS andNeuralNetworks inDCTDomainrdquoEurasip Journalon Advances in Signal Processing vol 3 no 2 pp 1ndash12 2003

[9] Y Xiang I Natgunanathan Y Rong and S Guo ldquoSpreadspectrum-based high embedding capacity watermarkingmethod for audio signalsrdquo IEEE Transactions on Audio Speechand Language Processing vol 23 no 12 pp 2228ndash2237 2015

[10] X Huang A Nishimura and I Echizen ldquoA Reversible AcousticSteganography for Integrity Verificationrdquo inDigital Watermark-ing p 305 Springer Berlin Heidelberg Germany 2011

[11] H Cao Y Wang J Li et al ldquoAudio data hiding using perceptualmasking effect of HASrdquo in Proceedings of the InternationalSymposium on Multispectral Image Processing and PatternRecognition International Society for Optics and Photonics 2007

[12] T Painter and A Spanias ldquoPerceptual coding of digital audiordquoProceedings of the IEEE vol 88 no 4 pp 451ndash512 2000

[13] S V Dhavale R S Deodhar D Pradhan et al ldquoRobustMultiple Stereo Audio Watermarking for Copyright Protectionand Integrity Checkingrdquo in Proceedings of the InternationalConference on Computational Intelligence and Information Tech-nology IET pp 9ndash16 2014

[14] L Boney A H Tewfik and K N Hamdy ldquoDigital watermarksfor audio signalsrdquo in Proceedings of the IEEE InternationalConference onMultimedia Computing and Systems pp 473ndash480IEEE 2002


Security and Communication Networks 9

Table 2: The similarity ρ between the original and recovered IO in audio processing experiments.

No.   Processing method               Random embedding (RP-AIP)   0-coefficient embedding   n-coefficient embedding
(a)   Original                        1.00                        1.00                      1.00
(b)   Without processing              0.97                        0.99                      0.95
(c)   Smoothing/low-pass filtering    0.88                        0.98                      0.18
(d)   Band-pass filtering             0.90                        0.85                      0.84
(e)   MPEG-1 (10.5:1 ratio)           0.79                        0.88                      0.68
(f)   MPEG-1 (12:1 ratio)             0.81                        0.85                      0.64
(g)   A/D conversion                  0.75                        0.74                      0.76
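This excerpt does not spell out how the similarity ρ between the original and recovered IO is computed. A common choice for comparing two binary images is the normalized correlation coefficient; the sketch below uses that measure (the function name and the 100×100 IO size are illustrative, not the paper's implementation):

```python
import numpy as np

def similarity(io_original: np.ndarray, io_recovered: np.ndarray) -> float:
    """Normalized correlation between two IO images (e.g. 100x100 bit matrices).

    Assumption: rho is a Pearson-style correlation; the paper may use a
    different measure.
    """
    a = io_original.astype(float).ravel()
    b = io_recovered.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 1.0

rng = np.random.default_rng(0)
io = rng.integers(0, 2, size=(100, 100))        # toy IO bit image
assert abs(similarity(io, io) - 1.0) < 1e-9     # identical IO -> rho = 1.00

noisy = io.copy()
flip = rng.random((100, 100)) < 0.05            # ~5% of pixels tampered
noisy[flip] ^= 1
print(similarity(io, noisy))                    # high rho despite light damage
```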

Figure 9: The comparison of the simulation results of different audio processing and embedding methods. W: without processing; LP: smoothing/low-pass filtering; BP: band-pass filtering; MP(i): MPEG-1 compression (compression ratio of 10.5:1); MP(ii): MPEG-1 compression (compression ratio of 12:1); AD: A/D conversion.

interpolation expansion of the original audio, while downsampling refers to the extraction of the original audio. Cutting and tampering are undoubtedly malicious attacks on audio that attempt to change the speaker's intention. However, resampling attacks often affect only the length or quality of the audio, especially upsampling, and cannot change the speaker's intention.
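The asymmetry between the two resampling directions can be shown with a minimal sketch (linear interpolation and simple decimation are stand-ins for whatever resampler an attacker might use): upsampling keeps every original sample and only inserts new ones, while downsampling discards samples outright.

```python
import numpy as np

x = np.sin(2 * np.pi * 5 * np.arange(100) / 100)   # toy 100-sample audio frame

# Upsampling by 2: interpolate new samples between the originals.
t_old = np.arange(100)
t_new = np.arange(0, 100, 0.5)
up = np.interp(t_new, t_old, x)
assert np.allclose(up[::2], x)   # every original sample (and any embedded bit) survives

# Downsampling by 2: keep only every second sample.
down = x[::2]
assert down.size == 50           # half of the original samples are discarded
```

This is why upsampling leaves the embedded IO largely intact while downsampling destroys part of it.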

Figure 10 shows the simulation results. Among them, (f) is the recovered IO after a cutting attack, (g) is the one after a self-media tampering attack, (h) is the one after a non-self-media tampering attack, (i) is the one after an upsampling attack, and (j) is the one after a downsampling attack. The similarity between the original and recovered IO, and the comparison with the other two common audio embedding methods mentioned above, can be found in Table 3 and are presented in Figure 11.

As can be seen from Figure 10 and Table 3, except for the upsampling attack, the RP-AIP model can effectively detect and reflect malicious attacks.

Figure 10: The simulation results of the recovered IO under different malicious attacks: (f) cutting, (g) self-media tampering, (h) non-self-media tampering, (i) upsampling, (j) downsampling.

Figure 11: The comparison of the simulation results of different malicious attacks and embedding methods. W: without processing; CT: cutting attack; TP(i): self-media tampering attack; TP(ii): non-self-media tampering attack; RS(i): upsampling attack; RS(ii): downsampling attack.


Table 3: The similarity ρ between the original and recovered IO in malicious attack experiments.

No.   Attack type                     Random embedding (RP-AIP)   0-coefficient embedding   n-coefficient embedding
(a)   Original                        1.00                        1.00                      1.00
(b)   Without attacks                 0.97                        0.99                      0.95
(h)   Cutting                         0.31                        0.44                      0.50
(i)   Tampering (self-media)          0.24                        0.46                      0.62
(j)   Tampering (non-self-media)      0.19                        0.33                      0.57
(k)   Resampling (upsampling)         0.97                        0.98                      0.92
(l)   Resampling (downsampling)       0.20                        0.42                      0.22

As stated earlier, the IO is the tangible representation form of the audio integrity; that is, changes in the IO directly reflect changes in the audio media. Therefore, we can evaluate the type and degree of an attack on the audio by observing the IO. In the case of (f), since cutting directly leads to the loss of IO, when the IO is reconstructed as a 100×100-pixel image it becomes incomplete. The missing part of the IO indicates that the corresponding audio part has been cut off. In the same way, the tampered part appears as random noise or disorder, as shown in (g) and (h), because the audio has suffered the corresponding tampering attack. However, for resampling attacks, RP-AIP shows different results, as shown in (i) and (j). This is because upsampling tends to interpolate the original audio signal without causing loss of the IO, while downsampling extracts samples from the original audio signal, which directly leads to the loss and destruction of the IO. As stated in the Introduction, we are more concerned about whether an attack has tampered with the speaker's intention, and resampling generally does not.
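The localization idea above can be sketched as follows: the extracted IO bits are reshaped into a 100×100 image, and frames whose bits could not be extracted (e.g. a cut segment) leave a visible gap at the corresponding position. This is a hypothetical illustration of the reconstruction step, not the paper's implementation:

```python
import numpy as np

PIXELS = 100 * 100  # the IO is reconstructed as a 100x100 image

# Suppose each audio frame carries one IO bit; np.nan marks bits lost to cutting.
extracted = np.random.default_rng(1).integers(0, 2, PIXELS).astype(float)
extracted[4000:5200] = np.nan        # a cut audio segment -> these bits are missing

io_image = extracted.reshape(100, 100)
missing = np.isnan(io_image)
print(f"{missing.mean():.0%} of the IO is missing")  # prints "12% of the IO is missing"
```

The position of the `missing` region in `io_image` points back to which part of the audio was cut.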

It can be seen from Figure 11 that the RP-AIP model performs distinctly better than the other two models. This is because attacks such as cutting and tampering are generally oriented to the audio signal's time domain rather than the frequency domain, so the damage to the low- and high-frequency parts is random. An embedding method that considers only the low or only the high frequencies can cover only one aspect, while the random embedding method in the RP-AIP model combines both frequency parts. This is also the innovation and highlight of our work.
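The contrast drawn here can be sketched as follows. The selection rule below, which uses the sign of a relative-phase value to pick a low- or high-frequency coefficient for each frame, is only an illustrative stand-in for the paper's actual relative-phase strategy; the function name, band indices, and frame count are all assumptions:

```python
import numpy as np

def embed_position(rel_phase: float, n_coeffs: int) -> int:
    """Pick a coefficient index in the low- or high-frequency half of a frame,
    driven by the audio's own relative phase (illustrative rule only)."""
    if np.cos(rel_phase) >= 0:
        return n_coeffs // 4          # a low-frequency coefficient
    return 3 * n_coeffs // 4          # a high-frequency coefficient

# Over many frames the choice alternates, so damage confined to either band
# still leaves roughly half of the IO bits recoverable.
phases = np.random.default_rng(2).uniform(0, 2 * np.pi, 1000)
low = sum(embed_position(p, 512) == 128 for p in phases)
print(low, 1000 - low)   # roughly balanced between the two bands
```

Because the phase of natural audio is effectively random across frames, the IO bits end up spread over both bands, which is the property the comparison in Figure 11 exercises.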

6. Conclusion

Audio integrity protection for the network interaction environment is not only an effective way to protect audio information but also an important link in building trusted communication. Due to the real-time constraints of interaction scenarios, this kind of audio protection needs to reconsider some new characteristics, such as the streaming characteristic, imperceptibility, self-detectability, and distortion tolerance, which are not covered by traditional schemes. This poses new challenges to existing audio protection solutions.

To this end, a method of audio integrity protection based on relative phase (RP-AIP) is proposed in this work. In the RP-AIP model, we established the concept of the integrity object (IO) in order to abstract the integrity of the audio signal and transform it into a tangible representation form. In addition, we fully considered the characteristics of the audio signal in the DFT and DCT transform domains and used the frequency characteristics of the audio itself to guide the random embedding of the IO, thereby achieving blind detection and extraction of it. The tangible expression of the audio integrity and the relative-phase-based random embedding scheme are the highlights of our work.

The simulation experiments show that the RP-AIP model can guarantee the SNR of the audio signal, and the embedding of the IO does not cause signal distortion. In addition, it adapts well to some common audio processing, such as smoothing/low-pass filtering, band-pass filtering, MPEG-1 compression with different ratios, and A/D conversion. Against malicious attacks such as cutting, tampering, and resampling, the RP-AIP model shows obvious superiority over other similar protection solutions. Of course, research on audio integrity protection is still in its infancy, and more work will be done to make it more complete.

Data Availability

Some of the data is uploaded to the public server: https://doi.org/10.6084/m9.figshare.6382709.v1. In the file you can find the demo audio and its spectrum data, the IO data, and the audio data used in the experiments.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Natural Science Foundation of China [No. 61671141, No. 61401081], the Liaoning Provincial Natural Science Foundation of China [No. 20180551007], and the Ministry of Education-China Mobile Scientific Research Funds [No. MCM20150103].


References

[1] R. F. Olanrewaju and O. Khalifa, "Digital audio watermarking: techniques and applications," in Proceedings of the 2012 International Conference on Computer and Communication Engineering (ICCCE 2012), pp. 830–835, Malaysia, July 2012.

[2] J. Seok, J. Hong, and J. Kim, "A novel audio watermarking algorithm for copyright protection of digital audio," ETRI Journal, vol. 24, no. 3, pp. 181–189, 2002.

[3] H. Yassine, B. Bachir, and K. Aziz, "A secure and high robust audio watermarking system for copyright protection," International Journal of Computer Applications, vol. 53, no. 17, pp. 33–39, 2012.

[4] J. Haitsma, M. van der Veen, T. Kalker, and F. Bruekers, "Audio watermarking for monitoring and copy protection," in Proceedings of the ACM Workshops on Multimedia, pp. 119–122, ACM, 2000.

[5] R. D. Shelke and M. U. Nemade, "Audio watermarking techniques for copyright protection: a review," in Proceedings of the 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication (ICGTSPICC 2016), pp. 634–640, December 2016.

[6] S. V. Dhavale, R. S. Deodhar, and L. M. Patnaik, "High capacity lossless semi-fragile audio watermarking in the time domain," in A Hierarchical CPN Model for Mobility Analysis in Zone Based MANET, pp. 843–852, 2012.

[7] S. V. Dhavale, R. S. Deodhar, D. Pradhan, and L. M. Patnaik, "State transition based embedding in cepstrum domain for audio copyright protection," IETE Journal of Research, vol. 61, no. 1, pp. 41–55, 2015.

[8] H. H. Tsai, J. S. Cheng, and P. T. Yu, "Audio watermarking based on HAS and neural networks in DCT domain," EURASIP Journal on Advances in Signal Processing, vol. 3, no. 2, pp. 1–12, 2003.

[9] Y. Xiang, I. Natgunanathan, Y. Rong, and S. Guo, "Spread spectrum-based high embedding capacity watermarking method for audio signals," IEEE Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2228–2237, 2015.

[10] X. Huang, A. Nishimura, and I. Echizen, "A reversible acoustic steganography for integrity verification," in Digital Watermarking, p. 305, Springer, Berlin, Heidelberg, Germany, 2011.

[11] H. Cao, Y. Wang, J. Li et al., "Audio data hiding using perceptual masking effect of HAS," in Proceedings of the International Symposium on Multispectral Image Processing and Pattern Recognition, International Society for Optics and Photonics, 2007.

[12] T. Painter and A. Spanias, "Perceptual coding of digital audio," Proceedings of the IEEE, vol. 88, no. 4, pp. 451–512, 2000.

[13] S. V. Dhavale, R. S. Deodhar, D. Pradhan et al., "Robust multiple stereo audio watermarking for copyright protection and integrity checking," in Proceedings of the International Conference on Computational Intelligence and Information Technology, pp. 9–16, IET, 2014.

[14] L. Boney, A. H. Tewfik, and K. N. Hamdy, "Digital watermarks for audio signals," in Proceedings of the IEEE International Conference on Multimedia Computing and Systems, pp. 473–480, IEEE, 2002.
