
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021

Secret-to-Image Reversible Transformation for Generative Steganography

Zhili Zhou, Member, IEEE, Yuecheng Su, Q. M. Jonathan Wu, Senior Member, IEEE, Zhangjie Fu, Yunqing Shi, Life Fellow, IEEE

Abstract—Recently, generative steganography, which transforms secret information into a generated image, has become a promising technique for resisting steganalysis detection. However, due to the inefficiency and irreversibility of the secret-to-image transformation, it is hard to find a good trade-off between information hiding capacity and extraction accuracy. To address this issue, we propose a secret-to-image reversible transformation (S2IRT) scheme for generative steganography. The proposed S2IRT scheme is based on a generative model, i.e., the Glow model, which enables a bijective mapping between a latent space with a multivariate Gaussian distribution and an image space with a complex distribution. In the process of S2I transformation, guided by a given secret message, we construct a latent vector and then map it to a generated image by the Glow model, so that the secret message is finally transformed into the generated image. Owing to the good efficiency and reversibility of the S2IRT scheme, the proposed steganographic approach achieves both high hiding capacity and accurate extraction of the secret message from the generated image. Furthermore, a separate encoding-based S2IRT (SE-S2IRT) scheme is also proposed to improve robustness to common image attacks. The experiments demonstrate that the proposed steganographic approaches can achieve high hiding capacity (up to 4 bpp) and accurate information extraction (almost 100% accuracy rate) simultaneously, while maintaining desirable anti-detectability and imperceptibility.

Index Terms—Steganography, coverless steganography, generative steganography, information hiding, digital forensics.

I. INTRODUCTION

IMAGE steganography is the technology of imperceptibly hiding secret information in a carrier image, so that covert communication can be achieved by transmitting the carrier image without arousing suspicion [1], [2]. As another means of covert communication, data encryption usually encodes the secret information in an incomprehensible and meaningless

This work was supported in part by the National Natural Science Foundation of China under Grants 61972205, U1836208, and U1836110, in part by the National Key R&D Program of China under Grant 2018YFB1003205, in part by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) fund, and in part by the Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET) fund, China.

Z. L. Zhou, Y. C. Su, and Z. J. Fu are with the School of Computer and Software & Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing 210044, China. E-mail: zhou [email protected], su [email protected], [email protected].

Q. M. J. Wu is with the Department of Electrical and Computer Engineering, University of Windsor, Windsor, Ontario, Canada N9B 3P4. E-mail: [email protected].

Y. Q. Shi is with the Department of ECE, New Jersey Institute of Technology, Newark, NJ 07102 USA. E-mail: [email protected].

form, which in itself signals the importance and confidentiality of the data. That makes the secret information vulnerable to interception and cracking. Therefore, compared with encryption, the advantage of steganography is that it can conceal the very occurrence of covert communication and thus ensure the security of the transmitted secret information [1], [2].

Conventional steganographic approaches generally select an existing image as a cover and then embed secret information into the cover with a slight modification. However, due to the modification, some image distortion is inevitably left in the cover image, especially under a high hiding payload. That might cause the presence of hidden information to be successfully exposed by well-designed steganalyzers [3]. To resist the detection of steganalyzers, some researchers have recently devoted themselves to the study of “generative steganography” [4]–[6]. Instead of modifying an existing cover image, the idea of generative steganography is to transform a given secret message into a generated image, which is then used as the stego-image for covert communication. Since the secret information is concealed without modification, generative steganography can achieve promising performance against the detection of steganalyzers.

Recent generative steganography usually applies Generative Adversarial Networks (GANs) [7] to transform a given secret message into a new meaningful image that looks very realistic, i.e., a “realistic-looking image”. Some generative steganographic approaches convert the secret message into a simple image label [8]–[10] or a low-dimensional noise signal [6], [11]–[13] as the input of a GAN model to generate an image, so that the secret message is finally transformed into the generated image. However, these approaches can only convert a secret message of small size into a low-dimensional input of the GAN model, and they only allow a one-way mapping from the input information to the image content. Thus, the secret-to-image transformation process is generally inefficient and irreversible. Consequently, their hiding capacity is quite limited, or they cannot accurately extract the secret information from the generated image. That makes these approaches inapplicable in practical steganographic tasks.

According to the above, to realize both high-capacity hiding and accurate extraction of secret information, the key issue of generative steganography is to find an efficient and reversible transformation between secret messages and images. To this end, instead of using GANs, we explore a flow-based generative model, i.e., the Glow model, which enables a bijective mapping between high-dimensional latent vectors and images

0000–0000/00$00.00 © 2021 IEEE

arXiv:2203.06598v1 [cs.CR] 13 Mar 2022


Fig. 1. The framework of the S2IRT scheme for generative steganography based on the Glow model.

for generative steganography. By analyzing the properties of the bijective-mapping process of the Glow model, we propose the secret-to-image reversible transformation (S2IRT) scheme based on the Glow model, realizing an efficient and reversible transformation between secret messages and generated images for information hiding and extraction.

Specifically, we encode a given secret message as a high-dimensional latent vector in an efficient and robust manner, and then map the vector to a generated image by the Glow model for information hiding. Consequently, the secret message is finally transformed into a generated image. Also, a reverse transformation between the secret message and the generated image can be implemented for information extraction. Fig. 1 shows the framework of the S2IRT scheme for generative steganography based on the Glow model. The main contributions are summarized as follows.

(1) The properties of the bijective-mapping process of the Glow model are analyzed. Through experiments and analysis, it is found that the bijective-mapping process of the Glow model has the following properties: the elements of the latent vector mapped from a real image follow a one-dimensional, approximately Gaussian distribution, and mapping errors occur in the bijective-mapping process. The property analysis motivates us to propose the S2IRT scheme based on the Glow model for generative steganography.

(2) The S2IRT scheme is proposed for generative steganography. In the proposed S2IRT scheme, an efficient and robust encoding method is first designed to encode a given secret message as a high-dimensional latent vector. Specifically, guided by a given secret message, a large number of elements are arranged into the corresponding positions to construct a high-dimensional vector. Consequently, instead of simply encoding a given secret message as element values, the secret message is encoded as the position arrangement of these elements. The encoding method not only ensures that the distribution of the constructed latent vector's elements is consistent with that of the latent vector mapped from a real image, but also makes the encoded message robust to the bijective-mapping errors. Then, the vector is mapped to a generated image by the Glow model, and the generated image is used as the stego-image for steganography. The S2IRT scheme has the following advantages.

Due to the efficient and robust message encoding and the bijective-mapping processes, the proposed S2IRT scheme has good efficiency and reversibility, which leads to both high hiding capacity and accurate extraction of the secret message. Moreover, the quality of the generated image remains high as the hiding payload increases. That is because, although encoding secret messages of different sizes leads to different constructed vectors, each constructed vector in the latent space can be mapped to a high-quality image in the image space by the Glow model.

In addition, existing steganalyzers are still unlikely to defeat the proposed steganographic approach, since the steganography is implemented by generating a new image as the stego-image rather than by modifying an existing image.

(3) The SE-S2IRT scheme is proposed to improve the robustness of generative steganography. To improve robustness to common attacks such as intensity changing, contrast enhancement, and noise addition, the SE-S2IRT scheme is proposed using a specially designed separate encoding method for generative steganography.

The remainder of this paper is organized as follows. Section 2 introduces the related work. Section 3 briefly describes the Glow model. Sections 4 and 5 elaborate the proposed S2IRT and SE-S2IRT schemes for generative steganography, respectively. Section 6 presents and discusses the experimental results. Conclusions are drawn in Section 7.

II. RELATED WORK

A. Conventional Steganography

In the past two decades, many conventional image steganographic approaches have been proposed. They generally select an existing image as the cover and then imperceptibly embed secret information into the cover image with a slight modification.

In the early work [14], [15], steganography is implemented by simply modifying the least significant bit (LSB) of each pixel of the cover image with the same probability to embed secret information. To reduce the image distortion caused by modification, Fridrich et al. [1] proposed adaptive steganography based on the framework of minimizing additive distortion. Specifically, a distortion function is heuristically defined to assign an embedding cost to each pixel of the cover image, and then the secret information is embedded in the way that minimizes the sum of embedding costs, i.e., the additive distortion. By designing different distortion functions, many adaptive steganographic approaches have been proposed, such as HUGO [16], S-UNIWARD [17], HILL [18], and MiPOD [19]. Also, some non-additive distortion-based adaptive steganographic approaches such as DeJoin [20], ACMP [21], and GMRF [22] have been proposed. However, since all of these distortion functions are defined heuristically, it may still be possible to design a better distortion function that implements steganography with less image distortion.
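The early LSB scheme described above can be illustrated with a short sketch. The pixel values, message bits, and helper names below are illustrative and not taken from the cited implementations; real LSB stegosystems additionally use key-driven pixel selection.

```python
# Toy illustration of LSB embedding: each cover pixel's least significant
# bit is replaced with one message bit; extraction reads the LSBs back.

def lsb_embed(pixels, bits):
    stego = list(pixels)
    for i, b in enumerate(bits):
        stego[i] = (stego[i] & ~1) | b   # clear the LSB, then set it to b
    return stego

def lsb_extract(stego, n_bits):
    return [p & 1 for p in stego[:n_bits]]

cover = [128, 55, 200, 33, 90, 17]
msg = [1, 0, 1, 1]
stego = lsb_embed(cover, msg)
assert lsb_extract(stego, 4) == msg
assert all(abs(c - s) <= 1 for c, s in zip(cover, stego))  # distortion <= 1 per pixel
```

The last assertion makes the paper's point concrete: the modification is small (each pixel changes by at most 1), yet it is precisely this systematic modification that well-designed steganalyzers learn to detect.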

As deep learning techniques have made remarkable achievements in the field of computer vision, some researchers have used deep learning networks to automatically learn a distortion function for steganography, as done by ASDL-GAN [23], UT-6HPF-GAN [24], and reinforcement learning-based steganography [25]. Compared with heuristically defined distortion functions, the distortion functions learned by deep learning networks can further reduce the image distortion of steganography. Recently, Baluja et al. [26] trained two fully convolutional networks [27] to directly embed a secret image within a cover image and to extract the secret image from the stego-image, respectively.

As an adversary of steganography, steganalysis aims to detect the existence of hidden information in stego-images. To this end, handcrafted steganalytic features such as first-order statistical features [28] and second-order statistics [29], [30] are extracted from the stego-images and then fed into advanced machine learning-based classifiers such as Support Vector Machines (SVMs). Fridrich et al. [31] proposed the Spatial Rich Model (SRM)-based steganalyzer, in which a set of high-dimensional handcrafted steganalytic features is extracted for steganalysis. In recent work [32]–[34], convolutional neural networks (CNNs) were applied to automatically learn steganalytic features and classify these features for steganalysis.

Although some conventional steganographic approaches [35], [36] can fool one or several particular steganalyzers, it is very hard for such approaches to resist the detection of a set of well-designed steganalyzers. The main reason is that the modification operation of conventional steganography inevitably leads to some distortion in the stego-images, which causes the presence of hidden information to be detected successfully.

B. Generative Steganography

Instead of modifying an existing cover image, the idea of generative steganography is to generate a new image on the basis of the secret information, which is then used as the stego-image for covert communication. Since the secret information is concealed without modification, generative steganography can achieve promising performance against the detection of steganalyzers. The existing generative steganographic approaches can be roughly categorized into two classes: texture synthesis-based and GAN-based approaches.

Wu et al. [4] proposed a generative steganographic approach based on reversible texture synthesis. It hides a given secret message in a generated texture image through the process of texture synthesis. Qian et al. [37] proposed an improved version of this steganographic approach to enhance the robustness to JPEG compression. Xu et al. [38] directly transformed the secret message into an intricate texture image for steganography. However, texture images are generally meaningless and somewhat resemble encrypted data. Thus, transmitting these images on networks easily arouses attackers' suspicion.

Fortunately, popular generative deep learning models, i.e., Generative Adversarial Networks (GANs) [7], are able to produce new meaningful images that look very realistic, i.e., “realistic-looking images”. Thus, some generative steganographic approaches transform the secret messages into realistic-looking images based on GANs.

Some researchers converted the secret message into a simple image label or semantic information as intermediate data, and then fed it into GANs to generate the stego-image. Liu et al. [8] transformed a given secret message into class label information and then fed it into the Auxiliary Classifier GAN (ACGAN) [39] to generate the stego-image. Also, a discriminator was trained to determine the class label of the generated image in order to extract the hidden secret message. Zhang et al. [9] mapped the secret message to image semantic information, and then used this information as an input condition of GANs to generate the corresponding stego-image. Qin et al. [10] used Faster R-CNN [40] to detect objects and established a mapping dictionary between binary message sequences and object labels, and then transformed the secret message into an image with the corresponding label by GANs. However, since the simple label and semantic information can only carry very limited information, the hiding capacity of these generative steganographic approaches is quite low.

To enhance the hiding capacity, some researchers converted the secret message into a noise signal, and then used the noise signal as the input of GANs to generate the stego-image. Hu et al. [6], [11] directly encoded the secret message as a noise signal, and then fed it into Deep Convolutional GANs (DCGAN) [41] to generate a realistic-looking image as the stego-image. Instead of using DCGAN, Li et al. [12] adopted the Wasserstein GAN with Gradient Penalty (WGAN-GP) [42] to transform the secret message into a generated image for steganography. Arifianto et al. [13] used a word2vec model [43] to convert the secret message into a word vector, which is then used as the input vector of GANs to generate the stego-image. However, since GANs only allow a one-way mapping from the low-dimensional input information to the high-dimensional images, those GAN-based steganographic approaches cannot accurately extract the secret information from the generated images, especially under high hiding payloads.

In summary, the existing generative steganographic approaches cannot achieve a good trade-off between the hiding capacity and the extraction accuracy of the secret message. That makes them inapplicable in practical steganographic tasks.

To achieve both high-capacity information hiding and accurate information extraction, in this paper, we propose the S2IRT scheme based on the Glow model to realize an efficient and reversible transformation between secret messages and generated images for information hiding and extraction. Consequently, the proposed approach can achieve high-capacity information hiding and accurate information extraction simultaneously, while maintaining desirable anti-detectability and imperceptibility for generative steganography.

III. GLOW MODEL

The proposed S2IRT and SE-S2IRT schemes rely on a flow-based generative model, i.e., the Glow model, for generative steganography.

To model the complex distribution of the high-dimensional image space, flow-based generative models such as NICE [44], RealNVP [45], and Glow [46] generally learn a bijective mapping between a latent space with a simple distribution and an image space with a complex distribution. Among these models, the Glow model has a superior ability to generate high-quality, realistic-looking natural images and enables the bijective mapping between latent vectors and images. Thus, in this paper, we explore the Glow model to implement the S2IRT and SE-S2IRT schemes for generative steganography. The Glow model is briefly described as follows.

Fig. 2. The illustration of the properties of the bijective-mapping process in the Glow model. (a) Distributions of the elements of different vectors. Distribution 1# is the distribution of the elements of a real image's latent vector, Distribution 2# is the distribution of a vector constructed by element position arrangement-based encoding, and Distribution 3# is that of a vector constructed by element value-based encoding. (b) The cumulative distribution function of the difference between the elements of a constructed vector and the recovered one. (c) The difference between the first five elements of the constructed and recovered vectors.

Let z be the variable of the latent space following the simple distribution pZ(z), i.e., a spherical multivariate Gaussian distribution, and x the variable of the image space following a complex distribution pX(x). They are represented by

z ∼ pZ(z) (1)

x ∼ pX(x) (2)

where the dimension of z is assumed to be the same as that of x, and its components zd are assumed to be independent. Thus, pZ(z) can be represented by pZ(z) = ∏d pZd(zd), where pZd(zd) is a one-dimensional Gaussian distribution.

To achieve the bijective mapping, the training process of the Glow model is to learn an invertible mapping function fθ. The function and its inverse can be represented by

z = fθ(x) (3)

x = gθ(z) = fθ⁻¹(z) (4)

According to the Jacobian determinant (change of variables), the relationship between the distributions pX(x) and pZ(z) can be represented by

pX(x) = pZ(z) |det(∂z/∂x)| = pZ(fθ(x)) |det(∂fθ(x)/∂x)| (5)

Suppose there is a training dataset Pdata containing a set of image samples x whose element values are in the range [0, 1]. According to the above equation, the learning of fθ is realized via maximum log-likelihood estimation using the following equation:

maxθ∈Θ Ex∼Pdata [log pX(x)] = maxθ∈Θ Ex∼Pdata [log(pZ(fθ(x)) |det(∂fθ(x)/∂x)|)] (6)

Consequently, by using the learned Glow model, which consists of the pair of mapping functions fθ and gθ shown in Eqs. (3) and (4), we can achieve the bijective mapping between the image space and the latent space.
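As a deliberately tiny illustration of how such a mapping can be both learnable and exactly invertible, the sketch below implements a single NICE-style additive coupling step. It is a toy stand-in, not the Glow architecture itself (Glow additionally uses actnorm and invertible 1×1 convolutions), and all names are illustrative.

```python
# One additive coupling step: split x into halves (x1, x2) and set
# z = (x1, x2 + t(x1)). The inverse subtracts the same shift, so the
# mapping is exactly invertible for ANY function t. The Jacobian of this
# step is triangular with unit diagonal, so |det| = 1 in Eq. (5).

def coupling_forward(x, t):
    h = len(x) // 2
    x1, x2 = x[:h], x[h:]
    shift = t(x1)
    return x1 + [a + b for a, b in zip(x2, shift)]

def coupling_inverse(z, t):
    h = len(z) // 2
    z1, z2 = z[:h], z[h:]
    shift = t(z1)
    return z1 + [a - b for a, b in zip(z2, shift)]

# any function works here; in a real flow, t is a neural network
t = lambda v: [3.0 * a + 1.0 for a in v]

x = [0.5, -1.2, 2.0, 0.1]
z = coupling_forward(x, t)
x_rec = coupling_inverse(z, t)
assert all(abs(a - b) < 1e-12 for a, b in zip(x, x_rec))
```

Stacking many such steps (with the halves permuted between steps) yields the expressive yet bijective fθ/gθ pair that the S2IRT scheme depends on.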

IV. S2IRT SCHEME FOR GENERATIVE STEGANOGRAPHY

Fig. 1 shows the framework of the proposed S2IRT scheme for generative steganography based on the Glow model. It consists of two stages: S2I transformation for information hiding and S2I reverse transformation for information extraction. In this section, we first give the motivation of the proposed S2IRT scheme, and then elaborate the two stages of the S2IRT scheme separately.

A. Motivation

To realize generative steganography, it is necessary to first encode a given secret message as a latent vector, and then input the vector into the Glow model to generate the stego-image. To this end, we analyze the properties of the bijective mapping of the Glow model to motivate the S2IRT scheme for generative steganography. More details are given as follows.

Property 1. The elements of the latent vector mapped from a real image follow a one-dimensional, approximately Gaussian distribution.

As mentioned in Section 3, the Glow model builds a bijective mapping between the high-dimensional latent vectors, in which each dimension follows a one-dimensional Gaussian distribution, and the real images with a complex distribution. Moreover, the dimensions of the latent vectors are independent. Thus, for a given real image, the elements of its latent vector follow a one-dimensional, approximately Gaussian distribution, as shown by a toy example in Fig. 2(a). To avoid generated images being successfully distinguished from real images in the latent space, a set of elements is randomly sampled from the standard Gaussian distribution to construct a latent vector for image generation, so that the elements of the latent vector of the generated image also follow the one-dimensional, approximately Gaussian distribution, as done in the literature [44], [46].

To resist the detection of the secret message hidden in a generated stego-image in the latent space, in this paper, we also randomly sample a set of elements from the standard Gaussian distribution to construct the latent vector for stego-image generation, so that the elements of the latent vector of each generated stego-image also follow the approximately Gaussian distribution. Then, these elements are divided into a number of groups according to their values. Afterward, guided by a given secret message, we arrange the positions of these element groups sequentially to construct the latent vector, and thus the secret message is encoded as the position arrangement of these elements.

On the contrary, if we directly encoded the secret message as the element values to construct the latent vector for stego-image generation, it is very likely that the distribution of these elements would be quite different from the approximately Gaussian distribution, which would cause the generated stego-image to be easily detected by analyzing the distribution of its latent vector's elements. Fig. 2(a) also illustrates the distributions of the elements of the latent vectors constructed by the element position arrangement-based and element value-based encoding methods.

Property 2. There are mapping errors in the bijective mapping. Since the bijective mapping is implemented between the continuous latent space and the discrete image space, mapping errors occur in the bijective-mapping process.

As a result, although the constructed latent vector can be recovered from the generated stego-image by the Glow model for information extraction, there is still a slight difference between the original vector and the recovered one, as shown by a toy example in Fig. 2(b). Thus, some elements of the original vector that are near group boundaries will shift to the adjacent groups, which affects the accuracy of information extraction. As shown by a toy example of an original vector and the recovered one in Fig. 2(c), the second element shifts from Group 4 to Group 5, and thus it is hard to accurately obtain the position arrangement of the original vector's elements to extract the hidden message. Therefore, we choose and arrange the elements nearest to the center of each group, which are less likely to shift to the adjacent groups, for latent vector construction.
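Property 2 can be demonstrated numerically with a minimal sketch: a small bijective-mapping error perturbs each element, so an element near a group boundary can land in the adjacent group, while an element near a group's center keeps its group index. The boundaries and error magnitude below are illustrative, not the paper's actual values.

```python
# Toy demonstration of mapping-error-induced group shifts.

def group_index(e, bounds):
    """Return the index i such that bounds[i] <= e < bounds[i+1]."""
    for i in range(len(bounds) - 1):
        if bounds[i] <= e < bounds[i + 1]:
            return i
    return None  # outside the covered range

bounds = [-2.0, -1.0, 0.0, 1.0, 2.0]
eps = 0.01                  # toy stand-in for the mapping error

near_boundary = -0.005      # just below the boundary at 0.0
near_centre = -0.5          # centre of the group [-1, 0)

# the boundary element flips groups under the error; the centre element does not
assert group_index(near_boundary, bounds) != group_index(near_boundary + eps, bounds)
assert group_index(near_centre, bounds) == group_index(near_centre + eps, bounds)
```

This is exactly why the scheme restricts itself to the elements nearest each group's center: their group assignment survives the perturbation introduced by the bijective mapping.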

Therefore, motivated by the above properties of the bijective mapping of the Glow model, we propose the S2IRT scheme for generative steganography. For information hiding, we group a set of elements randomly sampled from the standard Gaussian distribution and choose the elements near each group's center. Then, guided by a given secret message, we arrange the positions of these elements to construct the latent vector. Finally, the constructed latent vector is input into the Glow model to generate the stego-image. Consequently, the secret message is transformed into the generated stego-image. For information extraction, the S2I reverse transformation is implemented to extract the secret message from the generated image. More details are given as follows.

B. S2I Transformation for Information Hiding

The S2I transformation is implemented for information hiding, as illustrated in Fig. 3. The process of S2I transformation includes four main steps: element grouping and choosing, secret-guided position arrangement of elements, latent vector construction, and image generation. These steps are detailed as follows.

Fig. 3. The illustration of S2I transformation for information hiding. The process of information hiding includes four main steps: element grouping and choosing, secret-guided position arrangement, latent vector construction, and image generation.

Step (1): Element grouping and choosing. To construct an NT-dimensional latent vector for image generation, NT elements are randomly sampled from the standard Gaussian distribution at first. Then, we split the range [−2, 2] into K parts in such a way that the probability mass of each part is equal, as most of the sampled elements fall into this range. According to the K range parts, these sampled elements are divided into K groups denoted as {Gi | 1 ≤ i ≤ K}.

Then, we choose the n elements nearest to the center of each group, which are less likely to shift to the adjacent groups, to obtain N elements for latent vector construction. Thus, N = K × n and N ≤ NT. Note that the values of K and n are shared between the communication participants beforehand for information hiding and extraction.
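As a concrete sketch of Step (1), the snippet below splits [−2, 2] into K equal-probability parts of the standard Gaussian and keeps the n elements of each part closest to its center. Defining the "center" as the part's probability midpoint is our assumption; the paper does not pin down the exact definition.

```python
import random
from statistics import NormalDist

def group_and_choose(elements, K, n):
    """Split [-2, 2] into K equal-probability parts of the standard
    Gaussian, assign each sampled element to its part, and keep the
    n elements closest to each part's probability midpoint."""
    nd = NormalDist()
    lo, hi = nd.cdf(-2.0), nd.cdf(2.0)          # probability mass inside [-2, 2]
    step = (hi - lo) / K
    edges = [-2.0] + [nd.inv_cdf(lo + j * step) for j in range(1, K)] + [2.0]
    centers = [nd.inv_cdf(lo + (j + 0.5) * step) for j in range(K)]
    groups = []
    for j in range(K):
        members = [e for e in elements if edges[j] <= e < edges[j + 1]]
        members.sort(key=lambda e: abs(e - centers[j]))  # nearest to center first
        groups.append(members[:n])
    return groups

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(20000)]  # the NT sampled elements
groups = group_and_choose(samples, K=3, n=4)
```

Elements outside [−2, 2] are simply discarded here, consistent with the text's remark that most samples fall inside that range.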

Step (2): Secret-guided position arrangement of elements. Guided by a given secret message M, the n elements of each group Gi are arranged into a set of n corresponding positions, which are selected from the position set pos. Here, pos records the [N − (i − 1)n] remaining positions for arranging these elements, where (i − 1)n is the number of positions already occupied by the elements of the 1-st to (i − 1)-th groups. The number of choices for selecting n positions from pos is ωi = C(N − (i − 1)n, n), where C(x, y) denotes the number of ways to choose y positions from x positions. Thus, a ⌊log2 ωi⌋-bit secret message can be encoded as the position arrangement of the elements of Gi.

The secret-guided position arrangement for the elements of each group is elaborated in the pseudocode of Algorithm 1.

Algorithm 1 Secret-guided position arrangement of elements
Input:
  Secret bitstream: M = '00100101101001'
  Number of groups: K
  Number of chosen elements in each group: n
  Number of chosen elements in total: N
Output:
  Position arrangement array: Ind = (Ind[1], . . . , Ind[N])
 1: Record the number of remaining positions r ← N
 2: Record the remaining positions pos ← (1, 2, 3, . . . , N)
 3: for i = 1 to K − 1 do
 4:   Compute the number of choices of selecting n positions from r positions: ωi ← C(r, n)
 5:   Read the next ⌊log2 ωi⌋ bits from M as a non-negative decimal integer: m ← Read(M, ⌊log2 ωi⌋)
 6:   Select n positions from pos guided by m: sel ← Select(m, pos, n)
 7:   for p in sel do
 8:     Set Ind[p] as the group No. i: Ind[p] ← i
 9:   end for
10:   r ← r − n
11:   Remove sel from pos: pos ← PosDelete(pos, sel)
12: end for
13: return Ind

To further illustrate Algorithm 1, we present an example of secret-guided position arrangement, shown in Fig. 3. Suppose there are N = 12 elements chosen from K = 3 groups {Gi | 1 ≤ i ≤ 3}, and the number of elements chosen from each group is n = 4. As 12 elements need to be arranged, the initial position set is pos = (1, 2, 3, . . . , 12). Guided by a given secret bitstream M = '00100101101001', we arrange the elements of each group into a set of positions selected from pos. More details are given as follows.

To arrange the n = 4 elements of the first group G1, we should select 4 positions from the 12 positions in pos, so the number of choices is C(12, 4) = 495. Since 495 choices can represent ⌊log2 495⌋ = 8 bits of secret message, we read the first 8 bits from the secret bitstream, i.e., '00100101', as the decimal integer m = 37. Then, the (m + 1) = 38-th position choice, i.e., the positions "1, 3, 4, 8", is selected for arranging the elements of G1. Thus, we set Ind[1], Ind[3], Ind[4], and Ind[8] of the position arrangement array Ind to the No. of the first group, i.e., Ind[1] = Ind[3] = Ind[4] = Ind[8] = 1. That means the positions "1, 3, 4, 8" will be used to arrange the elements of G1.

Similarly, to arrange the n = 4 elements of G2, we select 4 positions from the remaining 8 positions in pos, so the number of choices is C(8, 4) = 70. These 70 choices enable the hiding of ⌊log2 70⌋ = 6 bits of secret message. Thus, the next 6 bits, i.e., '101001', are read from the secret bitstream as the decimal integer m = 41. Then, the (m + 1) = 42-th position choice, i.e., the positions "5, 6, 9, 12", is selected for arranging the four elements of G2. Thus, we set Ind[5] = Ind[6] = Ind[9] = Ind[12] = 2. That means the positions "5, 6, 9, 12" will be used to arrange the four elements of G2.

Guided by the given secret message M, we finally obtain the position arrangement array Ind = (1, 3, 1, 1, 2, 2, 3, 1, 2, 3, 3, 2). Consequently, the secret message M is encoded as the position arrangement of those elements. Indicated by this array Ind, the latent vector can be constructed in the following step.
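The Select(m, pos, n) step of Algorithm 1 can be sketched as follows. The paper calls the choice numbering lexicographic; the combinadic (colexicographic) numbering used below is our assumption, chosen because it reproduces the choice Nos. (38 and 42) of the worked example above.

```python
from math import comb

def select_positions(m, pos, n):
    """Map the integer m to the (m+1)-th choice of n positions from the
    ordered remaining-position list `pos` (the Select(m, pos, n) step).
    Uses combinadic numbering: rank = sum over i of C(c_i - 1, i), where
    c_1 < ... < c_n are the 1-based indices of the chosen entries of pos."""
    chosen = []
    for i in range(n, 0, -1):
        c = i                          # smallest admissible 1-based index
        while comb(c, i) <= m:         # find the largest c with comb(c-1, i) <= m
            c += 1
        chosen.append(pos[c - 1])
        m -= comb(c - 1, i)
    return sorted(chosen)
```

With m = 37 over pos = (1, ..., 12) and n = 4 this yields the positions (1, 3, 4, 8), and with m = 41 over the eight remaining positions it yields (5, 6, 9, 12), matching the example.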

Step (3): Latent vector construction. According to the obtained position arrangement array Ind = (1, 3, 1, 1, 2, 2, 3, 1, 2, 3, 3, 2), we put the four elements of G1 into the 1-st, 3-rd, 4-th, and 8-th positions, the four elements of G2 into the 5-th, 6-th, 9-th, and 12-th positions, and the elements of group G3 into the remaining positions in a random order. These arranged elements are concatenated to form an N-dimensional vector. As there are NT sampled elements in total, the other (NT − N) elements are directly concatenated at the end of the vector to obtain the final NT-dimensional latent vector z.

Step (4): Image generation. After constructing the NT-dimensional latent vector z, we randomly scramble the elements of z using a private Key shared between the communication participants, and then map the scrambled vector to a generated high-quality image by the Glow model. Finally, the generated image is used as the stego-image for covert communication.
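The Key-based scrambling of Step (4) can be sketched as a key-seeded pseudorandom permutation and its inverse. The specific seeding scheme below is an illustrative assumption, not the paper's exact construction.

```python
import random

def scramble(z, key):
    """Permute the latent vector with a permutation derived from the shared key."""
    rng = random.Random(key)           # key-seeded generator: same key, same permutation
    perm = list(range(len(z)))
    rng.shuffle(perm)
    return [z[p] for p in perm]

def unscramble(z_s, key):
    """Invert the key-seeded permutation at the receiving end."""
    rng = random.Random(key)
    perm = list(range(len(z_s)))
    rng.shuffle(perm)                  # rebuild the same permutation from the key
    out = [0.0] * len(z_s)
    for dst, src in enumerate(perm):
        out[src] = z_s[dst]
    return out
```

Because both sides derive the identical permutation from the shared key, unscramble(scramble(z, key), key) returns z exactly, preserving the reversibility of the overall scheme.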

In the proposed S2IRT scheme, the space of possible position arrangements of the N chosen elements is very large, and the group Nos. of these elements are stable, which leads to high efficiency and robustness in encoding the secret message as the constructed vector. Moreover, the bijective mapping between the constructed vector and the generated image is realized by the Glow model. Thus, the proposed S2IRT scheme has good efficiency and reversibility, which lead to both high hiding capacity and accurate extraction of the secret message. In addition, the constructed vector can be mapped to a high-quality image by the Glow model, and information hiding is implemented by generating a new image rather than modifying an existing one. Therefore, the proposed steganographic approach also has desirable anti-detectability and imperceptibility. These properties are also verified by the experiments in Section 6.

C. S2I Reverse Transformation for Information Extraction

As information extraction is the reverse process of information hiding, the S2I reverse transformation is implemented to extract the secret message from the generated image. It includes two main steps: latent vector recovery and secret message extraction.

Step (1): Latent vector recovery. At the receiving end, the received image is reversely mapped to a latent vector by the Glow model. Then, using the shared Key, the position order of the elements of the latent vector is restored to obtain the recovered latent vector z′.

Step (2): Secret message extraction. Using the shared values of K and n, the first N = Kn elements of the recovered latent vector z′ are first re-grouped into K groups in the same manner as described in the information hiding stage.

For the n elements of each group Gi, we obtain the set of their positions in z′ and compute the choice No. of this position set among all C(N − (i − 1)n, n) possible choices. According to Step (2) of the information hiding stage, the choice No. is equal to (mi + 1), where mi is the decimal integer of the corresponding secret bits. Thus, for each group Gi, we can obtain mi from the position choice No. of its n elements and then transform mi into the corresponding secret bits. After obtaining all the bits, we concatenate them to extract the final secret message M′.
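The choice-No. computation in this step is the inverse of the hiding-side Select; under the same (assumed) combinadic numbering that reproduces the paper's worked example, it can be sketched as:

```python
from math import comb

def rank_positions(sel, pos):
    """Recover the decimal integer m from the positions `sel` that a group's
    elements occupy, relative to the ordered remaining-position list `pos`
    (inverse of the hiding-side unranking, combinadic numbering assumed)."""
    idx = sorted(pos.index(p) + 1 for p in sel)            # 1-based indices within pos
    return sum(comb(c - 1, i) for i, c in enumerate(idx, start=1))
```

For the running example, the positions (1, 3, 4, 8) over pos = (1, ..., 12) rank back to m = 37, and (5, 6, 9, 12) over the eight remaining positions rank back to m = 41, recovering the hidden bits '00100101' and '101001'.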

D. Hiding Capacity Analysis

As mentioned in the above subsection, for the i-th group, a ⌊log2 ωi⌋-bit secret message can be encoded as the position arrangement of its elements, where ωi = C(N − (i − 1)n, n). Thus, the total number of bits that can be encoded as the position arrangement for all groups is

BN_{S2I} = \sum_{i=1}^{K-1} \lfloor \log_2 \omega_i \rfloor \quad (7)

It satisfies the following inequality:

\left( \sum_{i=1}^{K-1} \log_2 \omega_i \right) - K + 1 < \sum_{i=1}^{K-1} \lfloor \log_2 \omega_i \rfloor \le \sum_{i=1}^{K-1} \log_2 \omega_i \quad (8)

where

\sum_{i=1}^{K-1} \log_2 \omega_i = \log_2 \prod_{i=1}^{K-1} C(N - (i-1)n, \, n) \quad (9)

According to Stirling's approximation, we can obtain the following result:

\sum_{i=1}^{K-1} \log_2 \omega_i \approx \log_2 \left( \left( \frac{1}{\sqrt{2\pi}} \right)^{K-1} \frac{(Kn)^{Kn+1/2}}{n^{K(n+1/2)}} \right) = -(K-1) \log_2 \sqrt{2\pi} + (Kn + 1/2) \log_2 K - \frac{K-1}{2} \log_2 n \quad (10)

Suppose all the NT sampled elements are chosen and divided into NT groups, i.e., N = NT and K = NT, and each group contains only n = 1 element for information hiding. The sizes of the training images used in the experiments are 256 × 256 × 3 = 196,608, and the dimension of the constructed latent vector, i.e., NT, is equal to the size of those images; thus NT = 196,608. According to Eqs. (8) and (10), the maximum BN_{S2I} is approximately estimated to be in the range (3,000,094, 3,196,701], and thus the maximum number of bits hidden per image pixel per channel (bpp) can theoretically reach a very high value, i.e., 3,196,701/196,608 ≈ 16.2 bpp.
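The bound can be cross-checked by summing the floor terms of Eq. (7) directly. With K = NT and n = 1 we have ωi = C(NT − (i − 1), 1) = NT − i + 1, so the sum reduces to Σ⌊log2 j⌋ for j = 2, ..., NT. The exact count below is our own computation, offered as a sanity check:

```python
NT = 256 * 256 * 3                       # latent dimension, 196,608
# With K = NT and n = 1, omega_i = NT - i + 1, so BN_S2I is the sum of
# floor(log2 j) for j = 2 .. NT; j.bit_length() - 1 equals floor(log2 j).
bits = sum(j.bit_length() - 1 for j in range(2, NT + 1))
bpp = bits / NT
```

Evaluating this gives bits = 3,080,211, i.e., about 15.7 bpp, which indeed lies inside the interval (3,000,094, 3,196,701] given by Eq. (8), a little below the Stirling upper bound quoted above.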

According to Eq. (10), larger K and n lead to higher capacity of information hiding but lower accuracy of information extraction, which is also analyzed in Section 6.2.

V. SE-S2IRT SCHEME FOR GENERATIVE STEGANOGRAPHY

In practice, images usually suffer from a variety of attacks, such as intensity changing, contrast enhancement, and noise addition, during transmission over networks. In the S2IRT scheme, although the group Nos. of the chosen elements used for latent vector construction are relatively stable, those attacks will cause some elements of the constructed latent vector to shift from one group to its adjacent ones. Once the group No. of one element changes, the position choice Nos. for arranging the elements of most groups change, and thus most secret bits encoded as the position arrangement for these groups are affected. Consequently, it is hard for the S2IRT scheme to accurately extract secret information from the recovered latent vector under those attacks. To improve the robustness to those attacks, we propose a separate encoding-based S2IRT (SE-S2IRT) scheme for steganography.

Fig. 4. The illustration of SE-S2I transformation for information hiding. The process of information hiding is mainly divided into four steps: element grouping and choosing, secret-guided position arrangement, latent vector construction, and image generation.

A. SE-S2I Transformation for Information Hiding

The proposed SE-S2IRT scheme is based on the idea of separate encoding of the secret message. Fig. 4 illustrates the process of SE-S2I transformation for information hiding, which also includes the following four steps.

Step (1): Element grouping and choosing. This step is the same as that of the S2IRT scheme. NT elements are randomly sampled from the standard Gaussian distribution and divided into K groups denoted as {Gi | 1 ≤ i ≤ K}. Then, the n elements nearest to the center of each group are chosen to obtain N elements, so N = K × n and N ≤ NT.

Step (2): Secret-guided position arrangement of elements. To improve the robustness to image attacks, guided by the secret message M, for each group Gi we select one position from every [N − (i − 1)n]/n = (K − i + 1) adjacent positions of the position set pos to separately arrange the n elements of Gi. Here, pos records the [N − (i − 1)n] remaining positions, and (i − 1)n is the number of positions occupied by the elements of the 1-st to (i − 1)-th groups. The number of choices for selecting one position from (K − i + 1) positions is ωi = C(K − i + 1, 1), and thus ⌊log2 ωi⌋ bits can be encoded as the position arrangement for one element of Gi. As there are n elements in Gi, in total n⌊log2 ωi⌋ bits can be encoded as the position arrangement for the n elements of Gi.

To illustrate the position arrangement, Fig. 4 shows an example of secret-guided position arrangement using the SE-S2IRT scheme. Suppose there are in total N = 12 chosen elements in K = 3 groups {Gi | 1 ≤ i ≤ 3}, and the number of elements in each group is n = 4. As 12 elements need to be arranged, the initial position set is pos = (1, 2, 3, . . . , 12). Then, guided by a given secret bitstream M = '10101100', we separately arrange the elements of each group into the corresponding selected positions. More details are given as follows.

For arranging one element of G1, the number of choices of selecting one position from 3 adjacent positions is C(3, 1) = 3, and thus ⌊log2 3⌋ = 1 bit can be encoded. Therefore, we read the first bit from the secret bitstream, i.e., '1', as the decimal integer m = 1. Then, we select the (m + 1) = 2-th position choice, i.e., position '2', to arrange one element of G1. Next, we set Ind[2] of the position arrangement array Ind to the No. of the first group, i.e., Ind[2] = 1. Similarly, we read the next bit, i.e., '0', as the decimal integer m = 0, and then use the (m + 1) = 1-th position choice among the next three adjacent positions, i.e., position '4', to arrange another element of G1. Thus, Ind[4] = 1. After separately arranging all 4 elements of G1 in the same manner, 4 bits in total, i.e., '1010', are encoded, and Ind[2], Ind[4], Ind[8], and Ind[10] are set to 1. The element arrangement process is also shown in Fig. 4.

The position arrangement for the other groups is the same as that for G1. Guided by the given secret message M, we finally select the corresponding positions for the elements of all groups and obtain the position arrangement array Ind = (3, 1, 2, 1, 3, 2, 2, 1, 3, 1, 2, 3). The secret message M is thereby encoded as the position arrangement of those elements, and Ind will be used for latent vector construction.
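The per-element, block-wise selection described above can be sketched as follows. Partitioning the sorted remaining positions into consecutive blocks of (K − i + 1) is our reading of the figure's example, which the code reproduces exactly.

```python
def se_arrange(M_bits, K, n):
    """SE-S2IRT Step (2): each element of group i takes one of the
    (K - i + 1) adjacent remaining positions, encoding
    floor(log2(K - i + 1)) bits per element."""
    N = K * n
    pos = list(range(1, N + 1))          # remaining positions, kept sorted
    Ind = [0] * (N + 1)                  # 1-based position arrangement array
    ptr = 0                              # read pointer into the bitstream
    for i in range(1, K):                # the last group takes the leftovers
        w = K - i + 1                    # adjacent positions offered per element
        nbits = w.bit_length() - 1       # floor(log2 w) bits per element
        leftovers = []
        for e in range(n):
            block = pos[e * w:(e + 1) * w]
            m = int(M_bits[ptr:ptr + nbits], 2) if nbits else 0
            ptr += nbits
            Ind[block[m]] = i            # the (m+1)-th choice within the block
            leftovers += [q for q in block if q != block[m]]
        pos = leftovers
    for p in pos:                        # unassigned positions go to group K
        Ind[p] = K
    return Ind[1:]
```

With M = '10101100', K = 3, and n = 4 this reproduces the paper's array Ind = (3, 1, 2, 1, 3, 2, 2, 1, 3, 1, 2, 3).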

Step (3): Latent vector construction and Step (4): Image generation are the same as those of the S2IRT scheme, and thus they are skipped. In the SE-S2IRT scheme, once the group No. of an element arranged into one of K adjacent positions is changed, only the position choice Nos. for arranging the elements into those K adjacent positions will be changed, and thus at most \sum_{i=1}^{K-1} \lfloor \log_2(K - i + 1) \rfloor bits will be affected. Therefore, the SE-S2IRT scheme provides better robustness to common image attacks than the S2IRT scheme, which is also verified by the experiments in Section 6.3.

B. SE-S2I Reverse Transformation for Information Extraction

Similarly, the reverse SE-S2I transformation is implemented for information extraction.

Step (1): Latent vector recovery. This step is the same as that of the S2IRT scheme and is thus skipped.

Step (2): Secret message extraction. Using the shared values of K and n, we re-group the first N = Kn elements of the recovered latent vector z′ in the same manner as described in the information hiding stage of the S2IRT scheme. For the n elements of each group Gi, we obtain their positions in z′. Then, we compute the No. of the position choice of selecting one position from every (K − i + 1) adjacent positions. According to Step (2) of the information hiding stage in the SE-S2IRT scheme, the No. of the position choice is equal to (mi + 1), where mi is the decimal integer of the corresponding secret bits. Then, the decimal mi is transformed into ⌊log2 C(K − i + 1, 1)⌋ bits. According to the position choices for the n elements of each group Gi, we can obtain n⌊log2 C(K − i + 1, 1)⌋ bits of the secret message. After obtaining all bits from the element positions of all groups, we concatenate them to extract the final secret message M′.

C. Hiding Capacity Analysis

As mentioned in the above subsection, for the i-th group, an n⌊log2 ωi⌋-bit secret message can be encoded as the position arrangement of its elements, where ωi = C(K − i + 1, 1). Thus, the total number of bits encoded as the position arrangement for all groups is

BN_{SE\text{-}S2I} = \sum_{i=1}^{K-1} n \lfloor \log_2 \omega_i \rfloor \quad (11)

It satisfies the following inequality:

\left( \sum_{i=1}^{K-1} n \log_2 \omega_i \right) - n(K-1) < \sum_{i=1}^{K-1} n \lfloor \log_2 \omega_i \rfloor \le \sum_{i=1}^{K-1} n \log_2 \omega_i \quad (12)

where

\sum_{i=1}^{K-1} n \log_2 \omega_i = n \log_2 \prod_{i=1}^{K-1} (K - i + 1) = n \log_2 (K!) \quad (13)

According to Stirling's approximation, we can obtain the following result:

\sum_{i=1}^{K-1} n \log_2 \omega_i = n \log_2 (K!) \approx n \log_2 \left( \sqrt{2\pi K} \left( \frac{K}{e} \right)^K \right) = nK \log_2 \frac{K}{e} + \frac{n}{2} \log_2 (2\pi K) \quad (14)

Suppose all the NT sampled elements are chosen and divided into NT groups, i.e., N = NT and K = NT, and each group contains only n = 1 element. The sizes of the training images used in the experiments are 256 × 256 × 3 = 196,608, and thus NT = 196,608. According to Eqs. (12) and (14), the maximum BN_{SE-S2I} is approximately estimated to be in the range (2,977,102, 3,173,709], and thus the maximum number of bits hidden per image pixel per channel can theoretically reach a very high value, i.e., about 3,173,709/196,608 ≈ 16.1 bpp.

According to Eq. (14), larger K and n lead to higher capacity of information hiding but lower accuracy of information extraction in the SE-S2IRT scheme. The impacts of the parameters K and n are also analyzed in Section 6.2.

VI. EXPERIMENTS

In this section, we first introduce the experimental settings. Then, the impacts of the parameters of the S2IRT and SE-S2IRT schemes (the number of element groups K and the number of elements n chosen from each group) are analyzed and discussed. Finally, the performances of the proposed steganographic approaches are tested in terms of information extraction accuracy, anti-detectability, imperceptibility, and robustness, and are compared with those of the state of the art.

A. Experiment Settings

1) Training datasets: In [46], the original Glow model trained on the CelebA-HQ dataset [47] can effectively and efficiently produce high-quality, realistic-looking images. Thus, in our experiments, we also adopt this dataset to train the Glow model. The CelebA-HQ dataset contains 30,000 high-resolution face images selected from the CelebA dataset. The sizes of the images in the dataset are up to 1024 × 1024. For the sake of training efficiency, we rescale the images to 256 × 256 and train the Glow model on the rescaled images.

2) Model training details: The Glow model is trained by maximizing the objective function defined in Eq. (6) with a learning rate of 0.001. The depth of each flow is set to 32 and the number of levels of the multi-scale architecture is set to 6. After 2,000,000 iterations on the dataset, the trained Glow model is obtained.

3) Evaluation criteria: In the experiments, we adopt the following evaluation criteria to evaluate the hiding capacity, extraction accuracy, anti-detectability, and imperceptibility of the different steganographic approaches.

a) Hiding capacity: The hiding capacities of most image steganographic approaches are evaluated in bits per pixel (bpp), i.e., the number of secret bits hidden per pixel per channel of an image. We also evaluate the hiding capacity in bpp, defined by

bpp = \frac{N_T}{W \times H \times C} \quad (15)

where NT represents the total number of hidden secret bits, and W, H, and C are the width, the height, and the number of channels of the image, respectively.

b) Extraction accuracy: Due to the hiding manner of the proposed steganographic approaches, the length of the original secret bitstream may differ from that of the extracted secret bitstream. Thus, the accuracy of information extraction is evaluated based on the Edit Distance (ED) between the original secret bitstream M and the extracted secret bitstream M′. The accuracy rate is computed by

IEA = 1 - \frac{ED(M, M')}{\max[\,Len(M), Len(M')\,]} \quad (16)

where Len(M) and Len(M′) denote the lengths of the bitstreams M and M′, respectively.
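A minimal sketch of the IEA metric of Eq. (16) follows. Using the standard Levenshtein distance (unit-cost insertions, deletions, and substitutions) for ED is our assumption about the edit-distance variant.

```python
def edit_distance(a, b):
    """Levenshtein distance between bitstreams a and b (strings of '0'/'1')."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[-1] + 1,                 # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def iea(M, M_prime):
    """Information extraction accuracy rate of Eq. (16)."""
    if not M and not M_prime:
        return 1.0
    return 1.0 - edit_distance(M, M_prime) / max(len(M), len(M_prime))
```

Normalizing by the longer of the two lengths keeps the rate in [0, 1] even when insertions or deletions make the extracted bitstream a different length from the original.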

c) Anti-detectability: We evaluate the anti-detectability performance against a steganalyzer using the following detection error rate:

P_E = \min_{P_{FA}} \frac{1}{2} (P_{FA} + P_{MD}) \quad (17)

where PFA is the false-alarm (FA) probability of the steganalyzer and PMD is its missed-detection (MD) probability. A larger PE means higher anti-detectability performance of a steganographic approach against the steganalyzer.
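Given per-image steganalyzer scores, PE of Eq. (17) can be estimated by sweeping the decision threshold. The convention that a higher score means "more likely stego" is an assumption of this sketch.

```python
def detection_error_rate(cover_scores, stego_scores):
    """Estimate P_E = min over the threshold of (P_FA + P_MD) / 2,
    flagging an image as stego when its score >= threshold."""
    candidates = sorted(set(cover_scores) | set(stego_scores))
    candidates.append(candidates[-1] + 1.0)    # threshold above every score
    best = 0.5                                 # random-guessing baseline
    for t in candidates:
        p_fa = sum(s >= t for s in cover_scores) / len(cover_scores)
        p_md = sum(s < t for s in stego_scores) / len(stego_scores)
        best = min(best, 0.5 * (p_fa + p_md))
    return best
```

A perfectly separable steganalyzer yields PE = 0, while a steganalyzer no better than chance yields PE = 0.5, matching the interpretation in the text.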

Fig. 5. The impacts of parameters K and n on the hiding capacity and extraction accuracy of (a) the S2IRT scheme and (b) the SE-S2IRT scheme.

d) Imperceptibility: To make the hidden message imperceptible, there should be no visible difference between the qualities of images with and without hidden information. Without the need for an existing cover image, the proposed steganographic approaches directly generate a new image as the stego-image for information hiding. Therefore, the traditional reference-based image quality evaluation strategies, such as PSNR and SSIM, cannot be used to test the imperceptibility of the proposed steganographic approaches. Thus, we adopt a no-reference image quality evaluation strategy, i.e., the Blind/Referenceless Image Spatial QUality Evaluator (BRISQUE) [48], and a dimensionality reduction strategy, i.e., t-distributed Stochastic Neighbor Embedding (t-SNE) [49], to assess the imperceptibility of the proposed steganographic approaches. BRISQUE is a natural scene statistics-based, distortion-generic blind/referenceless image quality assessment model that operates in the spatial domain. It is trained on the TID2008 dataset, which contains a large number of natural images distorted by various attacks, with the quality of each image scored by human observers. By inputting an image into the trained model, a quality score for the image is produced automatically. t-SNE is a statistical method for visualizing high-dimensional data by mapping each data point to a point in two-dimensional space.

4) Evaluation environment: All the experiments are conducted on an NVIDIA RTX 3090 GPU platform using PyTorch with the Python interface.

B. Parameter Impacts

In the proposed S2IRT and SE-S2IRT schemes, there are two key parameters for latent vector construction: K, the number of element groups, and n, the number of elements chosen from each group. We test the impacts of these parameters on the performances of the proposed schemes in terms of hiding capacity and extraction accuracy. In this experiment, we adopt the trained Glow model introduced in the previous subsection.

The parameter impacts on the proposed S2IRT and SE-S2IRT schemes are shown in Fig. 5(a) and (b), respectively. From the two figures, it is clear that larger K and n lead to higher hiding capacity. That is because larger K and n offer a larger space of position arrangements of elements for latent vector construction, which enables more secret bits to be hidden in the image generated from the constructed latent vector. However, the accuracy of information extraction decreases as K and n increase, for the following reason. Increasing K and n causes more elements to lie near the group boundaries, making the group Nos. of these elements vulnerable. Therefore, the group Nos. of elements in the recovered latent vector can differ considerably from those in the original latent vector, which significantly affects the accuracy of information extraction. Moreover, the hiding capacity of S2IRT is generally higher than that of SE-S2IRT for the same values of K and n. The main reason is that the number of ways of selecting n positions freely from the N remaining positions is larger than that of selecting one position from each run of adjacent positions, and thus more bits can be hidden.

According to Fig. 5(a) and (b), under the condition of a perfect information extraction accuracy rate, i.e., IEA = 1.0, the highest hiding capacities of S2IRT and SE-S2IRT reach up to 4.3 bpp and 4.1 bpp, respectively. From Fig. 5(a) and (b), to achieve a required level of hiding capacity IHC, several combinations of (K, n) can be chosen. For example, when the hiding capacity is required to be IHC = 4 bpp for S2IRT, the candidate combinations of (K, n) are (23, 8000), (29, 6000), (33, 5000), (40, 4000), (50, 3000), (70, 2000), and (128, 1000). It is also found that a smaller K generally leads to higher extraction accuracy at a given level of hiding capacity. Thus, the parameter combination with the smallest K, i.e., (23, 8000), is chosen to achieve the required hiding capacity IHC = 4.0 bpp. In this manner, given a required level of hiding capacity, the parameter combinations (K, n) can be determined for S2IRT and SE-S2IRT, and they are used in the following experiments.

C. Performance Evaluation and Comparison

In this subsection, we evaluate the performances of the proposed approaches in terms of information extraction accuracy, anti-detectability, imperceptibility, and robustness under different hiding payloads, and compare them with those of the state of the art. The compared approaches are listed as follows.

S-UNIWARD [17]: a well-known conventional steganographic approach based on a heuristically defined distortion function.

UT-6HPF-GAN [24]: a conventional steganographic approach based on a distortion function learned by a GAN.

Deep-Stego [26]: a conventional approach that directly trains two fully convolutional networks, one to hide a secret image within a cover image and one to extract the hidden image from the stego-image.

TABLE I
THE INFORMATION EXTRACTION ACCURACY OF THE APPROACHES WITH DIFFERENT HIDING PAYLOADS

                     Hiding payloads (bpp)
Approaches    0.1      0.5      1.0      2.0      4.0
Deep-Stego    -        -        0.9643   0.8315   0.7524
SWE           0.7314   0.7167   0.7134   0.7120   0.7122
S2IRT         1.0000   1.0000   1.0000   1.0000   0.9943
SE-S2IRT      1.0000   1.0000   1.0000   1.0000   1.0000

SWE [6]: a generative steganographic approach, called steganography without embedding (SWE), in which the secret information is directly encoded as the input noise of a DCGAN to generate the stego-images.

S2IRT: the steganographic approach using the S2IRT scheme proposed in Section 4.

SE-S2IRT: the steganographic approach using the SE-S2IRT scheme proposed in Section 5. In both S2IRT and SE-S2IRT, the Glow model trained on the CelebA-HQ dataset is used to generate the stego-images.

1) Information extraction accuracy: Table I lists the information extraction accuracy (IEA) of the different steganographic approaches, i.e., Deep-Stego, SWE, S2IRT, and SE-S2IRT, as the hiding payload increases. From this table, it is clear that the proposed approaches, S2IRT and SE-S2IRT, achieve much higher information extraction accuracy than the conventional approach Deep-Stego and the generative approach SWE under different hiding payloads. The extraction accuracy rates of S2IRT and SE-S2IRT remain at a very high level (IEA ≈ 1.0) as the hiding payload ranges from 0.1 bpp to 4 bpp. That is because S2IRT and SE-S2IRT achieve secret-to-image reversible transformation, owing to the efficient and robust message encoding and the bijective mapping between latent space and image space realized by the Glow model. In summary, the proposed generative steganographic approaches achieve high hiding capacity (up to 4 bpp) and accurate extraction of the secret message (almost 100% accuracy rate) simultaneously.

In contrast, as the hiding payload increases, the information extraction accuracies of Deep-Stego and SWE decrease, for the following reasons. For Deep-Stego, the secret message is compressed and then hidden into a cover image by a trained fully convolutional network in a lossy way, and thus it is hard to accurately extract the secret message from the stego-image. For SWE, the secret message is encoded as a low-dimensional noise signal of a DCGAN to generate the stego-image. However, the DCGAN only allows a one-way mapping from the input information to the image, which makes the hidden secret message difficult to extract from the stego-image. Thus, Deep-Stego and SWE cannot achieve accurate information extraction under high hiding payloads.

2) Anti-detectability: To evaluate and compare the anti-detectability performances of the different steganographic approaches, we use the well-known steganalyzers SRM [32] and XuNet [33] to detect the presence of hidden information in the stego-images. SRM is based on a set of high-dimensional hand-crafted steganalytic features, while XuNet is based on a modified CNN structure for steganalysis.

TABLE II
THE VALUES OF PE OF THE STEGANOGRAPHIC APPROACHES WITH DIFFERENT HIDING PAYLOADS

                              Hiding payloads (bpp)
Steganalyzer  Approaches     0.1      0.5      1.0      2.0      4.0
SRM           S-UNIWARD      0.4229   0.1824   0.0493   -        -
SRM           UT-6HPF-GAN    0.4414   0.2489   0.0615   -        -
SRM           Deep-Stego     -        -        0.0000   0.0000   0.0000
SRM           SWE            0.2687   -        -        -        -
SRM           S2IRT          0.2707   0.2711   0.2698   0.2702   0.2701
SRM           SE-S2IRT       0.2696   0.2700   0.2710   0.2703   0.2694
XuNet         S-UNIWARD      0.4461   0.1925   0.0712   -        -
XuNet         UT-6HPF-GAN    0.4690   0.2971   0.0787   -        -
XuNet         Deep-Stego     -        -        0.0000   0.0000   0.0000
XuNet         SWE            0.2691   -        -        -        -
XuNet         S2IRT          0.2701   0.2697   0.2686   0.2708   0.2702
XuNet         SE-S2IRT       0.2701   0.2697   0.2686   0.2708   0.2702

For the conventional steganographic approaches, the training dataset of the steganalyzers includes 5000 natural images (cover images) and 5000 stego-images obtained by embedding secret messages into these natural images. Notably, the generative steganographic approaches generate stego-images via the secret-to-image transformation without the need for natural (cover) images. For the generative steganographic approaches, as both natural images and generated images without hidden information can be regarded as cover images [50], the training dataset of the steganalyzers consists of 5000 images without hidden information (2500 natural images and 2500 generated images without hidden information) and 5000 generated images with hidden information. Using the two trained steganalyzers, the detection error rate PE is computed in the manner described in Section 6.1 to evaluate the anti-detectability performances of the different steganographic approaches.

Table II lists the anti-detectability performances (values of PE) of the approaches with different hiding payloads. Several observations can be made from this table.

First, the anti-detectability of the proposed generative steganographic approaches, S2IRT and SE-S2IRT, remains at a high level (PE ≈ 0.27) and generally outperforms that of the other approaches significantly when the hiding payload ranges from 0.5 to 4.0 bpp. That is because the proposed approaches directly generate new images as stego-images without modification, and thus it is relatively hard for the steganalyzers to detect the presence of hidden information. In contrast, as the hiding payload increases, the performances of the conventional steganographic approaches, including S-UNIWARD, UT-6HPF-GAN, and Deep-Stego, degrade significantly. In these approaches, as the secret message is hidden by modifying existing cover images, hiding more information leads to more distortion. Thus, it is much easier for the steganalyzers to defeat these steganographic approaches, especially under high hiding payloads.

However, when the hiding payload is 0.1 bpp, the PE values of the generative steganographic approaches, i.e., S2IRT, SE-S2IRT, and SWE, are also about 0.27, which is lower than those of the conventional steganographic approaches, i.e., S-UNIWARD, UT-6HPF-GAN, and Deep-Stego. This is because the limited size of the training dataset limits the generative models' ability to produce realistic images, which allows some natural images to be correctly distinguished from generated images. This problem can be effectively alleviated by increasing the size of the training dataset.

Also, it is notable that some PE values of the other steganographic approaches, including S-UNIWARD, UT-6HPF-GAN, Deep-Stego, and SWE, are empty or equal to 0 in Table II for the following reasons. The PE values of UT-6HPF-GAN and S-UNIWARD are empty when the payload is larger than 1.0 bpp, because these approaches cannot achieve a hiding capacity larger than 1.0 bpp. Since Deep-Stego aims to hide a secret image within an image, it cannot hide messages of small sizes, and thus its PE values are empty when the hiding payload is smaller than 1.0 bpp. Moreover, Deep-Stego does not consider resistance to steganalyzers, so its stego-images can be easily detected and its remaining PE values are equal to 0. Due to the irreversibility of DCGAN, SWE cannot extract the hidden information when the hiding payload is larger than 0.1 bpp, and thus the corresponding PE values of SWE are empty. Therefore, the high hiding capacities of Deep-Stego, SWE, S-UNIWARD, and UT-6HPF-GAN come at the expense of security performance or information extraction accuracy.

3) Imperceptibility: Without the need for an existing cover image, the generative steganographic approaches, i.e., SWE, S2IRT, and SE-S2IRT, directly transform a given secret message into a generated image for steganography. As these approaches require no cover images, the traditional reference-based image quality assessment metrics such as PSNR and SSIM are not suitable for evaluating their imperceptibility. Thus, a typical reference-less image quality assessment model, i.e., BRISQUE, is adopted in the experiments. However, BRISQUE is not sensitive to slight noise, and it is also experimentally found that the BRISQUE scores of the stego-images obtained by the conventional steganographic approaches are almost the same as those of the original cover images, since those stego-images are obtained by slightly modifying the original cover images. Thus, BRISQUE is not suitable for evaluating the quality of stego-images produced by the conventional steganographic approaches.

TABLE III
THE BRISQUE SCORES OF STEGO-IMAGES OF THOSE APPROACHES WITH DIFFERENT HIDING PAYLOADS

Approaches   Hiding payloads (bpp)
             0     0.1   0.5   1.0   2.0   4.0
SWE          9.07  8.69  -     -     -     -
S2IRT        8.85  8.13  7.32  8.54  9.13  7.44
SE-S2IRT     6.89  7.93  9.03  7.21  7.69  8.17

Fig. 6. The t-SNE data of real images and the images generated by (a) S2IRT and (b) SE-S2IRT with and without information hiding. The data of real images, images generated without information hiding, and images generated with a hiding payload of 4 bpp are marked by blue, red, and yellow dots, respectively.

Therefore, we only compute and compare the BRISQUE scores for the stego-images generated by the generative steganographic approaches, i.e., SWE, S2IRT, and SE-S2IRT, with different hiding payloads. Table III shows the average BRISQUE scores of the stego-images generated by the different approaches with different hiding payloads; a smaller BRISQUE score means higher image quality. From this table, although the BRISQUE scores of SWE are close to those of S2IRT and SE-S2IRT, SWE cannot accurately extract the hidden information when the hiding payload is higher than 0.1 bpp. Also, for S2IRT and SE-S2IRT, it is clear that the imperceptibility of the images generated without a hidden message (bpp = 0) is similar to that of the images generated with hiding payloads ranging from 0.1 to 4.0 bpp. This indicates that the quality of stego-images generated by S2IRT and SE-S2IRT is not affected by the increase of hiding capacity, mainly owing to the efficiency and robustness of message encoding and the Glow model's powerful ability to generate high-quality images.

To further illustrate the imperceptibility of the proposed steganographic approaches, we also compute and compare the t-SNE data of the real images and the images generated with and without information hiding. Here, t-SNE is a dimensionality-reduction technique that projects high-dimensional image data into a 2-dimensional space so that the distribution of the image data can be observed. From Fig. 6, it is clear that it is hard to distinguish among the real images, the images generated without information hiding, and those generated with information hiding.
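As a hedged illustration of how such a projection can be produced (the paper does not specify its t-SNE settings; the perplexity, sample counts, and random stand-in data below are ours), scikit-learn's TSNE can embed flattened image vectors into 2-D:

```python
# Illustrative sketch only: the paper's exact t-SNE parameters and data
# are not given, so random vectors stand in for flattened images here.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-ins for flattened image vectors, one block per population:
# real images, generated (no hiding), generated (with hiding).
real = rng.normal(0.0, 1.0, size=(50, 128))
gen_plain = rng.normal(0.0, 1.0, size=(50, 128))
gen_stego = rng.normal(0.0, 1.0, size=(50, 128))
data = np.vstack([real, gen_plain, gen_stego])

# Project the 128-D vectors to 2-D for plotting (the blue/red/yellow
# dots of Fig. 6 would be the three 50-row blocks of this embedding).
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(data)
print(embedding.shape)  # (150, 2)
```

If the three clouds of points overlap heavily in the 2-D embedding, as in Fig. 6, the populations are hard to tell apart.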

Fig. 7 shows toy examples of real images and images generated by S2IRT without and with information hiding. It is clear that S2IRT can achieve an excellent hiding capacity, i.e., up to 4.0 bpp. Meanwhile, it is hard to distinguish the images generated with information hiding from the images generated without information hiding. Also, it is notable that there are some slight visual differences between the real images and the generated images, mainly due to the insufficient training of the Glow model. These visual differences can be further reduced by training the Glow model on larger datasets.

TABLE IV
THE IEA VALUES OF STEGO-IMAGES OF THE PROPOSED APPROACHES WITH DIFFERENT HIDING PAYLOADS

Attack                            Approaches  Hiding payloads (bpp)
                                              0.1   0.5   1.0
Intensity change with 2%          S2IRT       0.68  0.63  0.60
                                  SE-S2IRT    1.00  0.91  0.75
Contrast enhancement with 5%      S2IRT       0.66  0.62  0.59
                                  SE-S2IRT    1.00  0.96  0.77
Salt & pepper noise with          S2IRT       0.69  0.63  0.61
the factor of 3%                  SE-S2IRT    0.91  0.82  0.73
Gaussian noise with standard      S2IRT       0.66  0.61  0.58
deviation σ = 10^-3               SE-S2IRT    0.84  0.78  0.69
JPEG compression 90               S2IRT       0.66  0.62  0.59
                                  SE-S2IRT    0.80  0.73  0.61
Rotation ±0.75°                   S2IRT       0.66  0.60  0.58
                                  SE-S2IRT    0.85  0.79  0.71
Image sterilization               S2IRT       1.00  0.95  0.61
                                  SE-S2IRT    1.00  1.00  0.99

TABLE V
THE PROBABILITIES OF SUCCESSFULLY CRACKING MESSAGES HIDDEN BY S2IRT AND SE-S2IRT

K    Approaches  n = 10           n = 20           n = 30
10   S2IRT       0.742×10^-67     0.987×10^-177    0.805×10^-232
     SE-S2IRT    0.637×10^-57     0.406×10^-114    0.102×10^-142
20   S2IRT       0.322×10^-231    0.209×10^-490    0.396×10^-614
     SE-S2IRT    0.277×10^-162    0.771×10^-325    0.406×10^-406
30   S2IRT       0.104×10^-403    0.561×10^-833    0.853×10^-1043
     SE-S2IRT    0.107×10^-282    0.115×10^-565    0.379×10^-707

According to the above, the proposed steganographic approaches can consistently achieve desirable imperceptibility performances under different hiding payloads.

4) Robustness: To test and compare the robustness performances of S2IRT and SE-S2IRT, we adopt a set of common attacks including intensity change, contrast enhancement, salt & pepper noise, Gaussian noise, JPEG compression, and rotation, as well as image sterilization [51], which aims to remove secret data hidden within an image. These attacks are conducted on the generated stego-images, and the information extraction accuracy rates of the two approaches are listed in Table IV.

From Table IV, it can be clearly observed that SE-S2IRT performs better than S2IRT in terms of robustness, for the following reasons. In SE-S2IRT, once the group No. of one element arranged into one of K adjacent positions is changed, only the position choice Nos. for arranging the elements into those K adjacent positions will be changed, and thus at most ∑_{i=1}^{K−1} ⌊log2(K − i + 1)⌋ bits will be affected. On the contrary, in S2IRT, once the group No. of one element is changed, the position choice Nos. for arranging the elements of most groups would be changed. Thus, most of the secret bits encoded as the position arrangements for these groups will be affected. Therefore, SE-S2IRT improves the robustness of S2IRT significantly. Also, it is clear that SE-S2IRT achieves good robustness to intensity change, contrast enhancement, and image sterilization, but relatively low robustness to salt & pepper noise, Gaussian noise, and rotation.
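The worst-case bound above can be evaluated directly; a minimal sketch (the helper name is ours, not from the paper):

```python
def max_affected_bits(K):
    """Worst-case number of secret bits disturbed in SE-S2IRT when the
    group No. of a single element is changed:
        sum over i = 1..K-1 of floor(log2(K - i + 1)).
    bit_length() - 1 gives an exact floor(log2) for positive integers,
    avoiding floating-point rounding.
    """
    return sum((K - i + 1).bit_length() - 1 for i in range(1, K))

# For K = 16 only 38 bits are at risk regardless of the total message
# length, whereas in S2IRT a single flipped group No. can disturb most
# of the encoded bits.
print(max_affected_bits(16))  # 38
```

The bound grows only on the order of K·log2(K), which is why the damage from a single corrupted element stays localized in SE-S2IRT.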


Fig. 7. The real images and generated face images. (a) The real face images, (b) the images generated by S2IRT without information hiding, (c) the images generated by S2IRT with a hiding payload of 2 bpp, and (d) the images generated by S2IRT with a hiding payload of 4 bpp.

D. Security Analysis

We analyze the security of the proposed steganographic approaches in this subsection. Suppose an attacker can intercept the shared information of K and n, but cannot obtain the Key, which is well protected by the communication participants. According to Sections 4.4 and 5.3, the probability of successfully cracking the secret messages hidden by S2IRT and SE-S2IRT can be computed by

Pc = 1 / (∏_{i=1}^{K−1} s_i)    (18)

where s_i = C(N − (i − 1)n, n) in the S2IRT scheme and s_i = [C(K − i + 1, 1)]^n in the SE-S2IRT scheme. The computed Pc values for different K and n are listed in Table V. It is clear that the cracking probabilities of the proposed steganographic approaches are very small, and thus they are secure enough against a malicious random-guessing attack.
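Eq. (18) can be evaluated with exact integer arithmetic. The sketch below is ours; the latent dimension N used by S2IRT is left as a free parameter, since its value comes from the paper's experimental setup, which is not reproduced in this section:

```python
from fractions import Fraction
from math import comb

def p_crack_s2irt(K, n, N):
    """Pc for S2IRT, with s_i = C(N - (i-1)n, n), i = 1..K-1."""
    prod = 1
    for i in range(1, K):
        prod *= comb(N - (i - 1) * n, n)
    return Fraction(1, prod)

def p_crack_se_s2irt(K, n):
    """Pc for SE-S2IRT, with s_i = [C(K - i + 1, 1)]^n = (K - i + 1)^n."""
    prod = 1
    for i in range(1, K):
        prod *= (K - i + 1) ** n
    return Fraction(1, prod)

# The product of (K - i + 1) over i = 1..K-1 is K!, so for SE-S2IRT
# Pc = 1 / (K!)^n; either way Pc shrinks super-exponentially in K and n.
print(float(p_crack_se_s2irt(5, 3)))
```

Using exact fractions avoids the floating-point underflow that would otherwise occur for the K and n values of Table V, whose probabilities are far below the smallest representable double.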

VII. CONCLUSION

This paper has provided a novel idea for generative steganography. It presents two secret-to-image reversible transformation schemes, i.e., S2IRT and SE-S2IRT, based on the Glow model to implement generative steganography. The proposed steganographic approaches can achieve excellent hiding capacity (up to 4 bpp) and accurate information extraction (almost 100% accuracy rate) while maintaining desirable anti-detectability and imperceptibility, and they outperform the state-of-the-art approaches significantly. Moreover, SE-S2IRT significantly improves the robustness to a variety of common attacks.

In many practical steganographic tasks, it is required to hide a large amount of information in an image while maintaining high extraction accuracy and desirable anti-detectability, imperceptibility, and robustness. The proposed steganographic approaches can handle these tasks well. Hence, we conclude that the proposed approaches have important practical significance in the area of information hiding. In the future, we intend to extend the proposed approaches to generate other common types of images, such as object images and landscape images, for steganography.

REFERENCES

[1] T. Filler, J. Judas, and J. Fridrich, “Minimizing additive distortion in steganography using syndrome-trellis codes,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 920–935, 2011.

[2] Z. Zhou, Y. Mu, and Q. Wu, “Coverless image steganography using partial-duplicate image retrieval,” Soft Computing, vol. 23, no. 13, pp. 4927–4938, 2019.

[3] Z. Zhou, H. Sun, R. Harit, X. Chen, and X. Sun, “Coverless image steganography without embedding,” in International Conference on Cloud Computing and Security. Springer, 2015, pp. 123–132.

[4] K.-C. Wu and C.-M. Wang, “Steganography using reversible texture synthesis,” IEEE Transactions on Image Processing, vol. 24, no. 1, pp. 130–139, 2014.

[5] S. Li and X. Zhang, “Toward construction-based data hiding: From secrets to fingerprint images,” IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1482–1497, 2018.

[6] D. Hu, L. Wang, W. Jiang, S. Zheng, and B. Li, “A novel image steganography method via deep convolutional generative adversarial networks,” IEEE Access, vol. 6, pp. 38303–38314, 2018.

[7] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 27, 2014.

[8] M.-m. Liu, M.-q. Zhang, J. Liu, Y.-n. Zhang, and Y. Ke, “Coverless information hiding based on generative adversarial networks,” arXiv preprint arXiv:1712.06951, 2017.

[9] Z. Zhang, G. Fu, R. Ni, J. Liu, and X. Yang, “A generative method for steganography by cover synthesis with auxiliary semantics,” Tsinghua Science and Technology, vol. 25, no. 4, pp. 516–527, 2020.

[10] Y. Luo, J. Qin, X. Xiang, and Y. Tan, “Coverless image steganography based on multi-object recognition,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 7, pp. 2779–2791, 2020.

[11] W. Jiang, D. Hu, C. Yu, M. Li, and Z.-q. Zhao, “A new steganography without embedding based on adversarial training,” in Proceedings of the ACM Turing Celebration Conference - China, 2020, pp. 219–223.

[12] J. Li, K. Niu, L. Liao, L. Wang, J. Liu, Y. Lei, and M. Zhang, “A generative steganography method based on WGAN-GP,” in International Conference on Artificial Intelligence and Security. Springer, 2020, pp. 386–397.

[13] A. Arifianto, M. A. Maulana, M. R. S. Mahadi, T. Jamaluddin, R. Subhi, A. D. Rendragraha, and M. F. Satya, “EDGAN: Disguising text as image using generative adversarial network,” in 2020 8th International Conference on Information and Communication Technology (ICoICT). IEEE, 2020, pp. 1–6.

[14] C.-K. Chan and L.-M. Cheng, “Hiding data in images by simple LSB substitution,” Pattern Recognition, vol. 37, no. 3, pp. 469–474, 2004.

[15] J. Mielikainen, “LSB matching revisited,” IEEE Signal Processing Letters, vol. 13, no. 5, pp. 285–287, 2006.

[16] T. Pevny, T. Filler, and P. Bas, “Using high-dimensional image models to perform highly undetectable steganography,” in International Workshop on Information Hiding. Springer, 2010, pp. 161–177.

[17] V. Holub, J. Fridrich, and T. Denemark, “Universal distortion function for steganography in an arbitrary domain,” EURASIP Journal on Information Security, vol. 2014, no. 1, pp. 1–13, 2014.

[18] B. Li, M. Wang, J. Huang, and X. Li, “A new cost function for spatial image steganography,” in 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 2014, pp. 4206–4210.

[19] V. Sedighi, R. Cogranne, and J. Fridrich, “Content-adaptive steganography by minimizing statistical detectability,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 2, pp. 221–234, 2015.

[20] W. Zhang, Z. Zhang, L. Zhang, H. Li, and N. Yu, “Decomposing joint distortion for adaptive steganography,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 10, pp. 2274–2280, 2016.

[21] X. Liao, Y. Yu, B. Li, Z. Li, and Z. Qin, “A new payload partition strategy in color image steganography,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 3, pp. 685–696, 2019.

[22] W. Su, J. Ni, X. Hu, and J. Fridrich, “Image steganography with symmetric embedding using Gaussian Markov random field model,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 3, pp. 1001–1015, 2020.

[23] W. Tang, S. Tan, B. Li, and J. Huang, “Automatic steganographic distortion learning using a generative adversarial network,” IEEE Signal Processing Letters, vol. 24, no. 10, pp. 1547–1551, 2017.

[24] J. Yang, D. Ruan, J. Huang, X. Kang, and Y.-Q. Shi, “An embedding cost learning framework using GAN,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 839–851, 2019.

[25] W. Tang, B. Li, M. Barni, J. Li, and J. Huang, “An automatic cost learning framework for image steganography using deep reinforcement learning,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 952–967, 2020.

[26] S. Baluja, “Hiding images within images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 7, pp. 1685–1697, 2019.

[27] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[28] J. Fridrich and M. Goljan, “On estimation of secret message length in LSB steganography in spatial domain,” in Security, Steganography, and Watermarking of Multimedia Contents VI, vol. 5306. International Society for Optics and Photonics, 2004, pp. 23–34.

[29] C. Chen and Y. Q. Shi, “JPEG image steganalysis utilizing both intrablock and interblock correlations,” in 2008 IEEE International Symposium on Circuits and Systems. IEEE, 2008, pp. 3029–3032.

[30] T. Pevny, P. Bas, and J. Fridrich, “Steganalysis by subtractive pixel adjacency matrix,” IEEE Transactions on Information Forensics and Security, vol. 5, no. 2, pp. 215–224, 2010.

[31] J. Fridrich and J. Kodovsky, “Rich models for steganalysis of digital images,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 868–882, 2012.

[32] Y. Qian, J. Dong, W. Wang, and T. Tan, “Learning and transferring representations for image steganalysis using convolutional neural network,” in 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 2016, pp. 2752–2756.

[33] G. Xu, H.-Z. Wu, and Y.-Q. Shi, “Structural design of convolutional neural networks for steganalysis,” IEEE Signal Processing Letters, vol. 23, no. 5, pp. 708–712, 2016.

[34] G. Xu, H.-Z. Wu, and Y. Q. Shi, “Ensemble of CNNs for steganalysis: An empirical study,” in Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security, 2016, pp. 103–107.

[35] W. Tang, B. Li, S. Tan, M. Barni, and J. Huang, “CNN-based adversarial embedding for image steganography,” IEEE Transactions on Information Forensics and Security, vol. 14, no. 8, pp. 2074–2087, 2019.

[36] S. Bernard, P. Bas, J. Klein, and T. Pevny, “Explicit optimization of min max steganographic game,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 812–823, 2020.

[37] Z. Qian, H. Zhou, W. Zhang, and X. Zhang, “Robust steganography using texture synthesis,” in Advances in Intelligent Information Hiding and Multimedia Signal Processing. Springer, 2017, pp. 25–33.

[38] J. Xu, X. Mao, X. Jin, A. Jaffer, S. Lu, L. Li, and M. Toyoura, “Hidden message in a deformation-based texture,” The Visual Computer, vol. 31, no. 12, pp. 1653–1669, 2015.

[39] A. Odena, C. Olah, and J. Shlens, “Conditional image synthesis with auxiliary classifier GANs,” in International Conference on Machine Learning. PMLR, 2017, pp. 2642–2651.

[40] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.

[41] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.

[42] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, “Improved training of Wasserstein GANs,” Advances in Neural Information Processing Systems, vol. 30, 2017.

[43] Q. Le and T. Mikolov, “Distributed representations of sentences and documents,” in International Conference on Machine Learning. PMLR, 2014, pp. 1188–1196.

[44] L. Dinh, D. Krueger, and Y. Bengio, “NICE: Non-linear independent components estimation,” arXiv preprint arXiv:1410.8516, 2014.

[45] L. Dinh, J. Sohl-Dickstein, and S. Bengio, “Density estimation using Real NVP,” arXiv preprint arXiv:1605.08803, 2016.

[46] D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invertible 1x1 convolutions,” Advances in Neural Information Processing Systems, vol. 31, 2018.

[47] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of GANs for improved quality, stability, and variation,” arXiv preprint arXiv:1710.10196, 2017.

[48] A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image quality assessment in the spatial domain,” IEEE Transactions on Image Processing, vol. 21, no. 12, pp. 4695–4708, 2012.

[49] L. Van Der Maaten, “Accelerating t-SNE using tree-based algorithms,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 3221–3245, 2014.

[50] K. Chen, H. Zhou, H. Zhao, D. Chen, W. Zhang, and N. Yu, “Distribution-preserving steganography based on text-to-speech generative models,” IEEE Transactions on Dependable and Secure Computing, 2021.

[51] M. Imon, P. Goutam, and A. J. Jawahar, “Defeating steganography with multibit sterilization using pixel eccentricity,” The IPSI BgD Transactions on Advanced Research, p. 25.


Zhili Zhou (M'19) received his M.S. and Ph.D. degrees in Computer Application from the School of Information Science and Engineering, Hunan University, in 2010 and 2014, respectively. He is currently a professor with the School of Computer and Software, Nanjing University of Information Science and Technology, China. He was also a Postdoctoral Fellow with the Department of Electrical and Computer Engineering, University of Windsor, Canada. His current research interests include multimedia security, information hiding, digital forensics, blockchain, and secret sharing. He is serving as an Associate Editor of Journal of Real-Time Image Processing and Security and Communication Networks.

Yuecheng Su is a graduate student. He is currently pursuing his M.S. degree in the Department of Computer and Software, Nanjing University of Information Science and Technology, China. His current research interests include information hiding and digital forensics.

Q. M. Jonathan Wu (M'92-SM'09) received the Ph.D. degree in electrical engineering from the University of Wales, Swansea, United Kingdom, in 1990. He was affiliated with the National Research Council of Canada for ten years beginning in 1995, where he became a senior research officer and a group leader. He is currently a professor with the Department of Electrical and Computer Engineering, University of Windsor, Windsor, Ontario, Canada. He has published more than 300 peer-reviewed papers in computer vision, image processing, intelligent systems, robotics, and integrated microsystems. His current research interests include 3-D computer vision, active video object tracking and extraction, interactive multimedia, sensor analysis and fusion, and visual sensor networks. Dr. Wu has held the Tier 1 Canada Research Chair in Automotive Sensors and Information Systems since 2005. He is an associate editor for the IEEE Transactions on Cybernetics, the IEEE Transactions on Circuits and Systems for Video Technology, the Journal of Cognitive Computation, and Neurocomputing. He has served on technical program committees and international advisory committees for many prestigious conferences. He is a Fellow of the Canadian Academy of Engineering.

Zhangjie Fu received the Ph.D. degree in computer science from the College of Computer, Hunan University, China, in 2012. He is currently a professor in the School of Computer, Nanjing University of Information Science and Technology, China. His research interests include cloud and outsourcing security, digital forensics, and network and information security. His research has been supported by NSFC, PAPD, and GYHY. He is a member of the IEEE and a member of the ACM.

Yun-Qing Shi (Life Fellow, IEEE) received the M.S. degree from Shanghai Jiao Tong University, China, and the Ph.D. degree from the University of Pittsburgh, USA. In 1987, he joined the New Jersey Institute of Technology, USA. He is the author or coauthor of more than 400 articles, one book, and five book chapters, and an editor of ten books. He holds 30 U.S. patents. His research interests include data hiding, forensics, information assurance, and visual signal processing and communications. He has been a member of a few IEEE technical committees and a fellow of the National Academy of Inventors (NAI) since 2017. He has served as the Technical Program Chair for IEEE ICME07, the Co-Technical Chair for IEEE MMSP05 as well as IWDW since 2006, the Co-General Chair for IEEE MMSP02, and a Distinguished Lecturer for the IEEE CASS. He has served as an Associate Editor for the IEEE Transactions on Signal Processing, the IEEE Transactions on Information Forensics and Security, and the IEEE Transactions on Circuits and Systems, and as an editorial board member for a few journals.