
Protecting Big Data Privacy Using Randomized Tensor Network Decomposition and Dispersed Tensor Computation

[Experiments, Analyses & Benchmarks]

Jenn-Bing Ong, Nanyang Technological University, Singapore ([email protected])
Wee-Keong Ng, Nanyang Technological University, Singapore
Ivan Tjuawinata, Nanyang Technological University, Singapore
Chao Li, RIKEN, Tokyo, Japan
Jielin Yang, Nanyang Technological University, Singapore
Sai None Myne, Singapore Management University, Singapore
Huaxiong Wang, Nanyang Technological University, Singapore
Kwok-Yan Lam, Nanyang Technological University, Singapore
C.-C. Jay Kuo, University of Southern California, United States of America

ABSTRACT

Data privacy is an important issue for organizations and enterprises to securely outsource data storage, sharing, and computation on clouds / fogs. However, data encryption is complicated in terms of key management and distribution; existing secure computation techniques are expensive in terms of computational / communication cost and therefore do not scale to big data computation. Tensor network decomposition and distributed tensor computation have been widely used in signal processing and machine learning for dimensionality reduction and large-scale optimization. However, the potential of distributed tensor networks for big data privacy preservation has not been considered before, which motivates the current study. Our primary intuition is that tensor network representations are mathematically non-unique, unlinkable, and uninterpretable; tensor network representations naturally support a range of multilinear operations for compressed and distributed / dispersed computation. Therefore, we propose randomized algorithms to decompose big data into randomized tensor network representations and analyze the privacy leakage for 1D to 3D data tensors. The randomness mainly comes from the complex structural information commonly found in big data; randomization is based on controlled perturbation applied to the tensor blocks prior to decomposition. The distributed tensor representations are dispersed on multiple clouds / fogs or servers / devices with metadata privacy; this provides both distributed trust and management to seamlessly secure big data storage, communication, sharing, and computation. Experiments show that the proposed randomization techniques are helpful for big data anonymization and efficient for big data storage and computation.

1 INTRODUCTION

Tensor decomposition, as a multi-dimensional generalization of matrix decomposition, is a multi-decades-old mathematical technique in multi-way data analysis dating back to the 1960s; see [13] and references therein. Tensor decompositions are widely applied in areas ranging from signal processing, such as blind source separation and multi-modal data fusion, to machine learning, such as model compression and learning latent variable models [3, 64]. Tensor computing has recently emerged as a promising solution for big data processing because it can model a wide variety of data such as graphical, tabular, discrete, and continuous data [42, 66, 78]; it offers algorithms that cater for different data quality / veracity or missing data [65] and provide real-time analytics for big data velocity such as streaming analytics [68, 69]; and it is able to capture the complex correlation structure in data of large volume and generate valuable insights for many big data distributed applications [12, 14]. Tensor network computing, on the other hand, is a well-established technique in the numerical community; the technique enables unprecedented large-scale scientific computing with performance comparable to competing techniques such as sparse-grid methods [37, 38]. A tensor network (TN) represents a data or tensor block by sparsely-interconnected, low-order core tensors (typically 3rd- or 4th-order tensors) and represents functions by distributed, multilinear tensor operations. TN was first discovered in quantum physics in the 1990s to capture and model the multi-scale interactions between entangled quantum particles in a parsimonious manner [53]. TN was then independently re-discovered in the 2000s by the numerical community and has found wide applications ranging from scientific computing to electronic design automation [28, 37, 38].

Big data generated from sensor networks or the Internet-of-Things are essential for machine learning, in particular deep learning, in order to train cutting-edge intelligent systems for real-time decision making and precision analytics. However, big data may contain proprietary information or personal information such as the location, health, emotion, and preference information of individuals, which requires proper encryption and access control to protect users' privacy. Symmetric and asymmetric key cryptosystems work by adding entropy / disorderliness into data using encryption algorithms and (pseudo-)random number generators so that unauthorized users cannot find patterns in the ciphertext and decipher it. However, higher computational cost is usually incurred with added functionality, such as secure operations (addition / multiplication) in homomorphic encryption and asymmetric key distribution



in public key encryption. The pain point of encryption nowadays is complicated key management and distribution, especially when organizations or enterprises are undergoing digital transformation to complex computing environments such as multi- / hybrid-cloud and mobile environments. The field of secure multi-party computation (SMPC) originates from Yao's garbled circuit in the 1980s, where untrusted parties jointly compute a function without disclosing their private inputs [74]. SMPC has evolved and adopted a distributed-trust paradigm in recent years, given the complex computing environments, increasing attack surfaces, and recurring security breaches; the secret shares are now distributed among multiple computing nodes in order to be information-theoretically secure, i.e., secure against an adversary with unbounded computational resources. SMPC computing primitives include secret sharing, garbled circuits, and homomorphic encryption; the supported secure operations are arithmetic, boolean, comparison, and bitwise operations. Other secure building blocks routinely used in SMPC are oblivious transfer, commitment schemes, and zero-knowledge proofs [15, 21]. It is well-known that fully homomorphic encryption [25] suffers from high computational complexity, making it impractical for computing complex functions during operational deployment; secret sharing and garbled circuits are expensive in terms of communication complexity and therefore routinely operate over low-latency networks; furthermore, garbled circuits involve symmetric encryption during the online phase. The communication complexity of existing SMPC protocols can incur runtime delays from an order of magnitude in a local-area network (LAN) setting to several orders of magnitude in a wide-area network (WAN) setting.

The quest for scalability calls for innovative data security solutions that not only simplify privacy management, but also provide seamless integration between privacy-preserving big data storage / communication and computation / sharing. We believe this requires introducing a new secure computation primitive based on distributed / dispersed tensor network computation. However, TN increases the functionality and performance of multi-party computation at the expense of security. In contrast to classical encryption and SMPC techniques, which are based on modular arithmetic and work on fixed-point representations, TN naturally supports both floating-point and fixed-point arithmetic / operations. Furthermore, TN representations allow further compression, unlike encrypted computation techniques, which generally increase storage and communication overhead; this generally makes encrypted computation not scalable for big data processing, whereas data compression prior to encryption usually makes the data representations lose functionality such as encrypted computation on the original data. Given the impressive track record of distributed TNs in large-scale scientific computing and big data analytics, we propose a novel secret-sharing scheme based on tensor networks and investigate its feasibility for privacy-preserving big data distributed applications. Our contributions are as follows:

• Propose an arithmetic secret-sharing scheme based on randomized tensor network decomposition and dispersed tensor multilinear operations. The randomization is done by controlled perturbation applied to the data blocks prior to singular value decomposition (SVD), which results in randomized tensor blocks after decomposition due to the complex structural information in big data. The perturbation technique can be easily adapted to various TN decomposition algorithms to generate randomized TN representations.
• Empirically analyze the privacy leakage of the randomized TN representations for 1D to 3D datasets and propose mitigation techniques to reduce the privacy leakage. The data compressibility and algorithmic efficiency of the proposed randomized TN algorithms have also been investigated.

The organization of this work is as follows: Section 2 covers related work on state-of-the-art privacy-preserving techniques and secure tensor decompositions. Sections 3 and 4 explain the security model and our proposed randomized tensor dispersed computing approach for big data privacy preservation. Section 5 conducts experimental studies to benchmark the security, efficiency, and performance of the proposed approach. Section 6 discusses the implications, limitations, and potential extensions of this research study.

2 RELATED WORK

Secret-Sharing schemes provide information-theoretical security at the expense of high storage and communication cost. Here, we review practical secret-sharing schemes for big data protection that provide only computational security but offer high storage / computational efficiency. Krawczyk [40] proposes the first computational secret-sharing scheme: the data is encrypted using symmetric encryption with a randomly-generated key, and the encrypted data is divided into multiple blocks using Rabin's information dispersal algorithm, whereas the encryption / decryption key is split using Shamir's secret-sharing scheme such that collecting a certain threshold number of blocks is enough for secret reconstruction. Since then, many variants of the computational secret-sharing scheme have been proposed to improve the data security, data redundancy / error resistance, performance, data integrity / authentication, fragment size, data deduplication, and location management with different machine trustworthiness [32, 33, 50]. Most notably, the key exposure problem is a practical issue to address, arising from the use of a weak key for encryption, key reuse, or key leakage. The All-Or-Nothing Transform (AONT) introduced by Rivest [58] solves the key exposure problem by building dependency between the fragments such that acquiring only the key without all the fragments does not lead to immediate information leakage; a recent review of AONT can be found in [57]. Furthermore, access revocation is greatly simplified by re-encrypting only one data fragment with a fresh encryption key, which significantly reduces the transmission cost [34, 35]. However, the utility of such encrypted data is quite limited: search, update, and computation cannot be performed without reconstructing the original data [22, 75].

Database Fragmentation or Data Splitting [18] aims to provide functionality-preserving data protection for data storage on clouds. Sensitive data is fragmented in clear form in separate storage locations such that each data fragment does not reveal confidential information linked to a subject. Data splitting can be done at the byte, semantic, or attribute level. Byte-level fragmentation splits the sensitive files and performs shifting and recombination of the bytes to form fixed data blocks before storing them in different cloud locations; this is particularly suitable for binary or multimedia files, which are


usually stored but not processed by the cloud. A semantically-grounded splitting mechanism is well-suited for unstructured data such as textual data; it can provide keyword search for online document, email, and messaging applications. For example, a recent work by [59] automatically detects and splits the sets of textual entities that may disclose sensitive information by analysing the semantics they convey and their semantic dependencies. Attribute-level splitting such as vertical splitting [2] is very useful for statistical databases, because it is usually the combination of several risky attributes that may lead to personal re-identification. Computation on attributes stored in a single fragment in vertical splitting is fast and straightforward, e.g., addition, updating, and uni-valued statistics such as mean and variance. However, data splitting requires a proxy server to manage the locations, queries, and operations on the data fragments; this becomes a single point of failure if users cannot access the metadata stored at the proxy.

Data Anonymization is perhaps the simplest low-cost solution that is widely adopted nowadays for secure data sharing within and across enterprises for diverse applications, including machine learning. Data anonymization techniques cover both the removal of personally-identifiable information (e.g., using hashing or masking techniques) and data randomization / perturbation techniques (e.g., random noise, permutation, transformation) [18]. The random components or functions have to be carefully designed to preserve important information in the training dataset and ensure model performance. A recent systematic survey of the different privacy metrics proposed over the years can be found in [72]. These privacy metrics are based on information theory, data similarity, indistinguishability measures, and the adversary's success probability; choosing a suitable privacy metric for a particular setting depends on the adversarial model, data sources, information available to compute the metric, and the properties to measure [72]. Differential privacy (DP) [19, 20] is a mathematical framework to rigorously quantify the amount of information leaked during operations on a statistical database or machine learning [1, 5, 11, 60, 63]; DP is a proven privacy-preserving technique widely adopted by industry. A recent promising data anonymization approach is to generate synthetic data [52] that resembles the statistical distribution or behavior observed in the original datasets using generative machine learning models such as generative adversarial networks [26] and computer simulations (e.g., [46]); however, these models / simulations are application-specific (i.e., they depend on the training dataset or physical models) and any analysis on the synthetic data has to be verified over the real dataset for validation.

Although privacy-preserving matrix and tensor decomposition techniques have been well studied in the literature [8, 23, 24, 31, 39, 41, 45, 47, 51, 73, 76], distributed / dispersed TN representations and computation have not been proposed for big data privacy preservation, which motivates the current study. Different from data anonymization techniques, tensor decompositions are fully reversible and compressible, and the reconstruction accuracy can be either lossy or near-lossless [16]. Unlike data splitting, TN does not require a proxy server to manage the metadata, and it offers much better utility of the decomposed data at the expense of higher privacy leakage compared to computational secret-sharing schemes. To process big data, randomized mapping or projection techniques

utilize a projection matrix such as Gaussian, Rademacher, and random orthonormal matrices [10, 70] to project the data tensor to a much smaller tensor size before applying tensor decompositions. Randomized sampling techniques such as fiber subset selection or tensor cross approximation choose a small subset of tensor fibers that approximates the entire data tensor well, e.g., measured using the quasi-optimal maximal volume or modulus determinant of the submatrix so that the matrix cross-approximation is close to the optimal SVD solution [48, 54]. Existing randomized mapping / projection and randomized sampling algorithms are useful for fitting big data tensor decompositions into existing memory requirements; the decomposed tensor blocks are usually compressed with lossy reconstruction accuracy, which is different from our proposed randomized tensor decompositions. The randomness of the decomposed tensor blocks is also limited by the distribution of the projection matrix and the sampling process in order to ensure small error bounds, whereas our proposed algorithms randomly disperse the complex structural information of big data into the tensor cores by applying large-but-controlled perturbations during the sequential matrix decomposition process in tensor decomposition algorithms. The time complexity is also much lower compared to randomized projection / mapping algorithms, and the technique can be easily adapted into existing TN algorithms. Nonetheless, the proposed tensor perturbation techniques can be easily combined with existing randomized projection / sampling algorithms for big data processing and privacy protection. Tensor decompositions have been widely used for dimensionality reduction of big data; however, research on tensor network coding schemes is lagging behind, and only a few publications were found at the time of writing [4, 16, 36].

3 THREAT MODEL AND SECURITY

The secure storage and computation of a client are outsourced to a set of untrusted but non-colluding servers $S_1, S_2, \ldots, S_n$. The clients secret-share their inputs among the servers in the initial setup phase; the servers then proceed to securely store (e.g., with encryption), compute, and communicate using dispersed TN computation protocols. The servers run on different software stacks to minimize the chance that they all become vulnerable to the same exploit available to malware attacks, and they can be operated under different sub-organizations to minimize insider threats. Given the cloud scenario, the secret shares can be distributed to multiple virtual instances provided by the same cloud service provider (CSP) or to different clouds (e.g., multi-cloud or hybrid-cloud environments). We assume a semi-honest adversary $\mathcal{A}$ (a so-called honest-but-curious adversary) who is able to corrupt any subset of the clients and at most $n-1$ servers at any point in time. Different from encrypted data processing, our security definition allows an adversary to learn partial information about the client's input, but the adversary must not learn the sensitive information from the process. The privacy leakage is measured based on information-theoretic and similarity-based privacy metrics. A secret-sharing scheme based on TN is asymmetric with respect to the servers, i.e., each server holds index-specific information. As shown in Sections 4 and 5, each of the TN representations requires high data complexity (or high tensor-rank complexity) to be privacy-preserving in the multi-party setting.

Jenn-Bing Ong, Wee-Keong Ng, Ivan Tjuawinata, Chao Li, Jielin Yang, Sai None Myne, Huaxiong Wang, Kwok-Yan Lam, and C.-C. Jay Kuo

4 SECRET-SHARING SCHEME BASED ON DISTRIBUTED TENSOR NETWORKS

In this section, we propose a novel secret-sharing scheme based on dispersed TN representations / operations to seamlessly secure big data storage, communication, sharing, and computation. TN decomposes a data chunk at the semantic level; each decomposed tensor block, which contains latent information, is randomly distributed among multiple non-colluding servers. The success of multi-way component analysis can be attributed to the existence of efficient algorithms for matrix and tensor decomposition and the possibility to extract components with physical meaning by imposing constraints such as sparsity, orthogonality, smoothness, and non-negativity [12]. Our primary intuition is that higher-order tensor decompositions are in general non-unique; each tensor core or factor matrix contains index-specific information that is unlinkable and uninterpretable due to the non-uniqueness of the decompositions, which is also why such decompositions are commonly used for dimensionality reduction and compressed computation.

Several basic tensor models are described here within the multi-party computation setting to enhance the privacy protection of the original tensor. We propose randomized algorithms based on a perturbation technique to decompose each data chunk into randomized tensor blocks; each of the tensor blocks can be re-randomized again using a tensor-rounding algorithm after performing distributed, multilinear tensor operations, in order to reduce the tensor-rank complexity for storage and computational efficiency.

Tucker decomposition (TD) [56] is a natural extension of matrix Singular Value Decomposition (SVD) to high-dimensional tensors. TD captures the interactions between the latent factors $\mathbf{U}$ (from the SVD of the mode-$n$ matricization of a tensor) using a core tensor $\mathcal{G}$ that reflects and ranks the major subspace variations in each mode of the original tensor. For a third-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, TD can be defined as follows using different tensor operations:

A(๐‘–1, ๐‘–2, ๐‘–3) ๏ฟฝ G ร—1 โŸจU1โŸฉ1 ร—2 โŸจU2โŸฉ2 ร—3 โŸจU3โŸฉ3๐‘ฃ๐‘’๐‘ (A) ๏ฟฝ (โŸจU3โŸฉ3 โŠ— โŸจU2โŸฉ2 โŠ— โŸจU1โŸฉ1) ๐‘ฃ๐‘’๐‘ (G)

(1)

Here $\mathcal{G} \in \mathbb{R}^{R_1 \times R_2 \times R_3}$ is a 3-dimensional core tensor, $\mathbf{U}_k \in \mathbb{R}^{I_k \times R_k}$, $k \in \{1, 2, 3\}$ are the factor matrices, $\times_n$ is the $n$-mode product, $\otimes$ is the Kronecker product, $\mathrm{vec}(\cdot)$ is the vectorization operator (see the definitions in [56]), and $\langle \cdot \rangle_\ell$ denotes the private share stored on server $\ell$. Here, $\mathcal{G}$ is a shared core exchanged between servers to perform tensor computation schemes. TD is non-unique because the latent factors can be rotated without affecting the reconstruction error; nevertheless, TD yields a good low-rank approximation of a tensor in terms of squared error. Canonical Polyadic (CP) decomposition is a special case of TD in which $\mathcal{G}$ is superdiagonal. CP is very popular in signal processing due to its uniqueness guarantee and ease of interpretation [13]; however, these properties also make CP unsuitable for privacy preservation.
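The non-uniqueness that makes TD privacy-friendly can be checked directly: rotating a factor matrix by any invertible matrix while absorbing its inverse into the core leaves the reconstructed tensor unchanged. Below is a minimal NumPy sketch of this check (our own illustration, not code from the paper; the helper `mode_n_product` and the random shapes are assumptions for the example):

```python
import numpy as np

def mode_n_product(G, M, n):
    """n-mode product G x_n M: multiply the mode-n fibers of G by M."""
    Gn = np.moveaxis(G, n, 0).reshape(G.shape[n], -1)  # mode-n matricization
    out = M @ Gn
    new_shape = (M.shape[0],) + tuple(np.delete(G.shape, n))
    return np.moveaxis(out.reshape(new_shape), 0, n)

def reconstruct(G, factors):
    A = G
    for n, U in enumerate(factors):
        A = mode_n_product(A, U, n)
    return A

rng = np.random.default_rng(0)
R1, R2, R3 = 3, 4, 5
G = rng.standard_normal((R1, R2, R3))                      # core tensor
U = [rng.standard_normal((10, r)) for r in (R1, R2, R3)]   # factor matrices
A = reconstruct(G, U)

# Rotate U1 by an invertible Q and absorb Q^{-1} into the core:
Q = rng.standard_normal((R1, R1))
A_rot = reconstruct(mode_n_product(G, np.linalg.inv(Q), 0),
                    [U[0] @ Q, U[1], U[2]])
print(np.allclose(A, A_rot))  # True: same tensor, different (unlinkable) shares
```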

Hierarchical Tucker (HT) decomposition [27, 30] was proposed to reduce the memory requirements of TD. HT approximates higher-order tensors ($N \gg 3$) well without suffering from the curse of dimensionality. HT recursively splits the modes of a tensor based on a binary tree hierarchy such that each node contains a subset of the modes. Therefore, HT requires a priori knowledge of a binary tree of matricizations of the tensor. HT is defined as follows:

$$
\mathbf{U}_t \approx \left( \mathbf{U}_{t_l} \otimes \mathbf{U}_{t_r} \right) \langle \mathbf{B}_t \rangle_t
\tag{2}
$$

Here $\mathbf{B}_t$ are the "transfer" core tensors (or internal nodes) reshaped into $R_{t_l} R_{t_r} \times R_t$ matrices, $\mathbf{U}_t$ contains the $R_t$ left singular vectors of the original tensor, and $t_l$ and $t_r$ correspond to the left and right child nodes respectively. The leaf nodes $\langle \mathbf{U}_1 \rangle_1, \langle \mathbf{U}_2 \rangle_2, \ldots, \langle \mathbf{U}_N \rangle_N$ contain the latent factors and should be stored in a distributed manner to ensure privacy preservation. HT is particularly useful when the application provides an intuitive and natural hierarchy over the physical modes.

Tensor-Train (TT) [55] decomposes a given tensor into a series or cascade of connected core tensors; therefore TT can be interpreted as a special case of HT. TT core tensors are connected through a common reduced mode or TT-rank, $R_k$. TT is defined as follows:

$$
\mathcal{A}(i_1, i_2, i_3) \approx \langle \mathbf{G}[i_1] \rangle_1 \times \langle \mathbf{G}[i_2] \rangle_2 \times \langle \mathbf{G}[i_3] \rangle_3
\tag{3}
$$

where $\mathbf{G}[i_k]$ is an $R_{k-1} \times R_k$ matrix with $R_0 = R_3 = 1$, and $\times$ is the matrix multiplication operation. The TT format and its variants are very useful owing to their flexibility for a number of distributed, multilinear operations [43] and the possibility to convert other basic tensor models (e.g., CP, TD, HT) into TT format [12]. Similar properties apply to the tensor chain or tensor ring (TR) format [77], which is a linear combination of TT formats, i.e., $R_1 = R_3 > 1$. TR representations are more generalized and powerful compared to TT representations [77]; the extended TT format further decomposes the TT-cores into smaller blocks [29].

Storage Complexity. Table 1 tabulates the storage complexity and storage bound of the different TN formats mentioned here. Low-rank approximation is very useful in tensor network computing for saving storage, communication, and computational cost with negligible loss in accuracy for highly-correlated tensor data structures or functional forms that admit a low-rank structure. The Tucker format is not practical for tensor order $N > 5$ because the number of entries of the core tensor $\mathcal{G}$ scales exponentially with $N$; therefore, storage and computing in Tucker format are not practical when dealing with higher-order tensors [12]. The TT format and its variants exhibit both stable numerical properties and reasonable storage complexity. Furthermore, TT allows control of the approximation error within the TT decomposition and TT-rounding algorithms.

Table 1: Storage complexity of different tensor formats [12]. The storage bound is calculated by letting $I = \max_k I_k$, $R = \max_k R_k$, $k \in \{1, 2, \ldots, N\}$. HT, TT, and TR are powerful representations that break the curse of big data dimensionality.

TN      | Storage Complexity                                    | Storage Bound
CP      | $\sum_{k=1}^{N} I_k R$                                | $O(NIR)$
TD      | $\sum_{k=1}^{N} I_k R_k + \prod_{k=1}^{N} R_k$        | $O(NIR + R^N)$
HT      | $\sum_{k=1}^{N} I_k R_k + \sum_{(u,v,t)} R_u R_v R_t$ | $O(NIR + NR^3)$
TT / TR | $\sum_{k=1}^{N} I_k R_{k-1} R_k$                      | $O(NIR^2)$
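To make the bounds in Table 1 concrete, a quick back-of-the-envelope comparison for uniform mode size and rank (our own sketch; the uniform-rank assumption and the balanced-tree count of $N-1$ HT transfer nodes are ours) shows how TT/TR escape the exponential blow-up of the full tensor and of the Tucker core:

```python
def tn_storage(N, I, R):
    """Parameter counts from Table 1 with I_k = I and R_k = R (uniform ranks).
    HT assumes a balanced binary tree with N-1 internal (transfer) nodes."""
    return {
        "full":  I ** N,
        "CP":    N * I * R,
        "TD":    N * I * R + R ** N,
        "HT":    N * I * R + (N - 1) * R ** 3,
        "TT/TR": N * I * R ** 2,
    }

for fmt, count in tn_storage(N=10, I=10, R=5).items():
    print(f"{fmt:>5}: {count:,}")   # full is 10^10; TT/TR is only 2,500
```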

Graphical Representations. TNs can be represented by a set of nodes interconnected by edges. The edges correspond to the contracted modes, whereas lines that do not go from one tensor to another correspond to open (physical) modes, which contribute to the total number of orders of the entire TN. Fig. 1 shows the graphical representations of different TN representations. Mathematical operations performed on tensors (e.g., tensor contractions and reshaping) can be expressed using the graphical representation of tensors in a simple and intuitive way without the explicit use of complex mathematical expressions.

Figure 1: Graphical representations of different tensor network (TN) decompositions. The number of lines connected to a node shows the tensor order; the rank and mode size are labeled on the edges. (a) Canonical Polyadic (CP) decomposition, (b) Tucker decomposition (TD), (c) Hierarchical Tucker (HT), (d) Tensor-Train (TT), and (e) Tensor-Ring (TR) decomposition. A TN can be partitioned into secret shares at the individual node level or into sets of tensor nodes distributed across servers for privacy preservation.

Shares Generation based on Randomized Tensor Decompositions. Algorithms 1, 2, 3, and 4 present our proposed randomized rTD, rHT, rTT-SVD, and rTR-SVD algorithms, which decompose an N-dimensional tensor into randomized secret shares by applying perturbations that randomly disperse the structural information in big data into the tensor cores. Algorithm 1 is based on the Higher-Order Singular Value Decomposition (HOSVD) proposed in [7]; HOSVD performs SVD on each mode of a tensor to extract the latent factors before obtaining the core tensor that captures the complex interactions between the latent factors. Algorithm 2 recursively applies rTD on each tensor node based on a binary tree of matricizations of the input tensor. Algorithms 3 and 4 are based on the TT-SVD and TR-SVD algorithms proposed in [55] and [77] respectively; TT-SVD and TR-SVD perform sequential SVD decompositions on a tensor to obtain the TT and TR representations. Figures 2 and 3 show the graphical representations of the proposed rTD and rTT-SVD algorithms. The randomized dispersion is applied after performing each SVD step in Algorithms 1, 2, and 3. To balance between compression and randomness, the maximum (randomized) perturbation should stay within a certain threshold $\delta$ based on the magnitude of each singular value and the $\pm$ sign difference of each singular vector. Share re-generation can be done with our proposed randomized TT-rounding algorithm based on [55] (see Algorithm 5), carried out entirely in TT format on distributed servers; however, this is not recommended because computation with TN may leak private information (gradually) to the servers. The proposed secret-sharing scheme is asymmetric with respect to the servers: each server stores only index-specific information based on the tensor core it receives. The perturbations are embedded inside existing tensor decomposition algorithms, so the computational complexity does not increase much; only a few more tensor core contractions (i.e., to apply the perturbation and randomize the 1st core / factor) and one SVD are performed. The memory required to store the perturbation factors is considered negligible.

Algorithm 1: Proposed randomized Tucker Decomposition (rTD) based on Higher-Order SVD (HOSVD) [7].

Input: Tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_N}$ and ranks $R_1, R_2, \ldots, R_N$.
Output: Tucker core $\mathcal{G} \in \mathbb{R}^{R_1 \times R_2 \times \ldots \times R_N}$ and factor matrices $\mathbf{U}_k \in \mathbb{R}^{I_k \times R_k}$ s.t. $\mathcal{A} \approx \mathcal{G} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \ldots \times_N \mathbf{U}_N$.
Initialization: $\mathcal{G}_1 = \mathcal{A}$;
Modified from multilinear SVD or $N$-mode SVD:
for $k = 1$ to $N$ do
    $[\mathbf{U}_k, \mathbf{S}_k, \mathbf{V}_k] \leftarrow tSVD(\mathcal{G}_{k(k)}, R_{trunc.} = R_k)$;
    Generate a diagonal perturbation matrix with uniform distribution between threshold $\delta$ and 1, $\mathbf{\Delta}_k \sim \mathcal{U}([\delta, 1])$;
    Perturb the core tensor, $\mathcal{G}_{k+1} \leftarrow \mathcal{G}_k \times_k (\mathbf{U}_k^T \mathbf{\Delta}_k)$;
    Update the factor matrix, $\mathbf{U}_k \leftarrow \mathbf{\Delta}_k^{-1} \mathbf{U}_k$;
end
Randomize the 1st TD factor matrix:
$\mathcal{G} \leftarrow \mathcal{G}_{N+1}$; $\mathcal{G} \leftarrow \mathcal{G} \times_1 \mathbf{U}_1$;
$[\mathbf{U}_1, \mathbf{S}_1, \mathbf{V}_1] \leftarrow tSVD(\mathcal{G}_{(1)}, R_{trunc.} = R_1)$;
$\mathbf{\Delta}_1 \sim \mathcal{U}([\delta, 1])$; $\mathcal{G} \leftarrow \mathcal{G} \times_1 (\mathbf{U}_1^T \mathbf{\Delta}_1)$; $\mathbf{U}_1 \leftarrow \mathbf{\Delta}_1^{-1} \mathbf{U}_1$;
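For intuition, here is a minimal NumPy sketch of the perturb-and-cancel idea in Algorithm 1 (our own rendering, not the paper's code): after each truncated SVD, a random diagonal $\mathbf{\Delta}_k \sim \mathcal{U}([\delta, 1])$ is absorbed into the core while the factor keeps $\mathbf{\Delta}_k^{-1}$, so the shares still reconstruct the input exactly up to truncation. We write the products in the dimensionally explicit order `D @ U.T` and `U @ inv(D)`, and use `numpy.linalg.svd` in place of the paper's tSVD routine:

```python
import numpy as np

def unfold(A, n):
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def fold(M, n, shape):
    full = (shape[n],) + tuple(np.delete(shape, n))
    return np.moveaxis(M.reshape(full), 0, n)

def rtd(A, ranks, delta=0.05, rng=np.random.default_rng()):
    """Sequentially truncated HOSVD with the rTD perturbation: the random
    diagonal D goes into the core, D^{-1} into the factor, so D cancels."""
    G, factors = A.copy(), []
    for k, Rk in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(G, k), full_matrices=False)
        U = U[:, :Rk]                                  # truncated left factors
        D = np.diag(rng.uniform(delta, 1.0, Rk))       # perturbation Delta_k
        shape = list(G.shape); shape[k] = Rk
        G = fold(D @ U.T @ unfold(G, k), k, shape)     # perturbed core
        factors.append(U @ np.linalg.inv(D))           # factor absorbs D^{-1}
    return G, factors

def reconstruct(G, factors):
    A = G
    for k, U in enumerate(factors):
        shape = list(A.shape); shape[k] = U.shape[0]
        A = fold(U @ unfold(A, k), k, shape)
    return A

A = np.random.default_rng(2).standard_normal((8, 9, 10))
G, factors = rtd(A, ranks=(8, 9, 10))           # full ranks: exact recovery
print(np.allclose(reconstruct(G, factors), A))  # True
```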

Algorithm 2: Proposed randomized Hierarchical Tucker (rHT) decomposition by recursive node-wise rTD (Algo. 1).

Input: Tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_N}$, ranks $R_1, R_2, \ldots, R_N$, and binary tree $\mathcal{T}$ of the matricizations of $\mathcal{A}$.
Output: HT factor matrices $\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_N$ and transfer cores $\mathbf{B}_t$, $t \in$ non-leaf nodes of binary tree $\mathcal{T}$.
$\mathbf{U}_1 \leftarrow \mathbf{A}_{(1)}$;
Starting from the root node of tree $\mathcal{T}$, select a node $t$:
    Set $t_l$ and $t_r$ to be the left and right child of $t$ resp.;
    If $t_l$ is not a singleton: $R_{t_l} \leftarrow R_{full}$;
    If $t_r$ is not a singleton: $R_{t_r} \leftarrow R_{full}$;
    $\mathbf{U}_t \leftarrow reshape(\mathbf{U}_t, [t_l, t_r, t])$;
    $[\mathbf{B}_t, \mathbf{U}_{t_l}, \mathbf{U}_{t_r}] \leftarrow rTD(\mathbf{U}_t, R_{trunc.} = [R_{t_l}, R_{t_r}])$;
    If $t_l$ is a singleton: $\mathbf{U}_{t_l} \leftarrow \mathbf{U}_{t_l}$;
    If $t_r$ is a singleton: $\mathbf{U}_{t_r} \leftarrow \mathbf{U}_{t_r}$;
Recurse on $t_l$ and $t_r$ until $t_l$ and $t_r$ are singletons.

Privacy and Correctness. The correctness of secret sharing based on randomized TN formats is obvious; tensor representations are compressible if the data admits a low-rank structure. The proposed randomized tensor decomposition algorithms simply split the complex structural information in big data randomly into different tensor cores or sub-blocks. The sensitivity of an SVD decomposition subject to small perturbations is well-known for complex correlation structure, i.e., when the singular values are closely separated [44, 67]. Moreover, the proposed algorithms randomize TN decompositions by large-but-controlled perturbation that does not affect the reconstruction accuracy. The privacy leakage is limited by the tensor-rank complexity of each index, as discussed below.

Figure 2: Graphical representation of the proposed rTD algorithm for a 3rd-order tensor, see Algorithm 1 for the details.

Figure 3: Graphical representation of the proposed rTT-SVD for a 3rd-order tensor, see Algorithm 3 for the details.

Algorithm 3: Proposed randomized Tensor Train-Singular Value Decomposition (rTT-SVD) algorithm based on [55].

Input: Tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_N}$ and error threshold $\epsilon$.
Output: TT cores $\tilde{\mathcal{A}} = \mathcal{G}_1 \cdot \mathcal{G}_2 \cdots \mathcal{G}_{N-1} \cdot \mathcal{G}_N$ such that the approximation error $\|\mathcal{A} - \tilde{\mathcal{A}}\|_F \lesssim \epsilon \|\mathcal{A}\|_F$.
Initialization: TT-rank $R_0 = 1$; Perturbation threshold $\delta$; Truncation parameter $\delta_\epsilon = \frac{\epsilon}{\sqrt{N-1}}$;
Tensor shape, $[I_1, I_2, \ldots, I_N] \leftarrow shape(\mathcal{A})$;
Mode-1 matricization of tensor $\mathcal{A}$, $\mathbf{M}_1 \leftarrow \mathbf{A}_{(1)}$;
Sequential (SVD + randomized dispersion):
for $k = 1$ to $N-1$ do
    Truncated SVD, $[\mathbf{U}_k, \mathbf{S}_k, \mathbf{V}_k] \leftarrow tSVD(\mathbf{M}_k, \delta_{rel.} = \delta_\epsilon)$;
    TT-rank, $R_k \leftarrow shape(\mathbf{S}_k, 1)$;
    Generate a diagonal perturbation matrix with uniform dist. between threshold $\delta$ and 1, $\mathbf{\Delta}_k \sim \mathcal{U}([\delta, 1])$;
    Reshape the orthogonal matrix $\mathbf{U}_k$ divided by the perturbation factor $\mathbf{\Delta}_k$ into a third-order tensor, $\mathcal{G}_k \leftarrow reshape(\mathbf{U}_k \mathbf{\Delta}_k^{-1}, [R_{k-1}, I_k, R_k])$;
    Matricize $\mathbf{S}_k \mathbf{V}_k^T$ multiplied by the perturbation factor $\mathbf{\Delta}_k$, $\mathbf{M}_{k+1} \leftarrow reshape(\mathbf{\Delta}_k \mathbf{S}_k \mathbf{V}_k^T, [R_k I_{k+1}, \prod_{p=k+2}^{N} I_p])$;
end
Construct the last core, $\mathcal{G}_N \leftarrow reshape(\mathbf{M}_N, [R_{N-1}, I_N])$;
Randomize the 1st TT-core:
$\mathcal{G}_1 \leftarrow reshape(\mathcal{G}_1, [I_1, R_1])$; $\mathcal{G}_2 \leftarrow reshape(\mathcal{G}_2, [R_1, I_2 R_2])$;
$[\mathbf{U}_1, \mathbf{S}_1, \mathbf{V}_1] \leftarrow tSVD(\mathcal{G}_1 \mathcal{G}_2, \delta_{rel.} = \delta_\epsilon)$;
$R_1 \leftarrow shape(\mathbf{S}_1, 1)$; $\mathbf{\Delta}_1 \sim \mathcal{U}([\delta, 1])$;
$\mathcal{G}_1 \leftarrow reshape(\mathbf{U}_1 \mathbf{\Delta}_1^{-1}, [I_1, R_1])$;
$\mathcal{G}_2 \leftarrow reshape(\mathbf{\Delta}_1 \mathbf{S}_1 \mathbf{V}_1^T, [R_1, I_2, R_2])$;

An index with sufficiently high rank complexity is privacy-preserving, whereas an index that has only zeroes in the tensor cores implies that all the values corresponding to this index in the original tensor are zero. However, this can be easily overcome by padding the original tensor with random noise to increase the complexity before TN decomposition. With sufficiently high tensor-rank complexity, the magnitude, sign, and exact position of non-zero values are not leaked even under collusion by all-except-one servers. To further increase the uncertainty, we randomly permute the mode variables along each tensor dimension and store the random seeds for reconstruction. Random permutations can be performed after (block-wise) TN decomposition to ensure compressibility if the multi-dimensional data is highly correlated. Each tensor core contains only index-specific information, and the cores are therefore unlinkable in the event of a massive data breach. The partition of more sophisticated TN structures into private and shared tensor cores can be done with hierarchical clustering based on pairwise network distance and randomized, dispersed tensor computation that minimize privacy leakage, communication, and computational cost.
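The seeded mode-permutation mitigation can be as simple as the following sketch (our illustration): the open (physical) index of each core is shuffled with a seeded RNG, so only a holder of the seed, e.g., via the metadata described in Section 4.1, can restore the original ordering.

```python
import numpy as np

def permute_modes(cores, seed):
    """Shuffle the open index of each TT core with a seeded RNG."""
    rng = np.random.default_rng(seed)
    return [G[:, rng.permutation(G.shape[1]), :] for G in cores]

def unpermute_modes(cores, seed):
    """Invert permute_modes by replaying the same seeded permutations."""
    rng = np.random.default_rng(seed)
    return [G[:, np.argsort(rng.permutation(G.shape[1])), :] for G in cores]

# unpermute_modes(permute_modes(cores, seed=42), seed=42) recovers the cores.
```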

Relationship with Additive Secret-Sharing Scheme. The classical additive secret-sharing scheme is defined as $x = \langle x_1 \rangle_1 + \langle x_2 \rangle_2 + \ldots$. The conversion from the classical scheme to the secret-sharing scheme based on TN format is relatively straightforward: each party decomposes their individual share using the proposed randomized tensor decomposition algorithms.


Algorithm 4: Proposed randomized Tensor Ring-Singular Value Decomposition (rTR-SVD) based on [77].

Input: Tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_N}$ and error threshold $\epsilon$.
Output: TR cores $\tilde{\mathcal{A}} = \mathcal{G}_1 \cdot \mathcal{G}_2 \cdots \mathcal{G}_N$ such that the approximation error $\|\mathcal{A} - \tilde{\mathcal{A}}\|_F \lesssim \epsilon \|\mathcal{A}\|_F$.
Initialization: Perturbation threshold $\delta$; Truncation parameter $\delta_k = \frac{\sqrt{2}\,\epsilon}{\sqrt{N}}$ for $k = 1$, $\delta_k = \frac{\epsilon}{\sqrt{N}}$ for $k > 1$;
Prepare the 1st TR core:
Tensor shape, $[I_1, I_2, \ldots, I_N] \leftarrow shape(\mathcal{A})$;
Mode-1 matricization of tensor $\mathcal{A}$, $\mathbf{M}_1 \leftarrow \mathbf{A}_{(1)}$;
Truncated SVD, $[\mathbf{U}_1, \mathbf{S}_1, \mathbf{V}_1] \leftarrow tSVD(\mathbf{M}_1, \delta_{rel.} = \delta_1)$;
$\mathbf{\Delta}_1 \sim \mathcal{U}([\delta, 1])$; Split TT-ranks $R_0, R_1$: $\min_{R_0, R_1} \|R_0 - R_1\|$ s.t. $R_0 R_1 = shape(\mathbf{S}_1, 1)$;
Set TT-rank, $R_N \leftarrow R_0$;
$\mathcal{G}_1 \leftarrow reshape(\mathbf{U}_1 \mathbf{\Delta}_1^{-1}, [R_0, I_1, R_1])$;
$\mathbf{M}_2 \leftarrow reshape(\mathbf{\Delta}_1 \mathbf{S}_1 \mathbf{V}_1^T, [R_1 I_2, \prod_{p=3}^{N} I_p R_N])$;
Sequential (SVD + randomized dispersion):
for $k = 2$ to $N-1$ do
    $[\mathbf{U}_k, \mathbf{S}_k, \mathbf{V}_k] \leftarrow tSVD(\mathbf{M}_k, \delta_{rel.} = \delta_k)$;
    $R_k \leftarrow shape(\mathbf{S}_k, 1)$; $\mathbf{\Delta}_k \sim \mathcal{U}([\delta, 1])$;
    $\mathcal{G}_k \leftarrow reshape(\mathbf{U}_k \mathbf{\Delta}_k^{-1}, [R_{k-1}, I_k, R_k])$;
    $\mathbf{M}_{k+1} \leftarrow reshape(\mathbf{\Delta}_k \mathbf{S}_k \mathbf{V}_k^T, [R_k I_{k+1}, \prod_{p=k+2}^{N} I_p R_N])$;
end
Construct the last core, $\mathcal{G}_N \leftarrow reshape(\mathbf{M}_N, [R_{N-1}, I_N, R_N])$;
Randomize the 1st TR core:
$\mathcal{G}_1 \leftarrow reshape(\mathcal{G}_1, [R_0 I_1, R_1])$; $\mathcal{G}_2 \leftarrow reshape(\mathcal{G}_2, [R_1, I_2 R_2])$;
$[\mathbf{U}_1, \mathbf{S}_1, \mathbf{V}_1] \leftarrow tSVD(\mathcal{G}_1 \mathcal{G}_2, \delta_{rel.} = \delta_1)$;
$R_1 \leftarrow shape(\mathbf{S}_1, 1)$; $\mathbf{\Delta}_1 \sim \mathcal{U}([\delta, 1])$;
$\mathcal{G}_1 \leftarrow reshape(\mathbf{U}_1 \mathbf{\Delta}_1^{-1}, [R_0, I_1, R_1])$;
$\mathcal{G}_2 \leftarrow reshape(\mathbf{\Delta}_1 \mathbf{S}_1 \mathbf{V}_1^T, [R_1, I_2, R_2])$;

Each party then sends the corresponding tensor cores to the other parties, and all parties perform an addition operation on their corresponding tensor cores using tensor multilinear operations. The conversion from TN format back to the additive secret-sharing scheme can be done as follows: all-except-one parties each generate a randomized TN from a randomly-generated share, distribute the generated tensor cores to the corresponding parties, and update all the tensor cores using distributed tensor operations; the all-except-one parties then pass their updated tensor cores to the remaining party (which did not generate randomized tensor cores before) so that it can generate its secret share. Future work may consider how to prevent malicious servers from corrupting tensor network computing protocols.

4.1 Big Data Dispersed Storage, Sharing, and Communication

Encryption is complicated in terms of key management for big data distributed applications: encryption requires centralized management by a trusted authority to authenticate, authorize, and revoke access in order to prevent potential key leakage that may lead to a massive data breach. Our proposal combines the secret-sharing scheme based on distributed TNs with metadata privacy to seamlessly secure big data storage, communication, and sharing. Distributed trust can be achieved by decentralizing the fragment / metadata encryption and access control mechanisms. Furthermore, the metadata management is flexible: it can be done in a centralized or decentralized manner using enterprise management systems, or in a distributed manner on the individual user's side. Any software application can reconstruct the original data if granted access to the metadata information and the shredded fragments. Data integrity can be ensured by cryptographic hashing, whereas data availability can be guaranteed by integration with the Hadoop Distributed File System (HDFS). The advantages of distributed TN representations for secure data storage / sharing include privacy protection, compression, granular access control, updatability, and compressed computation.

Metadata serves as the logical "map" for users to navigate through the information and data; metadata also helps auditors to carry out system reviews and post-breach damage assessments. After decomposing big data and distributing each tensor core or sub-block to multiple storage locations using our proposed randomized tensor decomposition algorithms, the master metadata files are updated with the locations and anonymized filenames of each tensor block, the tensor structure, cryptographic hashes, the random seeds used to permute the mode variables, and the users' access permissions; the storage locations and filenames of the tensor fragments can be routinely renewed to enhance data privacy protection. The master metadata files can be further encrypted and password-protected on the users' side. The metadata of each tensor core stored on the distributed storage locations contains only the anonymized filename and location, such that the cores are unlinkable in the event of a massive data breach; data encryption and access control based on a role management policy can be implemented in a decentralized manner to protect the tensor cores. The system architecture and metadata organization are beyond the scope of this work but will be considered in future work to take account of the various application scenarios, system performance, and requirements of different big data applications.

4.2 Big Data Dispersed Computation

Tensor network (TN) naturally supports distributed / dispersed computation using the smaller, interconnected tensor cores / blocks after big data decomposition [12, 14, 43]. Some basic arithmetic operations in Tucker format are derived in [43]. Let

$$
\begin{aligned}
\mathcal{A} &= [\![\, \mathcal{G}_A;\ \mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \ldots, \mathbf{A}^{(N)} \,]\!] \\
\mathcal{B} &= [\![\, \mathcal{G}_B;\ \mathbf{B}^{(1)}, \mathbf{B}^{(2)}, \ldots, \mathbf{B}^{(N)} \,]\!]
\end{aligned}
\tag{4}
$$

where $\mathcal{G}_L$, $L \in \{A, B\}$ and $\mathbf{A}^{(k)} / \mathbf{B}^{(k)}$, $k \in \{1, 2, \ldots, N\}$ correspond to the Tucker core tensors and factor matrices respectively. Then

$$
\begin{aligned}
(a)\quad \mathcal{A} + \mathcal{B} &= [\![\, \mathcal{G}_A \oplus \mathcal{G}_B;\ \mathbf{A}^{(1)} \boxplus \mathbf{B}^{(1)}, \ldots, \mathbf{A}^{(N)} \boxplus \mathbf{B}^{(N)} \,]\!] \\
(b)\quad \mathcal{A} \oplus \mathcal{B} &= [\![\, \mathcal{G}_A \oplus \mathcal{G}_B;\ \mathbf{A}^{(1)} \oplus \mathbf{B}^{(1)}, \ldots, \mathbf{A}^{(N)} \oplus \mathbf{B}^{(N)} \,]\!] \\
(c)\quad \mathcal{A} \circledast \mathcal{B} &= [\![\, \mathcal{G}_A \otimes \mathcal{G}_B;\ \mathbf{A}^{(1)} \boxtimes \mathbf{B}^{(1)}, \ldots, \mathbf{A}^{(N)} \boxtimes \mathbf{B}^{(N)} \,]\!] \\
(d)\quad \mathcal{A} \otimes \mathcal{B} &= [\![\, \mathcal{G}_A \otimes \mathcal{G}_B;\ \mathbf{A}^{(1)} \otimes \mathbf{B}^{(1)}, \ldots, \mathbf{A}^{(N)} \otimes \mathbf{B}^{(N)} \,]\!]
\end{aligned}
\tag{5}
$$

The tensor operations expressed with the symbols $\oplus$, $\boxplus$, $\circledast$, $\otimes$, and $\boxtimes$ refer to the direct sum, partial direct sum, Hadamard product, Kronecker product, and partial Kronecker product respectively; the formal definitions can be found in [43]. Linear algebra operations for all tensor formats can be derived using the following rules [17]:
• separable components are added, or multiplied independently in each tensor core, for all variables;
• all rank sums are added in linear operations, and multiplied in bilinear operations.

During iterative computations, the tensor rank grows quickly, especially with multiplications. Hence, another important operation, rank truncation, should be provided with the tensor format.
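The rank-sum rule is easy to see in TT format: adding two TT tensors concatenates their cores block-diagonally in the rank modes, so the TT-ranks add, which is exactly why rounding is needed after repeated operations. A minimal sketch follows (our illustration); with `tt_reconstruct` from the earlier rTT-SVD sketch, `tt_add(a, b)` reconstructs to the sum of the two tensors.

```python
import numpy as np

def tt_add(a_cores, b_cores):
    """TT addition: block-diagonal cores, so TT-ranks add (rank-sum rule)."""
    out, N = [], len(a_cores)
    for k, (Ga, Gb) in enumerate(zip(a_cores, b_cores)):
        ra0, I, ra1 = Ga.shape
        rb0, _, rb1 = Gb.shape
        if k == 0:                        # R_0 = 1: stack along the right rank
            G = np.concatenate([Ga, Gb], axis=2)
        elif k == N - 1:                  # R_N = 1: stack along the left rank
            G = np.concatenate([Ga, Gb], axis=0)
        else:                             # interior cores: block-diagonal
            G = np.zeros((ra0 + rb0, I, ra1 + rb1))
            G[:ra0, :, :ra1] = Ga
            G[ra0:, :, ra1:] = Gb
        out.append(G)
    return out
```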

The TT format and its variants support a wide range of multilinear operations such as addition, multiplication, matrix-by-matrix / matrix-by-vector multiplication, direct sum, Hadamard product, Kronecker product, and inner product [12, 14, 43]. As shown in Figure 4, multilinear operations in TT format can be performed naturally in a dispersed (and compressed) manner, making it well-suited for big data processing and scientific computing. The TT-rank grows with every multilinear operation and quickly becomes computationally prohibitive; the TT-rounding (or recompression) procedure [55] can be implemented to reduce the TT-ranks by first orthogonalizing the tensor cores using QR decomposition and then compressing using SVD decomposition, all performed in TT format. The randomized rTT-SVD algorithm proposed in Algorithm 3 can be easily adapted to the second step of the TT-rounding procedure; Algorithm 5 shows the resulting randomized rTT-rounding algorithm for an $N$th-order tensor. To compute non-linear functions, TT cross-approximation can be used [54]. The idea of tensor cross or pseudo-skeleton approximation is to sample from the TN, reconstruct and compute arbitrary functions from the sample points, decompose the sample updates, and update the original TN accordingly; however, how to ensure the privacy preservation of tensor cross approximation remains an open question.

Tensor network computing naturally supports a number of multilinear operations in floating- / fixed-point representations with minimal data pre-processing, unlike classical SMPC schemes, which support only limited secure operations (e.g., addition and multiplication) and have to be pre-processed every time a different operation is carried out. Therefore, SMPC generally requires many rounds of communication between the servers in order to compute complex functions. With TN representations, multilinear operations can be done in a compressed and dispersed manner without the need to reconstruct the original tensor; this is the major advantage of tensor computation in overcoming the curse of dimensionality for large-scale optimization problems. Tensor multilinear operations generally require only computation on each tensor core, but some tensor computation schemes require communication between servers, like the TT-rounding scheme mentioned before and the famous Density Matrix Renormalization Group (DMRG) scheme [61, 62]. Unlike SMPC schemes, the communication mainly involves tensor cores instead of secret shares of the original tensor, and the cores are generally much smaller in size. However, dispersed tensor computing leaks more information than SMPC schemes during communication; one way to overcome this is to continually ingest fresh entropy from complex data when performing dispersed tensor computation.


Algorithm 5: Proposed randomized TT-rounding (rTT-rounding) based on [55] to reduce the size of TT-cores.

Input: TT cores of an $N$th-order tensor stored on servers, $\mathcal{A} = \langle \mathcal{G}_1 \rangle_1 \langle \mathcal{G}_2 \rangle_2 \cdots \langle \mathcal{G}_{N-1} \rangle_{N-1} \langle \mathcal{G}_N \rangle_N$, and $\epsilon$.
Output: Updated TT cores $\tilde{\mathcal{A}} = \langle \mathcal{G}_1 \rangle_1 \langle \mathcal{G}_2 \rangle_2 \cdots \langle \mathcal{G}_N \rangle_N$ such that $\|\mathcal{A} - \tilde{\mathcal{A}}\|_F \leqslant \epsilon \|\mathcal{A}\|_F$.
Initialization: TT-rank $\langle R_0 \rangle_1 = 1$; $\langle R_N \rangle_N = 1$; Perturbation threshold $\delta$; Truncation parameter $\delta_\epsilon = \frac{\epsilon}{\sqrt{N-1}}$;
Right-to-left QR orthogonalization:
$[\langle I_1 \rangle_1, \langle R_1 \rangle_1] \leftarrow shape(\langle \mathcal{G}_1 \rangle_1)$;
$[\langle R_{N-1} \rangle_N, \langle I_N \rangle_N] \leftarrow shape(\langle \mathcal{G}_N \rangle_N)$;
$\langle \mathcal{G}_1 \rangle_1 \leftarrow reshape(\langle \mathcal{G}_1 \rangle_1, [R_0, I_1, R_1])$;
$\langle \mathcal{G}_N \rangle_N \leftarrow reshape(\langle \mathcal{G}_N \rangle_N, [R_{N-1}, I_N, R_N])$;
for $k = N$ to 2 do
    $[\langle R_{k-1} \rangle_k, \langle I_k \rangle_k, \langle R_k \rangle_k] \leftarrow shape(\langle \mathcal{G}_k \rangle_k)$;
    $\langle \mathcal{G}_k \rangle_k \leftarrow reshape(\langle \mathcal{G}_k \rangle_k, [R_{k-1}, I_k R_k])$;
    QR decomposition: $[\langle \mathbf{Q}_k \rangle_k, \langle \mathbf{R}_k \rangle_k] \leftarrow QR(\langle \mathcal{G}_k \rangle_k)$;
    $\langle \mathcal{G}_k \rangle_k \leftarrow reshape(\langle \mathbf{Q}_k \rangle_k, [R_{k-1}, I_k, R_k])$;
    $\langle \mathcal{G}_{k-1} \rangle_{k-1} \leftarrow \langle \mathcal{G}_{k-1} \times_3 \mathbf{R}_k \rangle_{k-1}$;
end
Left-to-right (SVD compress + random disperse):
$[\langle R_0 \rangle_1, \langle I_1 \rangle_1, \langle R_1 \rangle_1] \leftarrow shape(\langle \mathcal{G}_1 \rangle_1)$;
for $k = 1$ to $N-1$ do
    $\langle \mathcal{G}_k \rangle_k \leftarrow reshape(\langle \mathcal{G}_k \rangle_k, [R_{k-1} I_k, R_k])$;
    $[\langle \mathbf{U}_k \rangle_k, \langle \mathbf{S}_k \rangle_k, \langle \mathbf{V}_k \rangle_k] \leftarrow tSVD(\langle \mathcal{G}_k \rangle_k, \delta_{rel.} = \delta_\epsilon)$;
    $\langle R_k \rangle_k \leftarrow shape(\langle \mathbf{S}_k \rangle_k, 1)$; $\langle \mathbf{\Delta}_k \rangle_k \sim \mathcal{U}(\delta, 1)$;
    $\langle \mathcal{G}_k \rangle_k \leftarrow reshape(\langle \mathbf{U}_k \mathbf{\Delta}_k^{-1} \rangle_k, [R_{k-1}, I_k, R_k])$;
    $\langle \mathcal{G}_{k+1} \rangle_{k+1} \leftarrow \langle \mathcal{G}_{k+1} \times_1 \mathbf{\Delta}_k \mathbf{S}_k \mathbf{V}_k^T \rangle_{k+1}$;
end
$\langle \mathcal{G}_1 \rangle_1 \leftarrow reshape(\langle \mathcal{G}_1 \rangle_1, [I_1, R_1])$;
$\langle \mathcal{G}_N \rangle_N \leftarrow reshape(\langle \mathcal{G}_N \rangle_N, [R_{N-1}, I_N])$;
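Below is a hedged NumPy sketch of the two rTT-rounding sweeps (our rendering of Algorithm 5 for cores held in one process, so the share brackets are dropped; truncation uses a simple relative singular-value cutoff rather than the paper's $\delta_\epsilon$ criterion):

```python
import numpy as np

def rtt_round(cores, eps=1e-10, delta=0.05, rng=np.random.default_rng()):
    """rTT-rounding sketch: right-to-left QR orthogonalization, then a
    left-to-right truncated-SVD sweep with the same diagonal perturbation."""
    cores = [G.copy() for G in cores]
    N = len(cores)
    # Right-to-left QR sweep: make every core except the first right-orthogonal.
    for k in range(N - 1, 0, -1):
        r0, I, r1 = cores[k].shape
        Q, R = np.linalg.qr(cores[k].reshape(r0, I * r1).T)   # thin QR
        cores[k] = Q.T.reshape(-1, I, r1)
        cores[k - 1] = np.tensordot(cores[k - 1], R.T, axes=([2], [0]))
    # Left-to-right sweep: SVD-truncate each core and randomize as in Algo. 3.
    for k in range(N - 1):
        r0, I, r1 = cores[k].shape
        U, S, Vt = np.linalg.svd(cores[k].reshape(r0 * I, r1),
                                 full_matrices=False)
        keep = max(1, int(np.sum(S > eps * S[0])))            # relative cutoff
        U, S, Vt = U[:, :keep], S[:keep], Vt[:keep]
        d = rng.uniform(delta, 1.0, keep)                     # diag of Delta_k
        cores[k] = (U / d).reshape(r0, I, keep)               # U Delta^{-1}
        M = (d * S)[:, None] * Vt                             # Delta S V^T
        cores[k + 1] = np.tensordot(M, cores[k + 1], axes=([1], [0]))
    return cores
```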

Figure 4: Tensor network diagrams of (a) a vector $\mathbf{x} \in \mathbb{R}^{I_1 I_2 I_3 I_4}$ in vector TT format, (b) a matrix $\mathbf{A} \in \mathbb{R}^{I_1 I_2 I_3 I_4 \times J_1 J_2 J_3 J_4}$ in matrix TT format, (c) matrix-by-vector multiplication $\mathbf{y} = \mathbf{A}\mathbf{x}$, and (d) the quadratic form $\mathbf{x}^T \mathbf{A} \mathbf{x}$ with $I_n = J_n$ [43]. The dashed / dotted blue boxes show each of the tensor blocks and operations that can be performed in the multi-party computation setting.


5 EXPERIMENTS

Experimental Setup. The experiments are carried out using a workstation with a 64-bit Intel® Xeon® W-2123 CPU at 3.60 GHz and 16.0 GB RAM. Privacy metrics such as Pearson's correlation coefficient, histogram analysis, and normalized mutual information are used to measure the privacy leakage of the proposed randomized TN decompositions. Further comparisons are made between the original and the proposed randomized TN decompositions in terms of computational speed, compression ratio, and distortion analysis of the reconstructed data from TN compression. For image data, the distortion resulting from TN compression can be measured by the normalized L2-dissimilarity, which is defined by

$$
\frac{1}{N'} \sum_{n=1}^{N'} \frac{\| \mathbf{x}_n - \mathbf{x}'_n \|_2}{\| \mathbf{x}_n \|_2}
\tag{6}
$$

where $\mathbf{x}_n$, $n \in \{1, 2, \ldots, N'\}$ are the set of original images, $\mathbf{x}'_n$, $n \in \{1, 2, \ldots, N'\}$ are the set of reconstructed images after TN compression, and $\| \cdot \|_2$ is the Euclidean norm. Here, we study the proposed rTT-SVD, rTR-SVD, and rTD algorithms only, because rHT is based on recursive rTD; therefore, showing that rTD is privacy-preserving implies that rHT is also privacy-preserving for larger-scale tensors. The perturbation factor $\delta$ for randomized TN is set to 0.05 for all the experiments; hence the diagonal perturbation matrix $\mathbf{\Delta}$ falls within the range $[0.05, 1]$ uniformly.

Datasets. Table 2 tabulates the sample size and mode size of all the datasets used in this study. Experiments are carried out on 1D, 2D, and 3D biometric datasets to thoroughly investigate the proposed randomized TN algorithms for privacy preservation across different data dimensions. In general, vector and matrix data are reshaped into higher-order tensors before TN decomposition. The gait sensor database is recorded using a smartphone's inertial sensors; the sampling frequency is 100 Hz and the total walking distance is 640 meters per session [71]. The training images for real and fake face detection are provided by the Computational Intelligence and Photography Lab, Department of Computer Science, Yonsei University on the Kaggle online data-sharing platform; only the real facial images are used in the experiments. The RGB channels of a facial image have very high spectral correlation; therefore, the channels are stacked in 3D for tensor decomposition. The Yale face database contains GIF images of 15 human subjects, each with 11 different facial expressions or configurations [6]. Finally, we also generate a 3D super-diagonal tensor with ones on the $(i, i, i)$ entries for our investigation studies.

Table 2: Datasets used in the experimental studies.

Dataset                   Subjects   Mode Size
Human Gait (walking)      93         58 Features
Real & Fake Face Images   ∼1000      600×600×3
Yale Face Database        15         320×243×11
Super-diagonal Tensor     N/A        10×10×10
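As a construction sketch for the last entry of Table 2 (our illustrative code; the 10×10×10 size follows the table):

import numpy as np

def super_diagonal(n=10):
    # 3D tensor with ones on the (i, i, i) entries and zeros elsewhere.
    t = np.zeros((n, n, n))
    i = np.arange(n)
    t[i, i, i] = 1.0
    return t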

Data Complexity for Randomized TN Decompositions. Figures 5 and 6 show the effect on the TT decomposition before and after padding noisy data onto a relatively simple (full-rank) super-diagonal tensor; both approaches reproduce the same super-diagonal tensor after reconstruction. However, naive padding with noise usually results in high TN computation and storage cost due to the higher rank-complexity, whereas our proposed randomized TN algorithms simply make use of the complex correlation structure commonly

Figure 5: TT decomposition of a super-diagonal tensor using TT-SVD (top) and the rTT-SVD (bottom) proposed in Alg. 3.

Figure 6: TT decomposition of a super-diagonal tensor padded with noise, using TT-SVD (top) and rTT-SVD (bottom).

Figure 7: Top left: the time series of human gait sensor data in walking mode. The other subplots show the data's TT decomposition. Top right and bottom left: normalized TT cores G1 and G3. Bottom right: normalization factor of G3.

found in big data to generate highly-randomized tensor blocks. Figure 7 shows the randomized TT decomposition of human gait sensor data. To preserve important dataset features during TN compression, each attribute is standardized to zero mean and unit variance, i.e., the z-score; a one-line sketch follows.
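A minimal sketch of this standardization, assuming the gait data is held as a (samples × attributes) NumPy array:

import numpy as np

def zscore(X, eps=1e-12):
    # Standardize each attribute (column) to zero mean and unit variance.
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)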


Figure 8: Histogram analysis of the TT decomposition of a facial image. The normalized TT cores are either Gaussian or Laplacian distributed, which is usually different from the image histogram distribution.

Figure 9: Normalized TT cores produced from two randomized rTT-SVD decompositions of a facial image using Algorithm 3 (top and bottom rows). Correlation structure that contributes higher variability (i.e., lower rank) is much harder to perturb, and the normalization factor in the last TT core is mostly preserved in the randomized decomposition.

Privacy Leakage Analysis. Figures 8 and 9 show the TT decomposition of a facial image. The shape of the facial image is permuted to 600 × 3 × 600 to obtain TT cores of balanced shape. In general, the histograms of the TN cores and factors are Gaussian or Laplacian distributed, which is very different from the histogram of the original data. Figures 10 and 11 show the reconstructed images from incomplete TN representations and measure the amount of information overlap with the original images using normalized mutual information (NMI). The results show that if each of the TN cores or factors is large enough in terms of rank complexity or block size, the privacy leakage is minimal without a complete TN representation of the data. In this case, the Tucker factor U2 is very small in size and therefore results in the highest NMI. Figures 12 and 13 show the correlation between the randomized and non-randomized TN cores for each rank using the Yale Face Database. The correlation is higher for lower ranks; this means it is harder to perturb correlation structure that contributes to higher variability within the data. One way to overcome this is to permute the mode variables along each dimension after TN compression, protecting the privacy of the distribution of each tensor mode. A sketch of both leakage metrics used here follows.
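A minimal NumPy sketch of the two metrics, under our assumptions: a histogram-based NMI with geometric-mean normalization (one common variant; the paper does not specify its estimator) and column-wise absolute Pearson correlation between matching ranks. Both helper names are ours:

import numpy as np

def nmi(x, y, bins=64):
    # Normalized mutual information between the pixel intensities of two images.
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    return mi / np.sqrt(hx * hy)

def rankwise_abs_corr(A, B):
    # |Pearson r| between matching rank columns of two factor matrices.
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return np.abs((A * B).sum(axis=0) /
                  (np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)))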

Figure 10: Reconstructed images from a TN by replacing either a tensor core or a factor matrix generated from one randomized TN decomposition process with another. The first row corresponds to rTT decomposition, the second row to rTR, and the third row to rTD, respectively.

Figure 11: Normalized mutual information (NMI) between the original data and the reconstructed data from different randomized TNs with one core or factor replaced.

Figure 12: Absolute value of Pearson's correlation coefficient for each rank value between the TT cores generated from the original and randomized TT-SVD algorithms. Left: TT core G1. Right: TT core G3. The x-axes refer to R1 and R2, respectively.

Data Compressibility and Algorithmic Efficiency. Table 3 measures the time efficiency of TN decomposition and reconstruction for the Real & Fake Facial Images Database. The randomized TN decompositions generally take slightly longer than the non-randomized TN decompositions, mainly due to the extra SVD step


Figure 13: Absolute value of Pearson's correlation coefficient for each rank value between the Tucker factors generated from the original and randomized TD algorithms. Left: TD factor U1. Right: TD factor U3. The x-axes refer to R1 and R3, respectively.

Figure 14: Normalized L2-dissimilarity between the original data and the reconstructed data from randomized and non-randomized TN algorithms for different compression ratios.

needed to generate randomized tensor blocks. TR reconstruction is slow (∼1 min) because there is a loop in the TN structure. Figure 14 shows the image distortion analysis under different TN compression ratios for the Yale Face Database. Randomized TN algorithms result in slightly higher distortion in the reconstructed data compared to non-randomized TN algorithms; this is expected because randomized TN algorithms produce sub-optimal decompositions. Randomized TT decomposition generates the lowest distortion, especially at high compression ratios, compared to the rTR and rTD decompositions. The TT representation strikes a good balance between privacy preservation, computational efficiency, and storage efficiency.

Table 3: Comparison between the proposed randomized TNs and the original algorithms in terms of computational efficiency. The dataset used comes from the Real & Fake Facial Images Database and the compression ratio is set to ∼0.725.

Random. TN   Tensor Rank                TN Decompose /
Algorithm                               Reconstruct Time
HOSVD        R1 = R3 = 350, R2 = 3      0.2794 / 0.0077 s
rTD          R1 = R3 = 350, R2 = 3      0.3104 / 0.0081 s
TT-SVD       R1 = R2 = 350              0.1851 / 0.0054 s
rTT-SVD      R1 = R2 = 350              0.2817 / 0.0053 s
TR-SVD       R0/1 = R2 = 20, R3 = 45    0.3563 / 1.1746 s
rTR-SVD      R0/1 = R2 ≈ 20, R3 ≈ 45    0.3292 / 1.1150 s

6 DISCUSSION
Scalability is an important consideration for both the success of big data analytics and the widespread adoption of privacy-preserving techniques. We have proposed a simple perturbation technique that can be easily adapted for randomized decomposition of various tensor network structures. The proposed secret-sharing scheme based on dispersed TN representations / computation is very efficient in terms of storage, computational, and communication complexity due to the natural support for dispersed tensor computation. Privacy leakage analysis is carried out to verify that the proposed scheme is secure against a semi-honest adversary; however, privacy leakage may still happen when performing dispersed tensor operations, which requires further investigation. One mitigation is to ingest fresh entropy from complex data when performing tensor operations, thereby increasing the uncertainty of estimating the original tensor. Nevertheless, the proposed scheme can easily be combined with existing data-security solutions such as data anonymization, encryption, and secure-enclave technologies to provide layered protection. Potential extensions of this work include various applications of privacy-preserving big data analytics [12, 14] and large-scale numerical computing [28, 37, 38]. Another potential direction is extending our proposed secret-sharing scheme to federated machine learning and applying differential privacy to protect the privacy of individual items in the training dataset [9, 49, 63].


REFERENCES
[1] Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 308–318.

[2] Gagan Aggarwal, Mayank Bawa, Prasanna Ganesan, Hector Garcia-Molina, Krishnaram Kenthapadi, Rajeev Motwani, Utkarsh Srivastava, Dilys Thomas, and Ying Xu. 2005. Two can keep a secret: A distributed architecture for secure database services. CIDR 2005 (2005).

[3] Animashree Anandkumar, Rong Ge, Daniel Hsu, Sham M Kakade, and Matus Telgarsky. 2014. Tensor decompositions for learning latent variable models. The Journal of Machine Learning Research 15, 1 (2014), 2773–2832.

[4] Rafael Ballester-Ripoll, Peter Lindstrom, and Renato Pajarola. 2019. TTHRESH: Tensor compression for multidimensional visual data. IEEE Transactions on Visualization and Computer Graphics (2019).

[5] Raef Bassily, Adam Smith, and Abhradeep Thakurta. 2014. Private empirical risk minimization: Efficient algorithms and tight error bounds. In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science. IEEE, 464–473.

[6] Peter N. Belhumeur, João P Hespanha, and David J. Kriegman. 1997. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 7 (1997), 711–720.

[7] Göran Bergqvist and Erik G Larsson. 2010. The higher-order singular value decomposition: Theory and an application [lecture notes]. IEEE Signal Processing Magazine 27, 3 (2010), 151–154.

[8] Arnaud Berlioz, Arik Friedman, Mohamed Ali Kaafar, Roksana Boreli, and Shlomo Berkovsky. 2015. Applying differential privacy to matrix factorization. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM, 107–114.

[9] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. 2017. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 1175–1191.

[10] Cesar F Caiafa and Andrzej Cichocki. 2014. Stable, robust, and super fast reconstruction of tensors using multi-way projections. IEEE Transactions on Signal Processing 63, 3 (2014), 780–793.

[11] Kamalika Chaudhuri, Claire Monteleoni, and Anand D Sarwate. 2011. Differentially private empirical risk minimization. Journal of Machine Learning Research 12, Mar (2011), 1069–1109.

[12] Andrzej Cichocki, Namgil Lee, Ivan Oseledets, Anh-Huy Phan, Qibin Zhao, Danilo P Mandic, et al. 2016. Tensor networks for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor decompositions. Foundations and Trends® in Machine Learning 9, 4-5 (2016), 249–429.

[13] Andrzej Cichocki, Danilo Mandic, Lieven De Lathauwer, Guoxu Zhou, Qibin Zhao, Cesar Caiafa, and Huy Anh Phan. 2015. Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Processing Magazine 32, 2 (2015), 145–163.

[14] Andrzej Cichocki, Anh-Huy Phan, Qibin Zhao, Namgil Lee, Ivan Oseledets, Masashi Sugiyama, Danilo P Mandic, et al. 2017. Tensor networks for dimensionality reduction and large-scale optimization: Part 2 applications and future perspectives. Foundations and Trends® in Machine Learning 9, 6 (2017), 431–673.

[15] Ronald Cramer, Ivan Damgard, and J Buus Nielsen. 2012. Secure Multiparty Computation and Secret Sharing - An Information Theoretic Approach. Book draft.

[16] Justin Dauwels, K Srinivasan, M Ramasubba Reddy, and Andrzej Cichocki. 2012. Near-lossless multichannel EEG compression based on matrix and tensor decompositions. IEEE Journal of Biomedical and Health Informatics 17, 3 (2012), 708–714.

[17] Sergey Dolgov and Boris Khoromskij. 2013. Two-level QTT-Tucker format for optimized tensor calculus. SIAM J. Matrix Anal. Appl. 34, 2 (2013), 593–623.

[18] Josep Domingo-Ferrer, Oriol Farràs, Jordi Ribes-González, and David Sánchez. 2019. Privacy-preserving cloud computing on sensitive data: A survey of methods, products and challenges. Computer Communications 140 (2019), 38–60.

[19] Cynthia Dwork. 2011. Differential privacy. Encyclopedia of Cryptography and Security (2011), 338–340.

[20] Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9, 3–4 (2014), 211–407.

[21] David Evans, Vladimir Kolesnikov, Mike Rosulek, et al. 2018. A pragmatic introduction to secure multi-party computation. Foundations and Trends® in Privacy and Security 2, 2-3 (2018), 70–246.

[22] Benjamin Fabian, Tatiana Ermakova, and Philipp Junghanns. 2015. Collaborative and secure sharing of healthcare data in multi-clouds. Information Systems 48 (2015), 132–150.

[23] Jun Feng, Laurence T Yang, Qing Zhu, and Kim-Kwang Raymond Choo. 2018. Privacy-preserving tensor decomposition over encrypted data in a federated cloud environment. IEEE Transactions on Dependable and Secure Computing (2018).

[24] Arik Friedman, Shlomo Berkovsky, and Mohamed Ali Kaafar. 2016. A differential privacy framework for matrix factorization recommender systems. User Modeling and User-Adapted Interaction 26, 5 (2016), 425–458.

[25] Craig Gentry et al. 2009. Fully homomorphic encryption using ideal lattices. In STOC, Vol. 9. 169–178.

[26] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. 2672–2680.

[27] Lars Grasedyck. 2010. Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal. Appl. 31, 4 (2010), 2029–2054.

[28] Lars Grasedyck, Daniel Kressner, and Christine Tobler. 2013. A literature survey of low-rank tensor approximation techniques. GAMM-Mitteilungen 36, 1 (2013), 53–78.

[29] Wolfgang Hackbusch. 2012. Tensor Spaces and Numerical Tensor Calculus. Vol. 42. Springer.

[30] Wolfgang Hackbusch and Stefan Kühn. 2009. A new scheme for the tensor representation. Journal of Fourier Analysis and Applications 15, 5 (2009), 706–722.

[31] Hafiz Imtiaz and Anand D Sarwate. 2018. Distributed differentially private algorithms for matrix and tensor factorization. IEEE Journal of Selected Topics in Signal Processing 12, 6 (2018), 1449–1464.

[32] Katarzyna Kapusta and Gerard Memmi. 2015. Data protection by means of fragmentation in distributed storage systems. In 2015 International Conference on Protocol Engineering (ICPE) and International Conference on New Technologies of Distributed Systems (NTDS). IEEE, 1–8.

[33] Katarzyna Kapusta and Gerard Memmi. 2017. Data protection by means of fragmentation in various different distributed storage systems - a survey. arXiv preprint arXiv:1706.05960 (2017).

[34] K. Kapusta, H. Qiu, and G. Memmi. 2019. Poster abstract: Secure data sharing by means of fragmentation, encryption, and dispersion. In IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). 1051–1052. https://doi.org/10.1109/INFCOMW.2019.8845243

[35] K. Kapusta, H. Qiu, and G. Memmi. 2019. Secure data sharing with fast access revocation through untrusted clouds. In 2019 10th IFIP International Conference on New Technologies, Mobility and Security (NTMS). 1–5. https://doi.org/10.1109/NTMS.2019.8763850

[36] Azam Karami, Mehran Yazdi, and Grégoire Mercier. 2012. Compression of hyperspectral images using discrete wavelet transform and Tucker decomposition. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 5, 2 (2012), 444–450.

[37] Boris N Khoromskij. 2012. Tensors-structured numerical methods in scientific computing: Survey on recent advances. Chemometrics and Intelligent Laboratory Systems 110, 1 (2012), 1–19.

[38] Boris N Khoromskij. 2018. Tensor Numerical Methods in Scientific Computing. Vol. 19. Walter de Gruyter GmbH & Co KG.

[39] Yejin Kim, Jimeng Sun, Hwanjo Yu, and Xiaoqian Jiang. 2017. Federated tensor factorization for computational phenotyping. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 887–895.

[40] Hugo Krawczyk. 1993. Secret sharing made short. In Annual International Cryptology Conference. Springer, 136–146.

[41] Liwei Kuang, Laurence T Yang, Jun Feng, and Mianxiong Dong. 2015. Secure tensor decomposition using fully homomorphic encryption scheme. IEEE Transactions on Cloud Computing 6, 3 (2015), 868–878.

[42] Dana Lahat, Tülay Adali, and Christian Jutten. 2015. Multimodal data fusion: an overview of methods, challenges, and prospects. Proc. IEEE 103, 9 (2015), 1449–1477.

[43] Namgil Lee and Andrzej Cichocki. 2018. Fundamental tensor operations for large-scale data analysis using tensor network formats. Multidimensional Systems and Signal Processing 29, 3 (2018), 921–960.

[44] Jun Liu, Xiangqian Liu, and Xiaoli Ma. 2008. First-order perturbation analysis of singular vectors in singular value decomposition. IEEE Transactions on Signal Processing 56, 7 (2008), 3044–3049.

[45] Ziqi Liu, Yu-Xiang Wang, and Alexander Smola. 2015. Fast differentially private matrix factorization. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM, 171–178.

[46] Edgar Alonso Lopez-Rojas and Stefan Axelsson. 2016. A review of computer simulation for fraud detection research in financial datasets. In 2016 Future Technologies Conference (FTC). IEEE, 932–935.

[47] Jing Ma, Qiuchen Zhang, Jian Lou, Joyce C Ho, Li Xiong, and Xiaoqian Jiang. 2019. Privacy-preserving tensor factorization for collaborative health data analysis. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. ACM, 1291–1300.

[48] Michael W Mahoney and Petros Drineas. 2009. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences 106, 3 (2009), 697–702.

[49] H Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, et al. 2016. Communication-efficient learning of deep networks from decentralized data. arXiv preprint arXiv:1602.05629 (2016).


[50] Gérard Memmi, Katarzyna Kapusta, and Han Qiu. 2015. Data protection: Combining fragmentation, encryption, and dispersion. In 2015 International Conference on Cyber Security of Smart Cities, Industrial Control System and Communications (SSIC). IEEE, 1–9.

[51] Valeria Nikolaenko, Stratis Ioannidis, Udi Weinsberg, Marc Joye, Nina Taft, and Dan Boneh. 2013. Privacy-preserving matrix factorization. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security. ACM, 801–812.

[52] Sergey I Nikolenko. 2019. Synthetic data for deep learning. arXiv preprint arXiv:1909.11512 (2019).

[53] Román Orús. 2019. Tensor networks for complex quantum systems. Nature Reviews Physics 1, 9 (2019), 538–550.

[54] Ivan Oseledets and Eugene Tyrtyshnikov. 2010. TT-cross approximation for multidimensional arrays. Linear Algebra Appl. 432, 1 (2010), 70–88. https://doi.org/10.1016/j.laa.2009.07.024

[55] Ivan V Oseledets. 2011. Tensor-train decomposition. SIAM Journal on Scientific Computing 33, 5 (2011), 2295–2317.

[56] Evangelos E Papalexakis, Christos Faloutsos, and Nicholas D Sidiropoulos. 2016. Tensors for data mining and data fusion: Models, applications, and scalable algorithms. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 2 (2016), 1–44.

[57] Han Qiu, Katarzyna Kapusta, Zhihui Lu, Meikang Qiu, and Gerard Memmi. 2019. All-or-nothing data protection for ubiquitous communication: Challenges and perspectives. Information Sciences 502 (2019), 434–445.

[58] Ronald L Rivest. 1997. All-or-nothing encryption and the package transform. In International Workshop on Fast Software Encryption. Springer, 210–218.

[59] David Sánchez and Montserrat Batet. 2017. Privacy-preserving data outsourcing in the cloud via semantic data splitting. Computer Communications 110 (2017), 187–201.

[60] Anand D Sarwate and Kamalika Chaudhuri. 2013. Signal processing and machine learning with differential privacy: Algorithms and challenges for continuous data. IEEE Signal Processing Magazine 30, 5 (2013), 86–94.

[61] Ulrich Schollwöck. 2005. The density-matrix renormalization group. Reviews of Modern Physics 77, 1 (2005), 259.

[62] Ulrich Schollwöck. 2011. The density-matrix renormalization group in the age of matrix product states. Annals of Physics 326, 1 (2011), 96–192.

[63] Reza Shokri and Vitaly Shmatikov. 2015. Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 1310–1321.

[64] Nicholas D Sidiropoulos, Lieven De Lathauwer, Xiao Fu, Kejun Huang, Evangelos E Papalexakis, and Christos Faloutsos. 2017. Tensor decomposition for signal processing and machine learning. IEEE Transactions on Signal Processing 65, 13 (2017), 3551–3582.

[65] Qingquan Song, Hancheng Ge, James Caverlee, and Xia Hu. 2017. Tensor completion algorithms in big data analytics. arXiv preprint arXiv:1711.10105 (2017).

[66] Laurent Sorber, Marc Van Barel, and Lieven De Lathauwer. 2015. Structured data fusion. IEEE Journal of Selected Topics in Signal Processing 9, 4 (2015), 586–600. https://doi.org/10.1109/JSTSP.2015.2400415

[67] Gilbert W Stewart. 1998. Perturbation Theory for the Singular Value Decomposition. Technical Report.

[68] Jimeng Sun, Dacheng Tao, and Christos Faloutsos. 2006. Beyond streams and graphs: dynamic tensor analysis. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 374–383.

[69] Jimeng Sun, Dacheng Tao, Spiros Papadimitriou, Philip S Yu, and Christos Faloutsos. 2008. Incremental tensor analysis: Theory and applications. ACM Transactions on Knowledge Discovery from Data (TKDD) 2, 3 (2008), 11.

[70] Joel A Tropp, Alp Yurtsever, Madeleine Udell, and Volkan Cevher. 2017. Randomized single-view algorithms for low-rank matrix approximation. (2017).

[71] Amir Vajdi, Mohammad Reza Zaghian, Saman Farahmand, Elham Rastegar, Kian Maroofi, Shaohua Jia, Marc Pomplun, Nurit Haspel, and Akram Bayat. 2019. Human gait database for normal walk collected by smart phone accelerometer. arXiv preprint arXiv:1905.03109 (2019).

[72] Isabel Wagner and David Eckhoff. 2018. Technical privacy metrics: a systematic survey. ACM Computing Surveys (CSUR) 51, 3 (2018), 57.

[73] Yining Wang and Anima Anandkumar. 2016. Online and differentially-private tensor decomposition. In Advances in Neural Information Processing Systems. 3531–3539.

[74] Andrew Chi-Chih Yao. 1986. How to generate and exchange secrets. In 27th Annual Symposium on Foundations of Computer Science (SFCS 1986). IEEE, 162–167.

[75] Buket Yüksel, Alptekin Küpçü, and Öznur Özkasap. 2017. Research issues for privacy and security of electronic health services. Future Generation Computer Systems 68 (2017), 1–13.

[76] Feng Zhang, Victor E Lee, and Kim-Kwang Raymond Choo. 2018. Jo-DPMF: Differentially private matrix factorization learning through joint optimization. Information Sciences 467 (2018), 271–281.

[77] Qibin Zhao, Guoxu Zhou, Shengli Xie, Liqing Zhang, and Andrzej Cichocki. 2016. Tensor ring decomposition. arXiv preprint arXiv:1606.05535 (2016).

[78] Guoxu Zhou, Qibin Zhao, Yu Zhang, Tülay Adalı, Shengli Xie, and Andrzej Cichocki. 2016. Linked component analysis from matrices to high-order tensors: Applications to biomedical data. Proc. IEEE 104, 2 (2016), 310–331.