
Multimedia Big Data Security

Navit Gaur

Fall 2015


Abstract -- Multimedia big data is becoming increasingly important as the amount of data shared over the web grows. With the rise of social networking sites like Facebook, Vine, Instagram and Spotify, the amount of multimedia data shared by users is increasing exponentially. As the data grows rapidly, complex security issues arise.

This paper focuses on the challenges, issues and needs that multimedia big data faces in terms of security, and on different approaches to overcoming them. Multimedia big data security frameworks and data encryption and decryption algorithms will be analysed and compared.

1. Introduction to Multimedia Big Data Security

With the rise of digital technologies it is nowadays possible to make an exact copy of a source. Multimedia data is represented as bit streams that can be saved on optical or magnetic media. Since digital recording is a process in which every part is read from the source and then copied to a new destination, it is possible to create an exact replica of the source media. Protection of multimedia big data is therefore a crucial problem.

The essential security requirements for multimedia systems are as follows:

· Confidentiality: Encryption can be used to prevent unauthorized entities from getting secret information.

· Data integrity: Any changes in the data can be discerned by using robust and fragile watermarking, digital signatures and message authentication codes.

· Data origin authenticity: Proof of origin can be verified using message authentication codes, digital signatures and fragile and digital watermarking.

· Entity authenticity: Authentication protocols can be used to ensure that the entities taking part in the communication are the ones they claim to be.

· Non-repudiation: Non-repudiation mechanisms can be used to prove to any of the parties involved whether a particular event or action happened. The event or action can be the generation, sending, receipt or transport of a message. Non-repudiation certificates, non-repudiation tokens and protocols establish the accountability of information.

Distributed multimedia applications can be secured using authentication control mechanisms. However, this is not enough to secure multimedia data broadcast: the multimedia data still needs to be encrypted during transmission.

The next chapter provides a short introduction to encryption techniques and watermarking techniques.

2. Understanding of Multimedia Big Data Security[19][20][21]

This chapter explains how the security requirements mentioned in the previous chapter can be obtained.

· Confidentiality: Cipher systems can be used to hide information from unauthorized users. Private key and public key encryption can be used to encrypt the data. In streaming applications a large amount of data needs to be sent from the sender to the receiver. One of the problems of transmitting a huge amount of data is that it can change due to transmission errors, higher compression rates or scaling operations. If common encryption methods are used, the decryption of an encrypted block may fail, because the original data cannot be retrieved if one of the blocks is altered. A solution to these problems is partial encryption, i.e. encrypting only certain parts of the data instead of the whole.

· Data Integrity: Data integrity can be checked using one-way hash functions. Some mechanisms detailed in the next section can also be used to detect changes in the data. These mechanisms cannot prevent data manipulation, but they make such manipulations detectable. A hash function maps strings of arbitrary length to strings of fixed length. Hash functions are public, so anyone knowing the function can check the integrity of the data by recalculating the hash value. Since multimedia data can be altered by scaling or compression, it is not appropriate to apply hash functions directly to the media.


Instead, they should be applied to the semantic data of the media stream.
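As a minimal sketch of this integrity check using Python's standard hashlib (the byte strings are invented placeholders), a fixed-length digest exposes any manipulation of the hashed input:

```python
import hashlib

def digest(data: bytes) -> str:
    # SHA-256 maps input of arbitrary length to a fixed-length digest.
    return hashlib.sha256(data).hexdigest()

original = b"semantic features of the media stream"  # placeholder payload
tampered = original + b"!"

assert digest(original) == digest(original)   # deterministic: same data, same hash
assert digest(original) != digest(tampered)   # any manipulation becomes detectable
```

Note that this only makes manipulation detectable after the fact; it does not prevent it, which is exactly the limitation described above.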

· Data Origin Authenticity: Authentication codes, digital signatures, and fragile and robust digital watermarks can be used to determine data origin authenticity. These mechanisms can also be used to maintain data integrity. All of them are detection mechanisms: they cannot prevent data manipulation, but they make any alteration of the data detectable.

Digital signatures based on public key encryption systems can be used to ensure the authenticity of multimedia data. Digital signatures should not be applied directly to the multimedia data, because image processing techniques like conversion or scaling may be applied to it. Once these operations are applied, the multimedia data changes irreversibly; although the content was not changed, the digital signature verification will fail. Digital signatures should instead be applied to feature codes of the multimedia data that are not changed by operations like scaling or compression.

In digital watermarking the multimedia data is slightly modified and the verification data is embedded in it rather than being appended to it. Anyone with the proper secret key can use the watermark to determine whether the data was altered by checking the embedded information. The problem with watermarking techniques is that it is not possible to embed a large amount of verification data at a high rate.

· Entity Authenticity: Authentication protocols can be used to ensure that the parties taking part in the communication are indeed who they claim to be. One of the simplest examples of an authentication protocol is the challenge-response protocol, in which the verifier sends a randomly generated number to the claimant. The claimant computes a value from the challenge and a secret and returns it to the verifier. The verifier can then check that the claimant holds the appropriate secret key.
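The challenge-response exchange just described can be sketched with Python's standard hmac module (the shared secret is a placeholder, and a real protocol would add identities and replay protection):

```python
import hashlib
import hmac
import secrets

SHARED_SECRET = b"entity-secret"  # placeholder key held by verifier and claimant

def respond(challenge: bytes, key: bytes) -> bytes:
    # The claimant proves possession of the key by MACing the fresh challenge.
    return hmac.new(key, challenge, hashlib.sha256).digest()

# Verifier side: send a random challenge, then check the returned value.
challenge = secrets.token_bytes(16)
response = respond(challenge, SHARED_SECRET)       # computed by the claimant
expected = respond(challenge, SHARED_SECRET)       # recomputed by the verifier
assert hmac.compare_digest(response, expected)     # claimant holds the secret
assert not hmac.compare_digest(respond(challenge, b"wrong key"), expected)
```

Because the challenge is random and fresh each time, recording one response does not let an eavesdropper answer the next challenge.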

· Non-repudiation: Non-repudiation mechanisms based on public key and private key encryption systems provide security techniques that link data and actions to their originators. Using these mechanisms it is possible to prove to the involved parties or to third parties whether a particular event or action occurred. The event or action may be the creation, transmission or receipt of a message.

Cryptography is probably the most common method of protecting multimedia data. Secret key encryption uses the same key for encrypting and decrypting data. It requires that the secret key be exchanged between the sender and receiver before any transmission takes place. Although the speed of secret key encryption is acceptable, the need to exchange the key in advance is its weakness.

Public key encryption uses two keys: the key used for encryption is different from the one used for decryption. The public key is used for encryption and the private key for decryption. Anyone can use the public key to encrypt a message, but only the user with the private key can decrypt it. This method is more convenient than the secret key method, but encryption and decryption take longer.
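The asymmetry can be illustrated with textbook RSA using tiny primes (purely a toy; real deployments use ~2048-bit moduli and padding schemes such as OAEP):

```python
# Textbook RSA with tiny, insecure primes, to illustrate the two-key idea.
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent: (e, n) is the public key
d = pow(e, -1, phi)          # private exponent: (d, n) is the private key

message = 42                 # message encoded as an integer smaller than n
cipher = pow(message, e, n)  # anyone can encrypt with the public key
plain = pow(cipher, d, n)    # only the private-key holder can decrypt

assert plain == message
```

The modular exponentiations are what make public key encryption slower than secret key ciphers, matching the trade-off noted above.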

In digital watermarking the original data is modified and additional information is embedded into it that is undetectable during normal use but detectable by computers and software. This embedded additional information is called the watermark.

Digital watermarking should match the following requirements:

· The embedded watermark should be resilient against any attacks performed by unauthorized entities. The aim of such attacks is to gain access to unwatermarked data by damaging or removing the embedded watermark.

· The watermark embedded in the multimedia data should not cause any visual or audible changes in the data. The watermark should only be detectable with the help of computers and software.

It should be almost impossible to remove or extract the embedded watermark from the data without appropriate access to the secret key.

3. Multimedia Big Data Security Algorithms

This chapter will focus on security algorithms for multimedia big data security. The algorithms will be classified and then they will be reviewed thoroughly. In the last section of the chapter, the algorithms will be compared.

3.1 A Classification of the Algorithms

3.1.1 Fast Encryption Method Based on New FFT Representation

The fast encryption method is based on the Fast Fourier Transform and secures sensitive information within multimedia data. This method has a dual-key security feature: not only the encryption key but also the mapping or transformation structure must be known to breach the information.

3.1.2 Multimedia Big Data Sharing in Social Networks Using Fingerprinting and Encryption in the JPEG2000 Compressed Domain

The growth of social interactive media, as exhibited by social networking sites such as Facebook and YouTube, combined with advances in multimedia content analysis, underscores potential dangers of malicious use such as illegal copying, piracy, plagiarism and misappropriation. Consequently, secure multimedia sharing and traitor tracing have become basic and critical issues in social networks. This algorithm therefore focuses on securing social media data.

3.1.3 Scrambling

One of the simplest forms of encryption used for securing multimedia data is scrambling. It is an encryption process that performs random permutations on the multimedia data using some specific pattern. The image's histogram remains the same; only the individual pixel positions are shuffled.
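A minimal sketch of such a key-seeded permutation in Python (the flat pixel list is an invented toy example) shows both properties: the histogram is preserved, and only the key holder can restore the positions:

```python
import random
from collections import Counter

def scramble(pixels, key):
    # The key seeds the specific permutation pattern applied to the data.
    rng = random.Random(key)
    order = list(range(len(pixels)))
    rng.shuffle(order)
    return [pixels[i] for i in order], order

def descramble(scrambled, order):
    out = [None] * len(scrambled)
    for position, source in enumerate(order):
        out[source] = scrambled[position]
    return out

pixels = [10, 10, 20, 30, 30, 30]              # toy "image" as a flat pixel list
scrambled, order = scramble(pixels, key=1234)
assert Counter(scrambled) == Counter(pixels)   # histogram is unchanged
assert descramble(scrambled, order) == pixels  # key holder restores positions
```

The unchanged histogram is precisely why scrambling alone offers limited security: statistical properties of the data leak through.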

3.1.4 Post-Compression Encryption Algorithm

The Secure Real-time Transport Protocol, the basic approach, encrypts the compressed bit stream by packetizing the multimedia data and then encrypting every single packet using AES. It is secure, but it has huge computational overheads, and because it encrypts compressed data it is not conducive to preserving the favourable properties of compressed bit streams in general.

3.1.5 Pre-Compression Encryption Algorithm

Pre-compression encryption implies encrypting uncompressed or raw bits which will waste a lot of computational resources.

3.1.6 Selective Encryption

Transmitted images may serve diverse applications, for example business, military and medical applications. It is therefore important to encrypt image data before transmission over the network to preserve its security and prevent unauthorized access. Selective encryption results in highly correlated original and decrypted images, and hence is a good technique to use.

3.1.7 Joint Video Compression and Encryption (JVCE) Approaches

The principal idea behind joint coding is to incorporate encryption into the compression operation by parameterizing the compression blocks, rather than altering the compressed bits. Entropy coding and the wavelet transform are the two main compression blocks to which this approach has been applied.

3.2 A Review of the Algorithms

3.2.1 Fast Encryption Method Based on New FFT Representation

The FFT encryption method is based on secrecy systems, which are sets of transformations. The algorithmic representation structure can be divided and observed as follows:

(i) Splitting the Fast Fourier Transform into small sets of transformation.

(ii) Decomposing the transformation using discrete orthogonal transforms.


Fig. 1: Encryption method with dual key functionality.

Let us now look at FFT encryption using the Walsh-Hadamard Transform (WHT).

• The Sylvester matrix is the basis of the WHT.

• Hadamard functions form a set of orthogonal functions taking only the values -1 or +1, which means they need only addition and subtraction for computation.

• WHT matrix can be represented as

H2n = H2

n-1 H2n-1

H 2n-1 -H2

n-1 where n=2,3

H1 = 1, H2 = 1 1 1 -1
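The Sylvester recursion above can be sketched directly in pure Python; this toy construction verifies that H8 contains only ±1 entries (so the transform needs only additions and subtractions) and that its rows are orthogonal:

```python
def hadamard(n):
    """Sylvester construction: H_{2m} = [[H_m, H_m], [H_m, -H_m]], H_1 = [[1]]."""
    if n == 1:
        return [[1]]
    h = hadamard(n // 2)
    top = [row + row for row in h]                     # [H_m  H_m]
    bottom = [row + [-x for x in row] for row in h]    # [H_m -H_m]
    return top + bottom

H8 = hadamard(8)
# Entries are only +1/-1, so the transform uses additions/subtractions only.
assert all(abs(x) == 1 for row in H8 for x in row)
# Rows are orthogonal: H8 * H8^T = 8 * I
for i in range(8):
    for j in range(8):
        dot = sum(H8[i][k] * H8[j][k] for k in range(8))
        assert dot == (8 if i == j else 0)
```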

• The eight-point recursive FFT can be mapped onto the Walsh-Hadamard transform. This mapping can be expressed by the following factorization (⊗ denotes the Kronecker product):

F8 = [P8][B8^2][D8][B8^1][P8][B8^2]^T · ∏(m=1 to 3) ( I_{2^(m-1)} ⊗ H2 ⊗ I_{2^(3-m)} )   (2)

where [P8] is the row permutation matrix.

Setting k = 1, 2 yields the butterfly matrices [B8^1] and [B8^2], respectively.


The iterative algorithm for deriving the H8 matrix is:

H8 = ∏(m=1 to 3) ( I_{2^(m-1)} ⊗ H2 ⊗ I_{2^(3-m)} )

The diagonal matrix is defined as:

[D8] = diag{ 1, 1, W_{3,3}, W_{4,4}, W_{5,5}, W_{6,6}, W_{7,7}, W_{8,8} }

where the W_{k,k} entries are the complex twiddle factors of the eight-point FFT.

• Hence, once the mapping structure has been deduced, the next goal is sensitive data encryption.

The encryption equation for the transformation is represented as:

E8 = [P8][B8^2][S8][D8][B8^1][P8][B8^2]^T · ∏(m=1 to 3) ( I_{2^(m-1)} ⊗ H2 ⊗ I_{2^(3-m)} )   (3)

Fig. 2: A flow diagram to represent FFT encryption based on WHT.

The flow diagram depicts the encryption approach inside the transformation process. [x[0], x[1], ..., x[7]]^T, shown on the left-hand side of the figure, is the input signal, and [Y[0], Y[1], ..., Y[7]]^T is the transformed output signal. The procedure is as follows: first the input image is divided into 8×8 blocks, and then the sensitive data is encrypted using eq. (3) in the following arrangement:


TBlock8 = [E8][Block8][E8]^T

After encryption, the frequency components of the data are inverse-transformed back to the spatial domain before transmission over the public channel.

As seen in Fig. 2, the shuffling is accomplished within the transform itself. Although the encryption demonstrated above is applied at the end of the transform, it could equally have been applied at the beginning, giving an efficient encryption method in which the location of the shuffling can be changed.

• Hence there is no constraint on when the shuffling process begins or ends.

3.2.2 Multimedia Big Data Sharing in Social Networks Using Fingerprinting and Encryption in the JPEG2000 Compressed Domain

• This section presents an implementation of the Tree-Structured Haar (TSH) transform in the homomorphically encrypted domain for fingerprinting, using social network analysis, with the aim of protecting social media data.

• The fingerprinting technique embeds users' fingerprints into digital content by making changes to the host signal in such a way that the perceived content remains the same.

• JPEG 2000 Code-Based TSH Transform:

The most commonly used signal processing tool is the DWT, of which the TSH is a generic wavelet transform.

The fingerprinting scheme in the TSH transform of the encrypted data enables the owner to embed a unique fingerprint sequence into the media content.

Fingerprint embedding is done in the encrypted domain using the additive homomorphic property at the selected embedding positions.

Once the discrete TSH transform of the input image has been computed, the JPEG 2000 transformation is divided into 3 sections:

(i) Firstly quantize the wavelet transform coefficients.

(ii) Divide the quantized coefficients into different bit planes and code them through several passes of embedded block coding with optimized truncation (EBCOT) to give a compressed byte stream.

(iii) Lastly arrange the compressed byte stream into separate wavelet packets.

Thus, it is possible to select bytes generated from different bit planes of different resolutions for joint fingerprinting and encryption of a JPEG2000 image directly.

• Fingerprints Embedding Procedure :

To protect content from illegal redistribution after it has been legitimately acquired by a client, unique user information (e.g., fingerprints), imperceptible to the users, should be embedded into every user's copy so that illegal users can be traced.

The watermark-embedding procedure is performed in the compressed and encrypted domain: for a compressed and encrypted JPEG2000 image, the hierarchical fingerprints are embedded into the compressed and encrypted hierarchical subbands of the TSH domain.

Suppose Nu is the number of users. An encrypted byte stream in the approximation subband is chosen to form the vector EX^I = (ex^I_1, ex^I_2, ..., ex^I_L).

Another encrypted byte stream in all horizontal and vertical subbands is chosen to form vectors such as EX^O = (ex^O_1, ex^O_2, ..., ex^O_L) for embedding the different community code segments.

Here L is the length of the codeword, and the user codeword encryption equation is:

EFX_k = EX^O_k + α · F_k,   k = 1, 2, ..., Nu

where α is the scale factor and F_k is the fingerprint information for user k.
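A toy sketch of additive fingerprint embedding and detection in this spirit (pure Python; the host values, ±1 fingerprints, signal length and α are invented for illustration, whereas the real scheme operates on encrypted JPEG2000 byte streams):

```python
import random

ALPHA = 4  # scale factor alpha (assumed value for illustration)

def embed(host, fingerprint, alpha=ALPHA):
    # EFX_k = EX_k + alpha * F_k, applied sample-wise
    return [x + alpha * f for x, f in zip(host, fingerprint)]

def correlate(signal, fingerprint):
    # Detection statistic: inner product with a candidate fingerprint.
    return sum(s * f for s, f in zip(signal, fingerprint))

rng = random.Random(7)
host = [rng.randint(0, 255) for _ in range(256)]
fps = [[rng.choice((-1, 1)) for _ in range(256)] for _ in range(3)]  # one per user

marked = embed(host, fps[1])                      # user 1's copy
residual = [m - h for m, h in zip(marked, host)]  # owner subtracts the host
scores = [correlate(residual, f) for f in fps]
assert scores.index(max(scores)) == 1             # user 1 is identified
```

The near-orthogonality of random ±1 fingerprints is what lets the correlation single out the traitor's copy.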

3.2.3 Scrambling

One of the simplest forms of encryption used for securing multimedia data is scrambling. It is an encryption process that performs random permutations on the multimedia data using some specific pattern. The image's histogram remains the same; only the individual pixel positions are shuffled. The security that scrambling alone provides is relatively weak, and it reduces the compression efficiency of the video bit stream, which leads to a loss in compression and an increase in the size of the video file.

Scrambling is generally used as a simple way of encrypting live analog/digital video signals such as surveillance camera feeds, where complex ciphers are avoided because they introduce computational delay. Some general scrambling techniques are as follows:

1. Line Inversion Video scrambling: In this technique, whole or some portions of the signal scan lines are simply inverted. This method is comparatively cheap and easy to implement. Its shortcoming is that it provides low security.

2. Sync Suppression Video scrambling: In this method, the vertical/horizontal line syncs are either made invisible or completely deleted. This method provides a low-cost solution for encryption and good-quality video decoding. A common disadvantage is that the level of obscurity achieved by this technique depends on the content of the video.

3. Line Shuffle Video scrambling: In this technique, every signal line on the screen is rearranged. This scrambling method provides better security but it needs a lot of storage for re-ordering the screen.

4. Cut and Rotate Video scrambling: In this technique, every scan line is cut into parts and then combined in a permuted manner. This scheme provides a compatible video signal, gives an excellent amount of obscurity and good decode quality and stability. However, it requires specialized scrambling equipment. Compression algorithms have been designed for the unscrambled signals and they use the statistical characteristics of raw data. Once the signal is scrambled, these characteristics will change and the performance of the compression filter will be degraded.
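The line-shuffle technique (item 3 above) can be sketched in a few lines of Python, with a key-seeded shuffling table standing in for the dedicated scrambling hardware (the 6-line frame is an invented toy example):

```python
import random

def shuffle_lines(frame, key):
    # The key seeds the shuffling table that re-orders the scan lines.
    order = list(range(len(frame)))
    random.Random(key).shuffle(order)
    return [frame[i] for i in order]

def unshuffle_lines(scrambled, key):
    order = list(range(len(scrambled)))
    random.Random(key).shuffle(order)       # regenerate the same table
    clear = [None] * len(scrambled)
    for position, source in enumerate(order):
        clear[source] = scrambled[position]
    return clear

frame = [[y * 10 + x for x in range(4)] for y in range(6)]  # 6 toy scan lines
assert unshuffle_lines(shuffle_lines(frame, key=99), key=99) == frame
```

The storage cost noted above comes from having to buffer all the lines of a screen before they can be re-ordered.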

In the Zig-Zag permutation method, instead of mapping 8 × 8 blocks to a 1 × 64 vector in zig-zag order (as in the DCT stage of a video coder), each 8 × 8 block is mapped onto a 1 × 64 vector using a random permutation list (the secret key). This method consists of three steps.

1. Generating a permutation list of cardinality 64.


2. Splitting the coefficients according to the permutation list.

3. Sending the outcome to entropy coding method.

But this method reduces the compression rate of the video because the random permutation distorts the probability distribution of the DCT (Discrete Cosine Transform) coefficients.
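The contrast between the fixed scan and the keyed permutation can be sketched as follows (a standard JPEG-style zig-zag scan is assumed; the key value is arbitrary):

```python
import random

def zigzag_order(n=8):
    """Standard zig-zag scan: walk the anti-diagonals of an n x n block,
    alternating direction, to produce the 1 x n^2 coefficient order."""
    coords = [(i, j) for i in range(n) for j in range(n)]
    return sorted(coords, key=lambda c: (c[0] + c[1],
                                         c[0] if (c[0] + c[1]) % 2 else c[1]))

def keyed_order(key, n=8):
    # Replacing the fixed scan with a key-seeded random permutation list
    # is the essence of the Zig-Zag permutation method described above.
    coords = [(i, j) for i in range(n) for j in range(n)]
    random.Random(key).shuffle(coords)
    return coords

plain = zigzag_order()
secret = keyed_order(key=2017)
assert len(plain) == len(secret) == 64
assert plain[:4] == [(0, 0), (0, 1), (1, 0), (2, 0)]
assert sorted(plain) == sorted(secret)   # same coefficients, permuted order
```

The fixed zig-zag scan groups large low-frequency coefficients first, which is what entropy coding exploits; the random order destroys that grouping, hence the compression loss.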

A scrambling scheme for digital images should have a comparatively easy implementation, and accommodate low-delay operation and low-cost decoding for real-time interactive applications. It should be independent of the compression algorithm and should not cause any loss in the compression operation. We present here a case study of the technique presented by Liu and Zeng for a better understanding of scrambling. The figure below contains an overview of that technique.

Fig. 3: Scrambling Scheme

Initially, the authors transform the input signal into the frequency domain using the Discrete Cosine Transform (DCT) or Discrete Wavelet Transform (DWT). The transform coefficients are then subjected to further operations that permute their values within the image. Motion vectors are subject to random sign modifications and shuffling. In addition, a cryptographic key is used to manage the scrambling. The motion vectors and scrambled coefficients are then sent to the compression block to obtain the compressed bit stream. Authorized users can easily recover the original content using the same key. This scrambling operation is executed before compression, so it preserves properties of multimedia-specific compression like scalability and transcoding. Frequency-domain scrambling also makes it easy to control transparency (i.e., which portion of the video is allowed in free access).

Encryption/decryption methods are designed to preserve the properties of the transformed image, so that the entropy coders can still compress the image properly.

Apart from secure and easy transcoding, framework of joint scrambling-compression provides many different advantages over those that execute scrambling on the compressed bit streams. Those advantages are:

1. Flexibility to encrypt selectively: In the frequency domain it is very easy to recognise which parts of the data are crucial for security. This makes it possible to provide various levels of security and transparency.

2. Encrypting incompressible segments: It is simple to detect which parts of the data are incompressible. For instance, coefficient sign bits are generally hard to compress, yet they are important for security purposes. This incompressible segment of the data can be selected for scrambling without impacting the overall compression efficiency. Data segments like motion vector information are generally compressed losslessly, so they can be selected for encryption without considering the transcoding issue, since it does not make sense for a transcoder to recode that portion of the compressed data. The selected data can easily be located in the frequency domain without causing any processing overhead. On the other hand, since the compressed bit stream is generally variable-length coded, it is usually difficult to perform fine-scale selective encryption on the compressed bit stream without incurring bit and processing overheads.

3. Low vulnerability to channel errors: Encrypting data after compression, such as applying AES over MPEG, is more vulnerable to channel errors because a 128-bit AES block is bound together, so a single bit error in a block causes the synchronization word/bits of that block to be erroneous. It is harder to recover from transmission errors in the network, since the synchronization information is hidden in the encrypted video stream. In contrast, spatial scrambling in the frequency domain has no such adverse effect on error resiliency.

4. Compatibility with transform-domain signal processing: Scrambling consists of changing the spatial positions of individual frequency coefficients. Watermarking and other transform-domain tasks can therefore be performed without requiring a cryptographic key.

Some of the techniques for scrambling are as follows:

1. Selective Bit Scrambling: The basic method scrambles selected bits of the transform coefficients to encrypt the image. Each bit of a coefficient can be one of three types. The significance bits are the most significant bit of value 1 and any preceding bits of value 0; they constrain the magnitude of the coefficient to a known range. The remaining magnitude bits are refinement bits, used to refine the coefficient within that range. The sign bit indicates whether the value is negative or positive.

2. Block Shuffling: In block shuffling, every sub-band is divided into multiple blocks of similar size. The block size can differ between sub-bands. Within every sub-band, coefficient blocks are shuffled according to a shuffling table generated using a key. The shuffling table will generally be distinct for different sub-bands, and it can differ for every frame.

3. Block Rotation: Every coefficient block is rotated to form encrypted blocks to improve security.

3.2.4 Post-compression Encryption Algorithm

The Secure Real-time Transport Protocol, the basic approach, encrypts the compressed bit stream by packetizing the multimedia data and then encrypting every single packet using AES. It is secure, but it has huge computational overheads, and because it encrypts compressed data it is not conducive to preserving the favourable properties of compressed bit streams in general.

Many different algorithms have therefore been proposed that are format-compliant or have low computational requirements. Meyer and Gadegast suggested a selective video encryption scheme called Secure MPEG, or SECMPEG, for the MPEG-1 video coding standard. See the figure below for details.

Fig. 4: Different levels of security offered by the SECMPEG algorithm (Meyer and Gadegast).

It offers various levels of security by encoding different parts of compressed bit stream:

Algorithm 1: This encrypts the headers from the sequence layer down to the slice layer.

Algorithm 2: In addition to the headers, this encrypts the low-frequency DCT coefficients of each block in the I-frames.

Algorithm 3: This encrypts the complete I-frames and the I-blocks in the B- and P-frames.

Algorithm 4: This encrypts the complete MPEG-1 sequence with the basic process.

The approach has some notable limitations. The computation savings are not significant, because I-frames constitute 30%–60% of an MPEG video. Moreover, Agi and Gong demonstrated that some scene content is still visible when the selectively encrypted video stream is played back directly on a conventional decoder. Maples and Spanos presented a similar approach called AEGIS: all I-frames in an MPEG video stream are encrypted, while P- and B-frames are left unencrypted. The AEGIS algorithm is almost the same as SECMPEG level 2.


Fig. 5: The Video Encryption Algorithm proposed by Qiao and Nahrstedt. MPEG packets are shuffled using key information for fast, efficient encryption.

Qiao and Nahrstedt introduced the Video Encryption Algorithm (VEA), which reduces the computational complexity almost by half. The algorithm is detailed in the figure above. Half of the bit stream is encrypted with a naive encryption algorithm such as AES, and the result is then used as a key to XOR with the other half of the bit stream. The basic VEA is vulnerable to plaintext attacks, because an attacker can recover the whole frame from knowledge of either the odd list or the even one. A 2n-bit random key (KeyM) is therefore used to split the 2n-byte chunk randomly into two lists instead of the fixed odd-even pattern of the basic VEA. Thus, VEA also results in increased key management issues.
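The basic odd-even VEA structure can be sketched as follows (a SHA-256-derived keystream stands in for the naive cipher such as AES, since the point here is the split-and-XOR structure, not the cipher itself):

```python
import hashlib

def toy_cipher(data: bytes, key: bytes) -> bytes:
    # Stand-in XOR keystream cipher; real VEA would use AES or similar.
    stream = hashlib.sha256(key + len(data).to_bytes(4, "big")).digest()
    while len(stream) < len(data):
        stream += hashlib.sha256(stream).digest()
    return bytes(d ^ s for d, s in zip(data, stream))

def vea_encrypt(chunk: bytes, key: bytes):
    odd, even = chunk[1::2], chunk[0::2]        # fixed odd/even split (basic VEA)
    enc_odd = toy_cipher(odd, key)              # encrypt one half ...
    mixed = bytes(e ^ c for e, c in zip(even, enc_odd))  # ... use it as XOR key
    return enc_odd, mixed

def vea_decrypt(enc_odd: bytes, mixed: bytes, key: bytes) -> bytes:
    even = bytes(m ^ c for m, c in zip(mixed, enc_odd))
    odd = toy_cipher(enc_odd, key)   # XOR keystream cipher is its own inverse
    out = bytearray()
    for e, o in zip(even, odd):      # re-interleave the two halves
        out += bytes([e, o])
    return bytes(out)

frame = bytes(range(32))
assert vea_decrypt(*vea_encrypt(frame, b"k"), b"k") == frame
```

Only half of the bytes pass through the heavy cipher, which is where the roughly 50% computational saving comes from; the keyed split (KeyM) would replace the fixed odd/even indexing above.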

3.2.5 Pre-compression Encryption Algorithm

While encrypting the video content before compression is possible, it has some serious limitations that are crucial for mobile devices:

1. Pre-compression encryption implies encrypting uncompressed or raw bits which will waste a lot of computational resources.

2. The output of encryption is generally a random bit stream with an absence of repetition, which makes the compression operation highly inefficient in the general case. For instance, consider encrypting an HD video at a modest resolution of 480p (852 × 480) with the Advanced Encryption Standard: it would need 2.3 million AES cycles per second to encode (and then to decode) that video on a portable device. Moreover, the compression gain will mostly be lost, as AES output bits are generally random, with almost no possibility of lossless compression.

A famous example is the work of Pazarci and Dipçin. Their scrambler is transparent to MPEG-2 compression. Before coding the video, it encrypts it in the RGB (red, green, blue) color space by applying four secret linear transformations. This method preserves the compression efficiency of the video codec, but it has been found to be vulnerable to brute-force attacks.

Fig. 6: Pre-compression encryption scheme proposed by Pazarci and Dipçin

3.2.6 Selective Encryption

• The Selective Encryption algorithm is a hybrid of Arnold’s cat map, image hiding and a modified IDEA technique.

• This method involves encrypting a subset of the data. Its main goal is to reduce the amount of data to be encrypted while maintaining the same level of security.

• Selective encryption is a better practice in constrained communication, e.g. real-time networking, and helps in achieving scalability.

Before we study the actual algorithm, a few preliminaries will help in understanding it better:

1. Arnold’s Cat Map (ACM): It is named after the Russian mathematician Vladimir Arnold, who demonstrated the method using an image of a cat.

The image is subjected to a transformation that randomizes the organization of its pixels. This is depicted in Fig 7.

Fig 7: Arnold’s map permutation process

Hence the pixels degenerate into an unintelligible order, producing the encrypted image.
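The standard cat-map transformation (x, y) → ((x + y) mod n, (x + 2y) mod n) can be sketched in pure Python; since the map has determinant 1 it is a bijection, so pixels are only moved, never lost, and the holder of the inverse map can undo the scrambling:

```python
def cat_map(image):
    """One iteration of Arnold's cat map on an n x n image:
    (x, y) -> ((x + y) mod n, (x + 2y) mod n). Pixel values are only moved."""
    n = len(image)
    out = [[None] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            out[(x + 2 * y) % n][(x + y) % n] = image[y][x]
    return out

def cat_map_inverse(image):
    # The inverse of [[1, 1], [1, 2]] mod n is [[2, -1], [-1, 1]].
    n = len(image)
    out = [[None] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            out[(-x + y) % n][(2 * x - y) % n] = image[y][x]
    return out

img = [[10 * y + x for x in range(5)] for y in range(5)]
scrambled = cat_map(cat_map(cat_map(img)))       # a few iterations scramble pixels
restored = scrambled
for _ in range(3):
    restored = cat_map_inverse(restored)
assert restored == img
# Permutation only: the multiset of pixel values is unchanged.
assert sorted(v for r in scrambled for v in r) == sorted(v for r in img for v in r)
```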

2. Modified IDEA Cryptography Technique :

The original IDEA technique is time-consuming, hence it is modified by changing the 25-bit shift to a 16-bit shift, as shown in Fig 8. The rest of the method remains the same.
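The modification amounts to changing the rotation amount in the key schedule. A sketch of the circular rotation of a 128-bit key register (the key value below is an arbitrary example):

```python
def rotate_key(key: int, shift: int, width: int = 128) -> int:
    """Circular left-rotation of a key register. IDEA's key schedule rotates
    the 128-bit key by 25 bits; the modified variant above uses 16."""
    mask = (1 << width) - 1
    return ((key << shift) | (key >> (width - shift))) & mask

key = 0x0123456789ABCDEF0123456789ABCDEF
r16 = rotate_key(key, 16)            # modified schedule step
r25 = rotate_key(key, 25)            # original IDEA schedule step
assert rotate_key(r16, 128 - 16) == key   # rotation is invertible
assert r16 != r25                         # the two schedules diverge
```

A 16-bit rotation aligns with 16-bit word boundaries, which is what makes the modified schedule cheaper on word-oriented hardware.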

Fig 8: Key generation for modified IDEA

• Selective encryption is carried out in the following 3 steps:

1. Apply hiding technique to complete image.

2. Apply the ACM to the entire picture.

3. Apply Modified IDEA to the resulting image body.


Fig 9: Block Diagram for Selective Encryption Technique

3.2.7 Joint Video Compression and Encryption (JVCE) Approach

The principal idea behind joint coding is to incorporate encryption into the compression operation by parameterizing the compression blocks, rather than altering the compressed bits. Entropy coding and the wavelet transform are the two main compression blocks to which this approach has been applied.

JVCE combines encryption and compression into one single operation, making it easy for embedded devices and mobiles to ensure multimedia security within their low power budgets. By integrating the compression and encryption operations as a whole, the JVCE approach reduces encryption latency, which is useful for delivering real-time video. This approach generally does not change the compressed bit stream, but changes the way the compressed bit stream is acquired. This integration permits exploiting the hierarchical representation of the signal in a transform domain, as used by most video and image compression techniques, to provide the advanced functionalities needed by numerous modern applications. The ISO/IEC JPEG 2000 Part 9 (JPSEC) standard shows how security and compression can coexist and take advantage of each other.

The JVCE scheme has emerged as a new paradigm of encryption that does not fully encrypt the video contents, which gives it advantages in terms of computation, mobility, and friendliness to post-compression operations. However, it has been shown to be breakable, particularly by known-plaintext attacks. To design an efficient encryption scheme for mobile applications, we suggest developing JVCE algorithms for different video coding blocks and then efficiently integrating them into a common framework. An efficient integration will invalidate most cryptanalysis, and the combined system will thus give a higher degree of security than currently existing ciphers. Including some efficient scrambling operations in this design is meant to complicate the input-output relationships at various levels.

3.3 Comparison of Algorithms

4. Multimedia Big Data Systems and Frameworks

The same structure as Section 3 will be followed: the systems and frameworks are first classified, then explained and reviewed, and finally compared.

4.1 A Classification of the Systems and Frameworks


4.1.1 The Framework for Authorization Mechanism for Multimedia Big Data

This framework is a hybrid solution that aims to control multimedia big data sharing policies. It offers a solution for privacy policy composition and enforcement for online multimedia data. The mechanism lets users (the multimedia big data owners), rather than administrators, compose their own policies for their multimedia big data while ensuring logical consistency.

Fig. 10: Functional view of Authorization Mechanism Framework.

4.1.2 HuaVideo: Secure HTML5 Video Providing System

HuaVideo is a video server system designed to provide substantial capacity for big HTML5 video data and to protect the content [17]. It uses a distributed database for video storage. The system ensures that the server content cannot be downloaded by typical means.

4.1.3 Big Data Privacy via Hybrid Cloud

With the growing number of medical systems, surveillance systems and social networks, it is becoming more and more difficult to store and manage their data cheaply. Cloud computing is a strong solution to this issue, since it provisions resources on demand and charges on a pay-per-use basis.

4.2 A Review of the Systems and Frameworks

4.2.1 The Framework for Authorization Mechanism for Multimedia Big Data

The framework for an authorization mechanism for multimedia big data offers a security policy solution for huge multimedia big data systems with large numbers of devices and users. The objective of the framework is to let users exchange data with users whom they authorize, and to let them manage access to their private multimedia data within a secure system. This is especially crucial in the healthcare domain, because the framework gives patients ownership of their health records, which is enforced by law.

Since the amount of data and the number of users are high in these systems, the structure is complex, and some design requirements should be applied to keep the system under control.


• The solution should be location-aware, context-aware and real-time. These properties ensure mobility.

• The security policy shouldn’t have high complexity. Every user will be able to compose their own individual policies, so it must be assumed that they may be naive.

• The data must be collected in an organized manner in compliance with security policies.

• Requests for data access should specify the details and intentions for how the data will be used.

The framework also uses an improved Role-Based Access Control (RBAC) model called Generalized Spatio-Temporal Role-Based Access Control (GST-RBAC) to handle the context requirements of multimedia big data security. RBAC has four parts: a set of users, permissions, sessions and roles. The improved model allows the specification of temporal and spatial constraints on role enabling, user-role assignment and role-permission assignment [5].
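A minimal sketch of a GST-RBAC-style check follows; the class and field names are our own illustrations, and the actual model in [5] is considerably richer:

```python
from dataclasses import dataclass

@dataclass
class GSTRole:
    """A role with hypothetical spatio-temporal enabling constraints."""
    name: str
    permissions: set
    allowed_hours: range      # hours of day during which the role is enabled
    allowed_locations: set    # location labels where the role may be used

@dataclass
class Session:
    user: str
    active_roles: list

def check_access(session, role, permission, hour, location):
    """Grant access only if the role is active in the session, holds the
    permission, and both temporal and spatial constraints are satisfied."""
    return (role in session.active_roles
            and permission in role.permissions
            and hour in role.allowed_hours
            and location in role.allowed_locations)
```

For example, a doctor role enabled only during working hours and inside the hospital would deny the same request made at night or from elsewhere.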

The mechanism is designed on the basis of the Secure User Data Repository System (SUDRS) [5], a hybrid approach to composing privacy rules for user data. The solution has four components together with several databases. The system architecture can be seen in Fig. 11, and the functional view of the components is shown in Fig. 10.

Fig. 11: System architecture for Authorization Mechanism Framework.

The system can be explained with a basic scenario. The data owner's (user's) primary doctor is allowed to see the user's medical records. The doctor wants to consult another doctor for an opinion, but the patient's privacy policy may not allow identity-related information to be shared with anyone other than the primary doctor. In this situation, the system builds a filtered view of the original record in which private information, such as date of birth and full name, is changed. The original and the filtered data are kept in different databases. The disclosure rules that control access are defined by the data owner, the consumers (entities that want to access data) and the contributors (originators of the generated data). The most critical parts that allow the system to maintain data security are:

• Composition and Verification of Access Rules

• Composition and Verification of Originator Disclosure Rules

• Composition and Verification of Disclosure Rules

• Enforcement of Disclosure Rules
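The filtered-view step of the scenario above can be sketched as follows; the field names are illustrative, not taken from [5]:

```python
def filtered_view(record, identity_fields=("full_name", "date_of_birth")):
    """Return a copy of the record with identity-related fields masked.
    The original record stays untouched in its own database; only the
    filtered view is disclosed to secondary consumers."""
    view = dict(record)
    for f in identity_fields:
        if f in view:
            view[f] = "***"
    return view
```

The consulting doctor would then receive the masked copy, while the primary doctor keeps access to the original.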


Fig. 12: Pseudo-code for access rules verification and composition.

Fig. 13: Pseudo-code for Originator Disclosure Rules verification and composition.

The first component, shown in Fig. 12, lets consumers define their access rules. After the consumer's personal information is entered, the consumer is authenticated. If this step is successful, the consumer defines the access rules. In the last step, the access rules are verified to check that they are consistent.

The process for composition and verification of originator disclosure rules is shown in Fig. 13. In this part, the disclosure rules composed by the originator are uploaded to the system. The data type can be image, video or text. A list of corresponding users, roles, locations and times is created by the contributor. The composition is verified as in the first step.

Fig. 14: Pseudo-code for Disclosure Rules verification and composition.

Fig. 15: Pseudo-code for Disclosure Rules enforcement.

The third component is for disclosure rule composition. The steps of this process can be seen in Fig. 14. In this component the owner selects the consumers with whom the data can be shared. The user also has the opportunity to select a time and location.

The last component enforces the disclosure rules by checking the rules held in the data repository system and comparing them with the context parameters.

4.2.2 HuaVideo: Secure HTML5 Video Providing System

HTML5 videos are becoming more popular every day, since they don’t require an additional plug-in like Flash videos do, and browsers have an integrated player for this type of video. However, there are some challenges regarding the security and storage of HTML5 video big data:

• Security is an issue for HTML5 videos, since the video URL can easily be found in the page source code. This makes it very easy to download the video.

• Videos also tend to occupy more space than other multimedia big data types, so video server capacity has to be large.

• Storage should be scalable; system administrators should be able to add more servers securely.

The HuaVideo system addresses all these issues [17]. It is a video server system that offers scalable big video data storage and content security.

As expected, HuaVideo doesn't use the basic HTTP/1.1 authentication scheme, which sends credentials in clear text that is very easy to capture. Instead, the Digest authentication mechanism is used, so that private user credentials cross the wire only in hashed form.
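For reference, the Digest response in its simplest form (RFC 2617, without the qop extension) is computed as below; the realm, nonce and credential values are hypothetical examples, not taken from [17]:

```python
import hashlib

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode()).hexdigest()

def digest_response(username, realm, password, method, uri, nonce):
    """RFC 2617 Digest response (simplest form, no qop): the clear-text
    password never leaves the client; only nested hashes are transmitted."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret-dependent half
    ha2 = md5_hex(f"{method}:{uri}")                 # request-dependent half
    return md5_hex(f"{ha1}:{nonce}:{ha2}")
```

The server, which knows the same credentials, recomputes the response and compares; a wrong password yields a different hash, so the clear-text secret is never exposed.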

Overview of the Architecture:

In order to accomplish the objective of building a secure and scalable video big data server, HuaVideo follows some design standards:

• In order to extend the server capacity effectively, distributed storage strategies are applied.

• Users have to be authenticated when they try to access the server.

• An encrypted URL is generated so that each link can be accessed only once.

• Multiple servers are deployed to make video content access faster.

• Passing a check on the HTTP request header alone shouldn’t be enough to download the video.

Fig. 16: Architecture overview of HuaVideo.

Fig. 17: Procedure of Access Control of HuaVideo.

A high-level overview of the architecture can be seen in Fig. 16. Numerous web servers are connected to database server clusters. Users access the content through the closest web server; the load balancer is responsible for this allocation.

User Access Procedure:

This procedure is shown in Fig. 17. First, the client signs into the server. If authentication succeeds, access is granted to the client. If the client sends the URL to someone else, the person holding the link will get an error. Downloading is limited to the video header; the user won’t be able to get the other parts of the video.

Storage Architecture:


A Distributed Database System (DDBS) runs on each server. The DDBS stores all data, including user info, comments and the actual video files.

The network has a peer-to-peer (P2P) architecture, which allows administrators to add or remove a server at any time.

Implementation:

• HuaVideo’s data service provider is MongoDB. Database systems like MySQL were mainly designed for smaller volumes of data and for a single-server structure [18]. MongoDB is a database system designed for big data that solves these issues: its query processing is fast and its service is scalable. MongoDB has a component called GridFS that divides each file into smaller pieces and puts those pieces on different servers. This is a good fit for multimedia big data, which usually has a large volume.

• The scripting language of the system is PHP (version 5.5).

• OGG and MPEG-4 are the video formats.

• Memcached is used to store and clear URLs.

• Browsers and servers exchange HTTP headers, which carry a lot of information. HuaVideo uses the headers to prevent users from downloading the video. The HTTP request header has a Range field for fetching fragmentary content; HuaVideo ensures that requests from download software containing a Range field are not served, making the content non-downloadable.
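A minimal sketch of that Range-field policy, as a simplification of the behaviour described in [17] (the function and status codes are our own illustration):

```python
def handle_video_request(headers: dict) -> int:
    """Return an HTTP status code for a video request: refuse fragmentary
    (Range) requests, which download managers rely on to pull the body in
    pieces, and serve only whole-content requests."""
    if any(h.lower() == "range" for h in headers):
        return 403  # fragment download refused
    return 200      # normal in-browser playback request served
```

Header names are matched case-insensitively, since HTTP header field names are not case-sensitive.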

4.2.3 Big Data Privacy via Hybrid Cloud [12]

Cloud computing is a low-cost and attractive solution for big data. However, the security of data stored in a public cloud is still a concern.

Firstly, the multimedia big data of different organizations contains sensitive data. Secondly, Cloud Service Providers (CSPs) have access to that sensitive data when it is stored in a public cloud.

The basic idea behind using a hybrid cloud is to separate sensitive data from non-sensitive data, storing them in a trusted private cloud and an untrusted public cloud respectively. This, however, would require a large amount of private cloud storage. Hence, we study this framework, which aims to utilize the hybrid cloud as much as possible with minimum cost and storage space.

Methodology of Big Data Privacy using Hybrid Cloud:

● System and Threat Model: Figure 18 shows the architecture of the hybrid cloud. Original data comes from the private cloud and is processed on a server within the private cloud. Data that is not sensitive is sent directly to the public cloud; the remaining sensitive data is processed first. Once processing is done, almost all of the data is passed to the public cloud and only a small amount of sensitive data is stored in the private cloud. Whenever data is needed, both the public and the private cloud are contacted.

Fig. 18: Architecture of Hybrid Cloud
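The initial routing decision in this architecture can be sketched as follows; the sensitivity predicate is application-specific, and this is only an illustrative simplification of the first step (the subsequent processing that moves most sensitive data to the public cloud is omitted):

```python
def route(records, is_sensitive):
    """Split records between the untrusted public cloud store and the trusted
    private cloud store, keeping only the sensitive portion private so that
    private storage stays small."""
    public, private = [], []
    for r in records:
        (private if is_sensitive(r) else public).append(r)
    return public, private
```

For example, with a predicate that flags patient-identifying records, only those records stay in the private cloud.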

● Goals of the design: The basic design goals are to achieve data privacy while reducing the following:

1. The quantity of data inside the private cloud

2. The communication overhead between the public and private clouds

3. The delay introduced by communication between the private and public clouds

Page 19: survey project-1

● Privacy of Image Data: This section describes how image data privacy is achieved. The following table lists the notations used.

TABLE 2: Notations for image data privacy

● Step 1: Divide the image into blocks. We partition a large image (of size N × N) into n blocks, where every block has the same size k × k. For instance, for an image of size 256 × 256 with the block size (k × k) set to 32 × 32, the image is partitioned into n = (256 ÷ 32) × (256 ÷ 32) = 64 blocks.

● Step 2: Mapping function. Use a one-to-one mapping function to produce a ciphered value from each original pixel value:

P′ = mod(m × p, n) + ⌊p ÷ (n ÷ g)⌋

Figure 19 shows original and encrypted image as an example.

Fig. 19: Original and Encrypted Image

● Step 3: Random shuffle of blocks. To make the modified image unrecognizable, shuffle the image blocks by randomly grouping the n blocks into multiple groups, each of size M.
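Steps 1-3 can be sketched as below. The parameter values for m, n and g, and the use of a simple key-seeded shuffle in place of the paper's grouping scheme, are our own illustrative choices:

```python
import random

def divide_blocks(img, k):
    """Step 1: partition an N x N image (list of rows) into (N/k)^2 blocks,
    each of size k x k, in row-major order."""
    n = len(img)
    return [[row[c:c + k] for row in img[r:r + k]]
            for r in range(0, n, k) for c in range(0, n, k)]

def map_pixel(p, m=7, n=256, g=8):
    """Step 2: the paper's mapping  P' = mod(m * p, n) + floor(p / (n / g));
    the parameter values here are illustrative guesses, not from [12]."""
    return (m * p) % n + p // (n // g)

def encrypt_image(img, k, seed):
    """Steps 1-3 combined: divide, map each pixel, then shuffle the blocks.
    The seed plays the role of the key held in the private cloud."""
    blocks = divide_blocks(img, k)
    blocks = [[[map_pixel(p) for p in row] for row in b] for b in blocks]
    order = list(range(len(blocks)))
    random.Random(seed).shuffle(order)
    return [blocks[i] for i in order], order
```

The shuffle order is the small piece of secret state the scheme keeps in the private cloud, while the shuffled, mapped blocks can go to the public cloud.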

● Step 4: Image recovery. When an image is requested, the request is sent to both the private cloud and the public cloud at the same time.

1. First, we generate the shuffle order permut using Algorithm 1.

2. Second, we re-arrange the shuffled image blocks from the public cloud, which gives us the transformed image.

3. Third, we get the random values from the private cloud and use them to recover the original image from the transformed image.


Fig. 20: Obtaining Shuffle Order Algorithm

4.3 A Comparison of the Systems and Frameworks

Table 3: Comparison of the Systems and Frameworks

5. Multimedia Big Data Security Challenges

Multimedia big data has a huge volume and is usually stored in multiple different places. While providing constant access to this data and managing its massive volume, keeping the system secure can be a daunting task. There are many multimedia big data security challenges, as can be seen in Fig. 21, concerning maintaining secure access for users and protecting the data from corruption and online attacks [13].


Fig. 21: Multimedia Big Data Security Challenges

Multimedia big data also has huge variety. This amplifies the challenges already present in traditional security [14] and makes access management harder. Different data sources have different access restrictions. As a result, balancing the security of the data sources against the need to extract meaning from the multimedia data becomes a daunting task.

Multimedia big data systems are typically distributed. These distributed environments are much more complex and prone to attack than single database servers. Since the systems are distributed geographically, there is a need for physical security controls [14]. There is also a huge number of servers, which raises the chance of inconsistent server configurations; inconsistency makes the environment vulnerable.

Organizations and companies need to guarantee that the utility of the information and its privacy are balanced [15]. Prior to saving data in a database, organizations make sure that the data is sufficiently anonymized; for example, unique client identifiers should be removed. But a new challenge arises here: removing special identifiers might not be sufficient to ensure that the information stays anonymous. If a cross-referencing technique is applied to the anonymized information together with other accessible information, it may be possible to de-anonymize it.

Establishing ownership of data is a notable challenge when using multimedia big data. Storing multimedia big data in the cloud is common practice, and it requires establishing a trust boundary between the data owners and the data storage [15].

There are many big data tools whose design may lack security features such as authentication and encryption of the information sent between nodes. It is also not easy to define a security policy when using these tools [14]. Encryption is a huge challenge when storing multimedia big data.


Multimedia big data is a relatively new concept in industry, so best practices are not yet fully established [15]. New security problems arise every day, so more research is needed to make these systems more secure.

6. Conclusion

In this paper, we presented seven different encryption/decryption algorithms and three frameworks/schemes for multimedia big data security. They were thoroughly classified, reviewed and compared. Since the best practices for multimedia big data security are not yet fully established and there are not many frameworks, tools or algorithms, more research on this issue is needed.

7. References

[1] J. Gao, S. Li, T. Zhang and Y. Park. “A Sticky Policy Framework for Big Data Security”, The First IEEE International Conference on Big Data Computing Service, and Applications, 2015.

[2] C. Ye, Z. Xiong, Y. Ding, J. Li, G. Wang, X. Zhang and K. Zhang. “Secure Multimedia Big Data Sharing in Social Networks Using Fingerprinting and Encryption in the JPEG2000 Compressed Domain”, IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications, 2014.

[3] W. Zhu, P. Cui, and Z. Wang. “Multimedia Big Data Computing“, IEEE Journals & Magazines, vol. 22, no. 3, pp. 96 - c3, 2015.

[4] S. Agaian and O. Caglayan. “Fast Encryption Method Based on New FFT Representation for the Multimedia Data System Security”, IEEE International Conference on Systems, Man, and Cybernetics, pp. 1519 - 1524, 2006.

[5] A. Samuel, M.I. Sarfraz, H. Haseeb, S. Basalamah and A. Ghafoor. “A Framework for Composition and Enforcement of Privacy-Aware and Context-Driven Authorization Mechanism for Multimedia Big Data”, IEEE Transactions On Multimedia, vol. 17, no. 9, pp. 1484-1494, 2015.

[6] C. Tankard. (2012, July) Big data security [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1353485812700636

[7] M. Chen. “A Hierarchical Security Model for Multimedia Big Data”, International Journal of Multimedia Data Engineering and Management, vol. 5, issue 1, 2014.

[8] “Big Data, Big Security Challenges” [Online]. Available: http://www.vormetric.com/data-security-solutions/use-cases/big-data-security

[9] M.A. Hossain. “Framework for a Cloud-Based Multimedia Surveillance System”, International Journal of Distributed Sensor Networks, vol. 2014, 2014.

[10] A. Pande and J. Zambreno. “Algorithms for Secure Multimedia Delivery over Mobile Devices and Mobile Agents” [Online]. Available: http://web.cs.ucdavis.edu/~amit/BookChapter/Algorithms%20for%20secure%20multimedia%20delivery%20over%20mobile%20devices%20and%20mobile%20agents.pdf

[11] M. Chen, S. Mao and Y. Liu. “ Big Data: A Survey”, Mobile Networks and Applications, vol. 19, issue 2, pp. 171-209, 2014.

[12] X. Huang and X. Du. “Achieving Big Data Privacy via Hybrid Cloud”, IEEE INFOCOM Workshops: 2014 IEEE INFOCOM Workshop on Security and Privacy in Big Data, pp. 512 - 517, 2014.


H. Zhao. “A Novel Video Authentication Scheme with Secure CS-Watermark in Cloud”, IEEE International Conference on Multimedia Big Data, pp. 358-361, 2015.

[13] “Big Data Security Challenges” [Online]. Available: http://www.forbes.com/sites/emc/2014/02/03/big-data-security-challenges/

[14] “How to Manage Big Data’s Big Security Challenges” [Online]. Available: http://data-informed.com/manage-big-datas-big-security-challenges/

[15] “Big Data Security - Challenges & Solutions” [Online]. Available: https://www.mwrinfosecurity.com/articles/big-data-security---challenges-solutions/

[16] “Cloud Computing Experts Detail Big Data Security and Privacy Risks” [Online]. Available: http://data-informed.com/cloud-computing-experts-detail-big-data-security-and-privacy-risks/

[17] Y. Wu and Y. Zhang. “HuaVideo: Towards a Secure, Scalable and Compatible HTML5 Video Providing System”, 2014 11th Web Information System and Application Conference, pp. 81-85, 2014.

[18] “What Is Big Data?” [Online]. Available: https://www.mongodb.com/big-data-explained

[19] R. Ridzon and D. Levicky. “Multimedia security and multimedia content protection”, 51st International Symposium ELMAR, 2009

[20] S. Byun and B. Ahn. “More on the multimedia data security for e-commerce”, Information Technology: Research and Education, pp. 412-415, 2003

[21] K. Nahrstedt, J. Dittman and P. Wohlmacher. “Approaches to Multimedia and Security”, IEEE International Conference on Multimedia and Expo, pp. 1275-1278, 2000
