[ieee 2012 7th international conference on electrical & computer engineering (icece) - dhaka,...

DNA Cryptography

Sabari Pramanik1,*, Sanjit Kumar Setua2

1Department of Computer Science Vidyasagar University, Midnapore, 721102, West Bengal, India

2Department of Computer Science University of Calcutta, 92, A.P.C.Road, Kolkata-9, West Bengal, India

*[email protected]

Abstract - DNA cryptography is a promising new field of cryptography that emerged with the progress of DNA computing. The massive parallelism and high information density inherent in DNA molecules are exploited for cryptographic purposes. Currently, the main difficulties of DNA cryptography are the need for a high-tech biomolecular laboratory and the computational complexity. In this paper, a new parallel cryptography technique is proposed that uses DNA molecular structure, a one-time-pad scheme and the DNA hybridization technique to reduce the time complexity.

Index Terms – DNA Cryptography, Single-stranded DNA, Double-stranded DNA, Hybridization.

I. INTRODUCTION

As some modern cryptographic algorithms (such as DES and, more recently, MD5) have been broken, new directions in information security are needed to protect data. Applying DNA computing to computer security is one such direction, and it may bring new hope for powerful or even unbreakable cryptographic algorithms.

In his pioneering work, Adleman built the foundation for the new field of biomolecular research [1]. The main idea was to use actual chemistry to solve problems that either require an enormous amount of computation or are unsolvable by conventional computers. This approach was extended by Lipton to solve another NP-complete problem, the satisfiability problem [2]. These results opened a new way of solving many problems in different fields, including cryptography.

The main advantages of the DNA molecular structure are its vast parallelism, exceptional energy efficiency and extraordinary storage capacity [3]. One gram of DNA can store about 10^8 terabytes. Despite this promise for cryptography, the approach faces drawbacks such as long computing time, high computational complexity and the need for a high-tech biomolecular laboratory. In general, existing cryptography techniques use modern biological technologies as the implementation tool and DNA as the information carrier. These biological technologies include PCR amplification, synthesis of message DNA strands, hybridization, etc. These methods are costly, complex and time-consuming. Unlike common DNA-based cryptography techniques, in this paper we investigate a new parallel DNA cryptography technique using DNA molecular structure and the hybridization technique, which reduces the time requirement. The encryption method generates the ciphertext from the plaintext by scanning a one-time pad in reverse order and using the positions of the binary 1s in a modified version of the plaintext. The decryption process makes use of individual packets, received in arbitrary order, and the hybridization technique.

Research on DNA cryptography is still at an early stage, and many problems remain to be solved. It started with Viviana Risca’s computer science project on DNA steganography, winner of the “Junior Nobel Prize”, 1999-2000 edition, which proposed “Hiding messages in DNA microdots” [4]. Although many researchers are working in this field, it is far from mature in both theory and realization [5], which may be why only a few DNA cryptography algorithms have been proposed so far. Moreover, DNA technology currently relies on modern biological technologies that are heavily laboratory dependent. There is no general theory for applying DNA molecules to cryptography [6]. Biological technologies such as Polymerase Chain Reaction (PCR), DNA hybridization, DNA synthesis and DNA digital coding are common in the recent literature on DNA cryptography algorithms.

This paper is organized as follows: Section II presents the design rationale, Section III gives a formal description of the proposed method along with an example, and Section IV concludes the paper.

II. DESIGN RATIONALE

A. DNA

Deoxyribonucleic Acid (DNA) is the hereditary material of almost all living organisms ranging from very small viruses to complex human beings. It is an information carrier of all life forms. DNA is a long polymer of small units called nucleotides. Each nucleotide consists of three components:

i. A Nitrogenous Base

ii. A five carbon Sugar

iii. A Phosphate Group

2012 7th International Conference on Electrical and Computer Engineering, 20-22 December, 2012, Dhaka, Bangladesh


978-1-4673-1436-7/12/$31.00 ©2012 IEEE

There are four different nucleotides, distinguished by the nitrogenous base they contain. The four bases are A, C, T and G, called Adenine, Cytosine, Thymine and Guanine respectively.

DNA has a double helical structure with two strands running antiparallel, as shown in Figure 1. DNA stores all the complex information about an organism as combinations of only these four letters A, C, T and G. The bases hold the two strands together by forming hydrogen bonds with each other: A bonds with T, whereas C bonds with G.

Figure 1: Double Helical Structure of DNA
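The pairing rule described above (A with T, C with G) can be written as a simple lookup. The following is a minimal Python sketch; the names `PAIR` and `complement` are illustrative and do not come from the paper. It computes the base-wise complement without reversing the strand, which is the convention the worked example in Section III appears to follow.

```python
# Watson-Crick pairing: A<->T, C<->G (illustrative helper, not from the paper).
PAIR = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Return the base-wise complement of a strand written with A, C, G, T."""
    if set(strand) - set("ACGT"):
        raise ValueError("strand may contain only A, C, G, T")
    return strand.translate(PAIR)

print(complement("CCGCGCTTCC"))  # GGCGCGAAGG
```

Taking the complement twice returns the original strand, which is what lets a receiver undo the operation.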

In this paper we denote an individual strand as single-stranded DNA (ssDNA) and a double helix as double-stranded DNA (dsDNA). Under certain conditions, an ssDNA can form dsDNA with other ssDNA strands that are complementary to it. This process is called hybridization because the double-stranded molecules are hybrids of strands that come from different sources. The initial phase of hybridization is slow because it depends on the random chance that regions of two complementary strands come together to form a short sequence of correct base pairs (Figure 2). The initial pairing is followed by a rapid matching of the remaining complementary bases.

Figure 2: Hybridization. a) Base-pairing cannot go further because the sequences have different bases; pairing is unstable and the strands come apart. b) Base-pairing continues because the sequences are complementary.
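In software, hybridization is commonly modelled as exact string matching: a short probe binds wherever its complement occurs in a longer strand. The sketch below is such a simplified model. It assumes perfect base pairing only and ignores strand orientation, reaction conditions and partial matches; `binding_sites` is a hypothetical helper name.

```python
PAIR = str.maketrans("ACGT", "TGCA")

def binding_sites(template: str, probe: str) -> list[int]:
    """Return every 0-based offset at which `probe` pairs perfectly with `template`."""
    target = probe.translate(PAIR)  # the bases the probe can bond with
    sites = []
    start = template.find(target)
    while start != -1:
        sites.append(start)
        start = template.find(target, start + 1)  # allow overlapping sites
    return sites

template = "AGGTCAAATAGGGATTCATC"
print(binding_sites(template, "TCCAGTTTAT"))  # [0]
print(binding_sites(template, "GGGG"))        # []
```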

B. DNA Cryptography

DNA cryptography is a newly emerged cryptographic technique in which DNA is used as the information carrier and modern biological technology is used as the implementation tool. The vast parallelism and extraordinary information density inherent in DNA molecules are explored for all sorts of cryptographic techniques.

Nowadays, the fields of biology and cryptography are converging. DNA can be used in cryptographic systems based on one-time pads, and if such a system is used correctly, it is virtually impossible to crack. The size of the one-time pad depends on the cryptographic system. There are various procedures for DNA one-time-pad encryption schemes [6] [7].

From the cryptographic point of view, DNA is very powerful. The binding capabilities of the nucleotide bases (A-T, G-C) make it possible to create self-assembling structures that are an excellent means of executing computations. Another advantage is that DNA has a huge storage capacity. On the other hand, practical implementation of DNA cryptography requires a lot of time and resources, and it has several computational limitations. Thus, the efficient use of DNA cryptography is still difficult from a practical point of view [8].

III. PROPOSED METHODS

We now describe the proposed method, whose security rests mainly on a difficult biological problem, DNA hybridization, and on the one-time-pad scheme. The method also decrypts the message in parallel, which reduces the time taken for decryption. We show how a message can be exchanged safely between a sender, whom we call Alice, and the intended receiver, whom we call Bob. We extend the definitions of encryption and decryption as follows:

Suppose a sender Alice owns an encryption key (one-time pad) KA and sends it to the intended receiver Bob. Alice uses KA to translate a plaintext M into a ciphertext C by a translation E. Bob uses KA to translate the ciphertext C back into the plaintext M by a translation D.

The Encryption process is: C = E_{KA}(M) (1)

The Decryption process is: D_{KA}(C) = D_{KA}(E_{KA}(M)) = M (2)

It is difficult to obtain M from C without KA. We call the translation E the encryption process, the translation D the decryption process, and C the ciphertext. Here KA and C are not limited to digital data; they can be any method, material, data, DNA sequence, etc. Likewise, E and D are not limited to mathematical calculations; they can be any physical, chemical, biological or mathematical process.

We now describe the general process of the proposed cryptography technique.

A. Key generation

The one-time pad is the simplest and most secure type of cipher. A one-time-pad cipher uses a key, or pad, that is chosen at random for each encipherment, so an adversary has no way to guess it. Alice sends the key to the intended receiver Bob through a secure channel before encryption. Alice then encrypts the message with the key and destroys her copy. Bob uses the key for decryption and then destroys his copy. A new key (pad) is used for the next encryption and decryption.

In this paper the one-time-pad key is a randomly generated single-stranded DNA (ssDNA) string. The message sender Alice generates a completely random pseudo-ssDNA string and transmits it to Bob over a secure channel. A real DNA chromosome from an organism is not used, because an adversary could recognize a real chromosome from the encrypted message.

The length of the key (ssDNA string) depends on the length of the plaintext message to be transmitted. If n-mer oligonucleotides are used to encrypt one plaintext bit, the ssDNA key must be n times as long as the binary plaintext message M’.
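Key generation as described above can be sketched in a few lines of Python. The paper does not name a source of randomness; the use of the standard `secrets` module here, and the function name `generate_key`, are assumptions made for illustration.

```python
import secrets

def generate_key(num_bits: int, n: int = 10) -> str:
    """Return a random ssDNA string holding n bases for each plaintext bit."""
    if num_bits < 0 or n <= 0:
        raise ValueError("num_bits must be >= 0 and n must be > 0")
    return "".join(secrets.choice("ACGT") for _ in range(num_bits * n))

key = generate_key(21)  # 21 bits: a 3-character message in 7-bit ASCII
print(len(key))         # 210
```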

B. Encryption

In our proposed encryption method the randomly generated ssDNA is used as the key KA. The ssDNA key is scanned in reverse order, from the sequence end towards the beginning of the key.

First, the message sender Alice translates the plaintext M into the corresponding ASCII code. This ASCII code is then translated into the binary plaintext M’. Finally, Alice translates the binary plaintext M’ into several packets of DNA ciphertext C’_i (for i = 1 to m) by using the key KA and the following algorithm.

Alice scans the plaintext M’ from left to right and the key KA from right to left. For a ‘1’ in the plaintext M’, a 10-mer oligonucleotide substring starting from the sequence end of the key KA is selected, and its Watson-Crick complement is taken. (The number of nucleotides used to encrypt each bit of M’ can vary, but it must be the same for encryption and decryption; here Alice fixes it at 10.) This complement is the DNA ciphertext C’_i. Alice then generates a sequential decimal number i and attaches it to the end of the complemented substring as the packet number. Alice sends this packet C’_i to the intended receiver Bob over an open, insecure communication channel.

For a ‘0’ in the plaintext M’, Alice generates and sends no packet; the corresponding 10-mer of the key is simply skipped.

Thus, the encrypted message C’ is a set of DNA ciphertext packets C’_i (i = 1 to m, where m is the number of 1s in the binary plaintext M’). Each packet corresponds to one ‘1’ in M’ and contains a 10-mer oligonucleotide complementary to part of the key KA, together with its packet number.
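The encryption steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `encrypt` is hypothetical, each plaintext bit is assumed to consume one n-mer block of the key counted from the sequence end, and packets are numbered from 0 as in the worked example later in this section.

```python
PAIR = str.maketrans("ACGT", "TGCA")

def encrypt(bits: str, key: str, n: int = 10) -> list[str]:
    """Emit one packet per '1' bit: a complemented n-mer followed by its packet number."""
    if len(key) != n * len(bits):
        raise ValueError("key must hold n bases per plaintext bit")
    packets = []
    for j, bit in enumerate(bits):           # plaintext is scanned left to right
        if bit != "1":
            continue                         # a '0' produces no packet
        end = len(key) - j * n               # key is scanned right to left
        block = key[end - n:end]
        packets.append(block.translate(PAIR) + str(len(packets)))
    return packets

# Toy run with 3-mers so that the blocks are easy to follow.
print(encrypt("101", "AAACCCGGG", n=3))  # ['CCC0', 'TTT1']
```

In the toy run, the first bit selects the last block GGG and the third bit selects the first block AAA; the middle block is skipped because its bit is 0.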

C. Decryption

After receiving the DNA ciphertext packets from Alice, the intended receiver Bob takes the Watson-Crick complement of each packet’s content to obtain the 10-mer oligonucleotide substring. The packet number attached to the packet determines where, counting from the sequence end, the search for a match between the ssDNA substring and the ssDNA key should start. This lets the decryption algorithm reduce the search time. Bob carries out the whole decryption in a parallel computing environment: as soon as a ciphertext packet arrives from Alice, he assigns it to a processor, which finds the exact position of the substring in the ssDNA key. Each matching position denotes a ‘1’. After all packets have been evaluated, the unmatched positions of the ssDNA key are set to ‘0’. In this way Bob obtains the binary plaintext M’, converts it into ASCII code, and recovers the actual plaintext M.
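The decryption procedure can be sketched in the same style. The sketch below makes several assumptions that the paper does not state: matching is done on n-mer blocks aligned to the sequence end, every block occurs only once in the key, packet numbers start at 0, and a thread pool stands in for the parallel computing environment (it shows the one-task-per-packet structure, not a performance claim). The names `locate` and `decrypt` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

PAIR = str.maketrans("ACGT", "TGCA")

def locate(packet: str, key: str, n: int = 10) -> int:
    """Return the plaintext bit index that one packet marks as '1'."""
    probe = packet[:n].translate(PAIR)       # undo the complement
    number = int(packet[n:])                 # packet number follows the n-mer
    for j in range(number, len(key) // n):   # the i-th '1' cannot precede bit i
        end = len(key) - j * n
        if key[end - n:end] == probe:
            return j
    raise ValueError(f"packet {packet!r} does not match the key")

def decrypt(packets: list[str], key: str, n: int = 10) -> str:
    """Recover the binary plaintext; packets may arrive in any order."""
    bits = ["0"] * (len(key) // n)
    with ThreadPoolExecutor() as pool:       # one task per packet
        for j in pool.map(lambda p: locate(p, key, n), packets):
            bits[j] = "1"
    return "".join(bits)

print(decrypt(["TTT1", "CCC0"], "AAACCCGGG", n=3))  # 101
```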

In the following, we walk through this encryption and decryption scheme in detail with an example.

Step 1: Key Generation.

We choose the plaintext message “DNA”, so the length of M is 3 characters and the length of M’ is 21 bits. Given the plaintext message, the sender Alice generates a completely random 210-mer ssDNA string, that is,

AGATAGTCATACGTACGACTAACACAGGCATTCACCATGGAACAGCGGTTTCCGTAACATCCTGAGTCCAATAGCGATCCGCTATTCCGTATTTATAAGGGGTCTGATCTTGCATCCGGCATATAGAGCCGATAGGCAGTATGGGGAGTAAGCCGGACCAAGGTCAAATAGGGATTCATCGCCTGATTCCAAAGAATTTGCCGCGCTTCC

which acts as the one-time-pad key. Alice then sends the key to the intended receiver Bob over a secure communication channel.

Step 2: Encryption.

We first convert the plaintext “DNA” into its equivalent hexadecimal ASCII code, that is, “44 4E 41”. We then translate the ASCII code into the binary plaintext M’, that is, “1000100 1001110 1000001”. To encrypt M’, Alice scans the message M’ from left to right and the one-time-pad key KA from right to left. As the first bit in M’ is 1, Alice selects the 10-mer oligonucleotide at the sequence end of the one-time-pad key, CCGCGCTTCC, and takes its Watson-Crick complement, i.e. GGCGCGAAGG. Alice then generates a sequential integer, attaches it to this DNA ciphertext sequence, and sends the resulting packet “GGCGCGAAGG0” to Bob over an open communication channel. In this way Alice generates one packet for each 1 in the plaintext M’, so 8 packets are generated and sent to Bob. For a 0 in the binary plaintext M’, no packet is generated or sent. Bob will get the following packets:

GGCGCGAAGG0, TCCAGTTTAT1, CTATCCGTCA2, CCAGACTAGA3, TAAATATTCC4, CGATAAGGCA5, GGACTCAGGT6, TCTATCAGTA7
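The eight packets listed above can be reproduced from the 210-mer key of Step 1. The short Python check below assumes 7-bit ASCII (which matches the 21-bit M’ of the example) and one 10-mer block of the key per plaintext bit; the names `KEY`, `to_bits` and `packets` are illustrative.

```python
PAIR = str.maketrans("ACGT", "TGCA")

# The 210-mer one-time-pad key from Step 1, split into 50-base pieces for readability.
KEY = (
    "AGATAGTCATACGTACGACTAACACAGGCATTCACCATGGAACAGCGGTT"
    "TCCGTAACATCCTGAGTCCAATAGCGATCCGCTATTCCGTATTTATAAGG"
    "GGTCTGATCTTGCATCCGGCATATAGAGCCGATAGGCAGTATGGGGAGTA"
    "AGCCGGACCAAGGTCAAATAGGGATTCATCGCCTGATTCCAAAGAATTTG"
    "CCGCGCTTCC"
)

def to_bits(text: str) -> str:
    """Encode each character as 7-bit ASCII, e.g. 'D' -> 1000100."""
    if any(ord(ch) > 127 for ch in text):
        raise ValueError("only ASCII characters are supported")
    return "".join(format(ord(ch), "07b") for ch in text)

bits = to_bits("DNA")
ones = [j for j, b in enumerate(bits) if b == "1"]   # positions of the 1s in M'
packets = [
    KEY[len(KEY) - (j + 1) * 10 : len(KEY) - j * 10].translate(PAIR) + str(i)
    for i, j in enumerate(ones)                      # bit j uses the j-th block from the end
]

print(bits)     # 100010010011101000001
print(packets)  # the eight packets listed above, GGCGCGAAGG0 first
```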

Step 3: Decryption.

After the intended receiver Bob gets the DNA packets, he can pick out the secret-message DNA sequences by using the DNA hybridization technique. As soon as Bob gets a packet from Alice, he reads the packet number from the end of the packet, which gives the tentative starting position of the packet content. He then takes the complement of the packet content to obtain a 10-mer oligonucleotide sequence. For example, if the first packet to arrive is “TCCAGTTTAT1”, Bob reads the packet number as 1. He then looks for the position at which this oligonucleotide sequence hybridizes with the one-time-pad key, searching from the sequence end of the one-time pad. Since this is the 2nd packet, Bob starts from the 11th position (packet number × 10 bases from the sequence end). Where he finds a match for the oligonucleotide substring in the one-time pad, he places a ‘1’ in the corresponding position. In this way Bob finds every ‘1’. All unmatched positions are set to ‘0’, which gives the secret binary plaintext M’. After M’ has been recovered, Bob can retrieve the plaintext M, “DNA”, from it.
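The last steps of the example, from matched positions back to text, can be sketched as follows. The list `matches` holds the bit positions at which the eight example packets match the key; these values are derived from the example's key and packets rather than stated in the paper, and are listed in an arbitrary arrival order. The helper names are hypothetical, and 7-bit ASCII is assumed.

```python
N = 10  # bases per plaintext bit, as fixed in the example

def search_start(packet: str) -> int:
    """Offset, in bases from the sequence end, at which the search begins."""
    return int(packet[N:]) * N

def bits_from_matches(matches: list[int], num_bits: int) -> str:
    """Place a '1' at every matched position and a '0' everywhere else."""
    bits = ["0"] * num_bits
    for j in matches:
        bits[j] = "1"
    return "".join(bits)

def from_bits(bits: str) -> str:
    """Decode consecutive 7-bit groups as ASCII characters."""
    return "".join(chr(int(bits[k:k + 7], 2)) for k in range(0, len(bits), 7))

print(search_start("TCCAGTTTAT1"))      # 10, i.e. the 11th base from the end

matches = [4, 0, 20, 7, 12, 10, 14, 11]
bits = bits_from_matches(matches, 21)
print(bits)                             # 100010010011101000001
print(from_bits(bits))                  # DNA
```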

D. Method Analysis

Despite the many differences between DNA cryptography and traditional cryptography, both must satisfy the same cryptographic requirements. The security requirements should also be founded on the assumption proposed by Kerckhoffs that security should depend only on the secrecy of the key. That is, it must be assumed that an adversary knows the encryption algorithm; the only thing the adversary does not know is the key. In this scheme, the encryption and decryption key KA is the ssDNA one-time pad.

The security of this encryption scheme depends entirely on the one-time-pad key. As each pad is randomly generated and used only once, an adversary cannot guess it. For example, to encrypt the word “DNA”, a random 210-mer ssDNA one-time-pad key is generated. To find the exact one-time pad, the adversary must search among 4^210 different ssDNA strings, which is computationally infeasible.
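The size of this key space is easy to check numerically. The snippet below only counts candidate pads for the 210-mer example; the variable names are illustrative.

```python
import math

KEY_LENGTH = 210              # bases in the example pad
keyspace = 4 ** KEY_LENGTH    # four choices (A, C, G, T) per base

print(keyspace == 2 ** 420)            # True: each base carries 2 bits
print(len(str(keyspace)))              # 127 decimal digits
print(round(math.log10(keyspace), 1))  # 126.4
```

In other words, 4^210 is roughly 10^126 candidate pads, equivalent to a 420-bit key.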

In the decryption process, this scheme tries to minimize the time required. The hybridization, or searching, process takes less time because it does not search the whole ssDNA key; it starts from the position indicated by the packet number.
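The effect of the packet number on search effort can be illustrated with a simple count. The sketch below assumes that the search compares one 10-mer block at a time, moving from the sequence end towards the beginning; this cost model and the name `comparisons` are assumptions, not part of the paper. The positions of the 1s are those of the example M’.

```python
def comparisons(bit_indices: list[int], use_hint: bool) -> int:
    """Count the block comparisons needed to locate every packet."""
    total = 0
    for number, j in enumerate(sorted(bit_indices)):
        start = number if use_hint else 0   # hint: the i-th '1' lies at bit i or later
        total += j - start + 1              # blocks start..j are inspected
    return total

ones = [0, 4, 7, 10, 11, 12, 14, 20]        # '1' positions in the example M'
print(comparisons(ones, use_hint=False))    # 86
print(comparisons(ones, use_hint=True))     # 58
```

Under this model the saving grows with the packet number, since later packets skip more of the key.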

IV. CONCLUSIONS

In this paper, we presented an original DNA cryptography technique that uses DNA digital coding and DNA hybridization, with a one-time pad as the encryption key. The analysis of the method suggests that the proposed DNA cryptography method is a promising solution for implementation in secure networks. The method can also be implemented in a multicore environment. However, the increasing computational complexity is still an issue, which can be addressed in future work.

References

[1] L. M. Adleman, “Molecular computation of solutions to combinatorial problems”, Science, JSTOR, vol. 266, pp. 1021-1025, 1994.

[2] R. J. Lipton, “Using DNA to solve NP-complete problems”, Science, vol.268, pp. 542-545, 1995.

[3] G. Z. Cui, “New direction of Data Storage: DNA molecular Storage Technology”, Computer Engineering and Applications, vol. 42, pp. 29-32, 2006.

[4] C. T. Taylor, V. Risca, C. Bancroft, “Hiding messages in DNA microdots”, Nature, vol. 399, pp. 533-539, 1999.

[5] C. Popovici, “Aspects of DNA Cryptography”, Annals of the University of Craiova, Mathematics and Computer Science Series, vol. 37(3), pp. 147-151, 2010.

[6] L. U. Mingxin, L. Xuejia, X. Guozhen, Q. Lei, “Symmetric-key cryptosystem with DNA technology”, Science in China Series F: Information Science, Springer Verlag, Germany, vol. 50, no. 3, pp. 325-333, 2007

[7] B. Anam, K. Sakib, Md. A. Hossain, K. Dalal, “Review on the advancements on DNA cryptography”, eprint arXiv:1010.0186, 10/2010.

[8] G. Z. Cui, L. Qin, Y. Wang, X. Zhang, “An Encryption Scheme using DNA Technology”, BICTA 2008.

[9] X. Guozhen, L.U. Mingxin, Q. Lei, L. Xuejia, “New field of Cryptography: DNA Cryptography”.

[10] A. Gehani, T. Labean, J. Reif, “DNA-based cryptography”, DNA based computers V. Providence: American Mathematical Society, vol. 54, pp. 233-249, 2000
