
Source: Shodhganga, shodhganga.inflibnet.ac.in/bitstream/10603/2193/11/11_chapter 2.pdf

CHAPTER 2

LITERATURE REVIEW


2. LITERATURE REVIEW

_________________________________________

2.1 INTRODUCTION

Traditionally, secrecy was required mainly in diplomatic and military communications. Nowadays it plays [AND and SCH 2005] an important role in our everyday lives, for example, while managing our financial affairs. Cryptography plays a vital role in maintaining the privacy of electronic information against threats. In doing so, a combination of symmetric cryptosystems for encrypting and decrypting the data and public-key systems for managing the keys is used [HIG 1997, SCH 1994]. Assessing the strength of cryptographic systems is an essential step in employing cryptography for information security [LAM et al. 2004]. Cryptanalysis plays a key role in this context. The main objective of cryptanalysis is not only to recover the plain text, but also to estimate the strength of the algorithm, which is useful in designing a good cryptographic algorithm.

There is an explosive growth in unclassified research in different aspects of cryptology and cryptanalysis. Many cryptosystems that were thought to be secure have been broken, and a large variety of mathematical tools that are useful in cryptanalysis has been developed. Different approaches are available [ADA 2006, BEL 2003, CAR and MAG 2007,


NAW 2004] in the literature to perform cryptanalysis on either block

ciphers or stream ciphers. Classical ciphers are divided into two broad

categories: substitution ciphers and transposition ciphers.

Cryptanalysis of classical ciphers is a popular cryptological application for meta-heuristic search.

2.2 REVIEW OF CRYPTANALYSIS TECHNIQUES

Substitution ciphers represent the basic building blocks of

complex and secure ciphers that are used [MAR et al. 2005, MAS

1988, MAU and WOL 2000, XUE et al. 2009] today. Understanding

the vulnerability of simple ciphers is important while building more

complex ciphers. Many cryptanalysis techniques are available [CHR 2006, DAV 2004, FAI and YOY 2006, GOL 2006, KNU 2002, KNU and MIT 2005, LAS et al. 2005, PING et al. 2009, RAP and SID 2006, VAS 2004, VAS and GAR 2007] in the literature to break substitution ciphers, each having advantages and disadvantages over the others.

While attacking cipher models, one can consider a key-recovery attack, in which the goal is to derive the secret key; a decryption attack, in which the goal is to decrypt the cipher text; or key recovery from a decryption attack. Different techniques were explored in the literature to find the key of the cipher and thereby decrypt the entire cipher text. Several possible methods are available [DUN and


KEL 2007, LUN et al. 2008, MAS et al. 2006, OLS 2007, SKR 2007,

YAN 2008] in the literature to break a substitution cipher; these include exhaustive search, simulated annealing, frequency analysis, genetic algorithms, particle swarm optimization, tabu search and relaxation algorithms.

The exhaustive search method is the simplest of all the algorithms used to break substitution ciphers. This technique is possible when the cryptographic system has a finite key space, allowing all possible key combinations to be checked until the correct one is found. This method is an acceptable technique for breaking a monoalphabetic shift cipher. Exhaustive search is rarely the best first choice, since it is time consuming, but it decrypts the text with 100% accuracy.
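For the monoalphabetic shift cipher mentioned above, exhaustive search is easy to sketch. The helper names and the tiny word list below are illustrative, not taken from any cited work:

```python
# Exhaustive key search for a monoalphabetic shift (Caesar) cipher.
# The key space has only 26 keys, so every candidate can be tried; a
# crude fitness score (count of common English words) picks the winner.

def shift_decrypt(ciphertext: str, key: int) -> str:
    """Shift every letter back by `key` positions (A..Z / a..z only)."""
    out = []
    for ch in ciphertext:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base - key) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)

# A toy stand-in for a real dictionary or language model.
COMMON_WORDS = {"the", "and", "of", "to", "attack", "at", "dawn", "we"}

def crack_shift(ciphertext: str) -> tuple[int, str]:
    """Try all 26 keys; return the key whose plaintext scores best."""
    def score(text: str) -> int:
        return sum(w in COMMON_WORDS for w in text.lower().split())
    return max(((k, shift_decrypt(ciphertext, k)) for k in range(26)),
               key=lambda pair: score(pair[1]))

# Encrypting with shift 3 is the same as "decrypting" with key -3.
key, plain = crack_shift(shift_decrypt("we attack at dawn", -3))
```

With a 26-key space the whole search is instantaneous; the same loop becomes hopeless for general substitution ciphers, whose key space has 26! elements.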

Michael Lucks described [LUC 1988] a method for automatic decryption of simple substitution ciphers. This method uses exhaustive key search controlled by a few constraints imposed on word patterns. It does not use any statistical analysis or language heuristics, and is therefore language independent: it can be applied to any language for which a sufficiently large online dictionary is available.

The brute-force method is a way to break simple substitution ciphers, but the number of possible keys that need to be checked is large.


Practically, it is impossible to do an exhaustive search within a reasonable amount of time. To overcome this, new algorithms have been developed to break the cipher faster.

Ryabko et al. suggested [RYA et al. 2005] an attack on block ciphers called the gradient statistical attack, and presented the possibility of applying it to ciphers for which no attacks other than exhaustive key search are available. The described method is a chosen-plaintext attack; analysis of the statistical properties of block ciphers is used in the process of cryptanalysis. Its applicability to the cryptanalysis of RC5 is demonstrated experimentally.

Raphael presented [RAP and SID 2006] a framework designed to describe block cipher cryptanalysis techniques compactly, regardless of their individual differences. The framework describes possible attacks on block ciphers, specifies the technical details of every attack and their respective strengths, and is used to describe various attacks on popular and recent block ciphers.

Biham studied [BIH 1998] the effect of multiple modes under chosen-ciphertext attack when the underlying cryptosystems are DES and FEAL-8. It is shown that in many cases these modes are weaker than the corresponding multiple-ECB mode. In most cases, these modes are not much more secure than a single encryption using


the same cryptosystem. It is suggested to use a single mode, with multiple encryption as the underlying cryptosystem of that single mode.

Automated attack algorithms have been developed for which human intervention is not necessary. These methods finish either after a predetermined number of iterations or after the messages are successfully decrypted. One such automated attack algorithm is the genetic algorithm, which is widely used for cracking substitution ciphers.

Joe Gester proposed [GOE 2003] and implemented the simple approach of searching for a key based on likely keywords generated from the key space. The proposed genetic algorithm involves an iterative process of finding the fitness of the individuals in the population. Genetic operators are then selectively applied to members of the population to create a new generation, and the process is repeated. Each generation is created by selecting members of the previous generation, weighted according to their fitness.
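The generation loop described above can be sketched as follows. This is a toy mutation-and-selection variant, not Gester's actual implementation: keys are permutations of the alphabet, the fitness here (an assumption) compares single-letter frequencies of the decrypted text against rough English frequencies, and fitter parents survive and spawn mutated children:

```python
import random
import string

ALPHABET = string.ascii_lowercase
# Approximate relative frequencies (%) of English letters a..z.
ENGLISH_FREQ = [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.2, 0.8,
                4.0, 2.4, 6.7, 7.5, 1.9, 0.1, 6.0, 6.3, 9.1, 2.8, 1.0,
                2.4, 0.2, 2.0, 0.1]

def decrypt(ciphertext, key):
    """key[i] is the plaintext letter for ciphertext letter ALPHABET[i]."""
    return ciphertext.translate(str.maketrans(ALPHABET, ''.join(key)))

def fitness(ciphertext, key):
    """Negative distance between decrypted letter frequencies and English."""
    text = decrypt(ciphertext.lower(), key)
    n = max(1, sum(ch.isalpha() for ch in text))
    observed = [100 * text.count(ch) / n for ch in ALPHABET]
    return -sum(abs(o - e) for o, e in zip(observed, ENGLISH_FREQ))

def evolve(ciphertext, pop_size=50, generations=100, rng=None):
    """Selection plus swap mutation: keep the fitter half, refill with mutants."""
    rng = rng or random.Random(0)
    pop = [rng.sample(ALPHABET, 26) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda k: fitness(ciphertext, k), reverse=True)
        survivors = pop[:pop_size // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.randrange(26), rng.randrange(26)
            child[i], child[j] = child[j], child[i]   # swap mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda k: fitness(ciphertext, k))
```

A serious attack would replace the unigram fitness with bigram or trigram statistics and add a crossover operator suited to permutations; this skeleton only shows the select-mutate-repeat cycle.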

The proposed method uses a simple genetic algorithm to search the key space of cryptograms. If this is not satisfactory, an attempt is made to search a smaller problem space by restricting the keys searched to those generated by a keyword. In this


approach the populations reach local maxima rapidly or never seem to

converge to anything resembling English.

Like other heuristic algorithms, genetic algorithms do not produce an exact result; they give solutions that are nearest to the correct one. The experiments performed using the genetic algorithm suggested that a fitness of about 0.9 is enough to determine the vowel and consonant substitutions, after which visual examination by a human can be used to decrypt the entire text.

David Oranchak proposed [ORN 2008] an algorithm for the decryption of homophonic substitution ciphers. In a monoalphabetic cipher, each cipher text symbol maps to only one letter of plaintext; a homophonic substitution cipher, in contrast, maps each plaintext letter to one or more cipher text symbols. Homophonic ciphers thereby conceal language statistics in the enciphered messages and make [GAN 1993] statistics-based attacks more difficult. So, an approach that uses a dictionary-based attack with a genetic algorithm is presented.

Alabassal et al. proposed [ALA and WAH 2004] a method to discover the key of a Feistel cipher using a genetic algorithm. The possibility of using genetic algorithms in key search is attractive due to their ability to reduce the complexity of the search problem. The complexity of the attack is reduced by 50%.


Yean Li et al. performed [LI et al. 2005] a study on the effect of an optimization-heuristic cryptanalytic attack on block ciphers. The possible key solution generated by the heuristic function is used to decrypt the known cipher text. The fitness value for the solution is obtained by decrypting the known cipher text and then calculating the percentage of character-location matches between the original text and the retrieved text. The search for the correct key combination continues until a solution match or closest match is found within the constraints of the test environment.

Tabu search is another optimization technique used for breaking substitution ciphers. It is an iterative search characterized by the use of a flexible memory; it eliminates local minima and searches beyond the local minimum. The experimental results suggest that the genetic algorithm recovers slightly more characters than the other algorithms. Tabu search and genetic algorithm frameworks have been applied to three types of ciphers, viz. AES, Hill and columnar transposition. The genetic algorithm produced results more efficiently than tabu search in terms of performance. However, according to the available literature, the genetic algorithm did not perform well on the Hill cipher and AES.

Simulated annealing is another technique, similar to the genetic algorithm, that is used to break substitution ciphers. The


main difference is that the genetic algorithm maintains a pool of possible keys at each moment, while simulated annealing keeps one value at a time. When combined with a few other simplifications, this makes the approach much simpler than the genetic algorithm: simulated annealing is much simpler to implement than genetic algorithms and tabu search.
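The contrast can be seen in a sketch of the annealing loop. This is an illustrative skeleton, not any specific published attack: a single candidate key is perturbed by swapping two letters, and a worsening swap is still accepted with probability exp(delta / T), where T is a cooling temperature; the scoring function is supplied by the caller:

```python
import math
import random
import string

def anneal(score, steps=5000, t0=10.0, cooling=0.999, rng=None):
    """Simulated annealing over substitution keys (permutations of a..z).

    `score(key)` rates a candidate key (higher is better). A worsening
    swap is accepted with probability exp(delta / T), so the search can
    escape local optima early on and settles down as T cools.
    """
    rng = rng or random.Random(0)
    key = list(string.ascii_lowercase)
    cur_score = score(key)
    best, best_score, t = key[:], cur_score, t0
    for _ in range(steps):
        i, j = rng.randrange(26), rng.randrange(26)
        key[i], key[j] = key[j], key[i]           # propose: swap two letters
        new_score = score(key)
        delta = new_score - cur_score
        if delta >= 0 or rng.random() < math.exp(delta / t):
            cur_score = new_score                 # accept the move
            if cur_score > best_score:
                best, best_score = key[:], cur_score
        else:
            key[i], key[j] = key[j], key[i]       # reject: undo the swap
        t *= cooling
    return best, best_score
```

Unlike the genetic algorithm's population, the entire state here is one key plus the best key seen so far, which is what makes the method so easy to implement.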

The particle swarm optimization method is another machine-learning-based method used for breaking substitution ciphers. The algorithm starts by selecting a random population of potential solutions, each of which is called a particle. Particle swarm optimization is a good method for breaking simple substitution ciphers as long as bigrams are used to calculate the fitness of particles.

Laskari et al. [LAS et al. 2007] applied the particle swarm optimization (PSO) method to the problems introduced by the cryptanalysis of block-cipher cryptosystems. PSO originates from the field of evolutionary computation; it is a population-based algorithm that exploits a population of individuals to search promising regions of the function space. The method is applied to the problem of locating the key of a simplified version of DES, and proves efficient and effective where deterministic optimization methods fail. The work mainly investigates the problem of identifying the missing


bits of the key used in a simplified Feistel cipher, DES reduced to four rounds.

The relaxation algorithm is another technique used [PEL and ROS 1979] to break substitution ciphers. This is a graph-based technique that relies on iterative, parallel updating of values associated with each node. The nodes of the graph are elements of the cipher alphabet. Each node has an associated random variable representing the probabilities of the possible characters that the node represents. The probabilities of a node are updated based on the appearance of its two neighbors in the cipher text and the trigram statistics of the original language.

Brute-force attacks are successful in solving simple ciphers, but cryptanalysis of complex ciphers requires specialized techniques and powerful computing systems. Nalini et al. performed [NAL and RAO 2007] a systematic study of efficient heuristics for successful attacks on some block ciphers. For a systematic study of attacks on ciphers using heuristics, it is desirable to have simple ciphers that are tractable and at the same time incorporate representative features of practical ciphers.

Algebraic cryptanalysis is a method in which cryptanalysis begins by constructing [BAR and BIH 2008, BAR 2009] a system of polynomial equations in terms of plaintext bits, cipher text bits and


key bits. This technique uses [SIM 2009] modern equation solvers to attack cryptographic algorithms. The power of the equation solver and the speed and amount of memory available determine whether the system is solvable or not.

One aspect of differential cryptanalysis that appears [SEL 2008] to be overlooked is the use of several differences to attack a cipher simultaneously. The analysis of a cryptosystem should not only measure the strength under the best differential attack, but also take into account the best several attacks. These simultaneous attacks reduce the search space by a reasonable factor and do result in a significant improvement. The relative costs of encryption, XOR, and memory I/O are to be taken into consideration [SIE et al. 2004]. Trial and error is required for an accurate answer.

Schneier explored [SCH 2000] different cryptanalytic techniques and ways to break new algorithms. Breaking a cipher does not necessarily mean finding a practical way to recover the plaintext from the cipher text; it also means finding a weakness in the cipher with a complexity less than that of a brute-force attack.

Several attempts at cryptanalysis of the RC5 cipher are found [BIR and KUS 1998, KIM et al. 2009] in the literature. Kaliski and Yin evaluated


[KES et al. 1996] the strength of the RC5 algorithm with respect to linear and differential attacks. They found a linear attack on RC5 with 6 rounds that uses 2^57 known plaintexts, and whose plaintext requirement is impractical beyond 6 rounds. The best previously known attack requires 2^54 chosen plaintexts to derive the full set of 25 subkeys for 12-round RC5 with 32-bit words. Alex Biryukov et al. proposed a method that drastically improves these results due to a novel partial differential approach; the proposed attack requires only 2^44 chosen plaintexts.

Blowfish is a Feistel cipher in which the round function F is part of the private key. A differential cryptanalysis of Blowfish is possible either against a reduced number of rounds or with a piece of information describing the function F. Vaudenay showed [VAU 2006] that the disclosure of F allows a differential cryptanalysis that can recover all the rest of the key with 2^48 chosen plaintexts against a number of rounds reduced to eight.

New techniques were developed for cryptanalysis based on impossible differentials, and these techniques are used for attacks. Biham et al. described [BIH 2002] the application of these techniques to the block ciphers IDEA and Khufu. The new attacks cover more rounds than the currently well-known attacks, which demonstrates the power of the new cryptanalytic techniques.


2.3 CRYPTANALYSIS USING LANGUAGE MODELS

The role of cryptanalysis is also to study a cryptographic system with an emphasis on exploring the weaknesses of the system. The complex properties of natural languages play an important role in cryptanalysis. Different approaches to cryptanalysis in the literature use [GON 1973, STA et al. 2003, RAV and KNI 2009-2] language characteristics to understand the strength of a cipher system. One such approach deals with frequency statistics. This is based on the assumption that each letter of the plain text is substituted by another letter to form the cipher text.

Frequency analysis is the process of determining the frequency with which each symbol of the encrypted text occurs in the cipher text. This information is used, along with knowledge of the frequency of symbols within the language used in the cipher, to determine which cipher text symbol maps to which plain text symbol. The frequency analysis algorithm is a fast approach to deciphering encrypted text, but it requires knowledge of the language statistics of the original text. The disadvantage is that it relies on constant human interaction to determine the next move in the process.
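A first pass of this rank-matching idea can be sketched as follows. This is illustrative only: the English frequency ordering used here is one common approximation, and a human (or an n-gram search) would then refine the guessed key:

```python
from collections import Counter
import string

# English letters ordered from most to least frequent (a common ordering).
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def frequency_key(ciphertext: str) -> dict[str, str]:
    """Guess a decryption key by aligning frequency ranks.

    The most frequent ciphertext letter is mapped to 'e', the next to
    't', and so on; letters absent from the ciphertext are filled in
    arbitrarily so the key covers the whole alphabet.
    """
    counts = Counter(ch for ch in ciphertext.lower() if ch.isalpha())
    ranked = [ch for ch, _ in counts.most_common()]
    ranked += [ch for ch in string.ascii_lowercase if ch not in ranked]
    return dict(zip(ranked, ENGLISH_ORDER))

def apply_key(ciphertext: str, key: dict[str, str]) -> str:
    """Decrypt by looking each (lowercased) character up in the key."""
    return ''.join(key.get(ch, ch) for ch in ciphertext.lower())
```

On short texts the rank alignment is usually wrong for all but the top few letters, which is exactly why the literature pairs it with bigram analysis or human interaction.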


Rinza et al. presented [RIN et al. 2008] a method for deciphering texts in Spanish using the probability of usage of letters in the language. This method basically performs cryptanalysis of a monoalphabetic cryptosystem, using the probability and usage of letters in the Spanish language to break encrypted text files. The method assigns weights to the different letters of the Spanish alphabet; for this purpose, an analysis of the frequency of different symbols is done on Spanish plain text, and the same analysis is carried out on the cipher text. Every encrypted character is mapped to a single character in the original message and vice versa, and in this way the original text is retrieved from the cipher text. A few characters vary because there are letters and symbols that have the same frequency. This method of deciphering cryptograms in Spanish gave positive results, but the deciphering is not 100% successful.

A simple substitution cipher uses a substitution on the set of letters of the plain text alphabet, and different letters in the cipher text correspond to different letters in the plaintext. To encode a text by character-wise substitution, an “infinite key” is used and each letter of the plaintext is replaced by a letter of the cipher text by means of a one-to-one self-mapping of the set of letters; knowledge of the key is then necessary to reconstruct the plaintext. The work of Mineev et al. is concerned [MIN and CHU 2008] with a similar smoothing effect on the simple substitution cipher resulting from


contracting the alphabet by quadratic residues and quadratic non-residues in finite fields. As a sample, the Russian alphabet is considered in the proposed work.

A single-letter frequency analysis is helpful for obtaining an initial key with which to perform a more powerful bigram analysis. Beyond single characters, the relation between cipher text and plain text in terms of bigrams and trigrams plays a vital role. Samuel W. Hasinoff presented [HAS 2003] a system for the automatic solution of short substitution ciphers. The proposed system operates using an n-gram model of English and stochastic local search over all possible keys of the key space. This method decoded a median of 94% of cipher letters correctly.

Sujith Ravi et al. studied [RAV and KNI 2009-1] attacks on a Japanese syllable substitution cipher. Different natural language processing techniques are used to attack the cipher. They made several novel improvements over previous probabilistic methods, and report improved results.

In general, the receiver uses the key to convert cipher text to plain text, but a third party who intercepts the message can guess the original plaintext by analyzing the repetition patterns of the cipher text. From a natural language perspective, this cryptanalysis task can be viewed as a kind of unsupervised tagging problem. Language modeling (LM) techniques are used to rank proposed decipherments. This work


mainly attacks difficult cipher systems that have more characters than English, and cipher lengths that are not solved by low-order language models, and relates language-model perplexity to decipherment accuracy.

Jakobsen proposed [JAC 1995] a method for the cryptanalysis of substitution ciphers in which an initial guess of the key is refined through a number of iterations. In each step, the plain text recovered using the current key is evaluated to determine how close the key is to the correct one. To solve a cipher using this method, the bigram distributions of letters in the cipher text and plain text are sufficient. The method is suitable for both mono- and polyalphabetic substitution ciphers.
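A simplified variant of this bigram-driven refinement can be sketched as follows. This is not the paper's exact algorithm (which orders the swaps more cleverly); it only shows the core idea of scoring a key by how closely the decrypted text's bigram distribution matches a reference distribution, and keeping any letter swap that improves the score:

```python
import string

def bigram_freqs(text):
    """Relative bigram frequencies over the alphabetic characters of `text`."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = {}
    for a, b in zip(letters, letters[1:]):
        counts[a + b] = counts.get(a + b, 0) + 1
    total = sum(counts.values()) or 1
    return {bg: c / total for bg, c in counts.items()}

def refine_key(ciphertext, reference, sweeps=3):
    """Hill climbing on the key: keep any two-letter swap that brings the
    decrypted text's bigram distribution closer to `reference`."""
    def score(key):
        table = str.maketrans(string.ascii_lowercase, ''.join(key))
        observed = bigram_freqs(ciphertext.lower().translate(table))
        bigrams = set(observed) | set(reference)
        return -sum(abs(observed.get(bg, 0.0) - reference.get(bg, 0.0))
                    for bg in bigrams)
    key = list(string.ascii_lowercase)
    best = score(key)
    for _ in range(sweeps):
        for i in range(26):
            for j in range(i + 1, 26):
                key[i], key[j] = key[j], key[i]      # try swapping two letters
                trial = score(key)
                if trial > best:
                    best = trial                     # keep the improvement
                else:
                    key[i], key[j] = key[j], key[i]  # undo the swap
    return ''.join(key)
```

Because only bigram tables are compared, the method needs no dictionary; this is exactly the property the review attributes to Jakobsen's approach.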

G. W. Hart proposed [HAR 1994] a method for solving cryptograms that works well even in difficult cases where only a small sample of text is available and the probability distribution of letters is far from what is expected. The method performs well on longer and easier cryptograms. Exponential time is required in the worst case, but in practice it is quite fast. The method fails when none of the plain text words exist in the dictionary.

A deciphering model was developed [LEE et al. 2006] by Lee et al. to automate the cryptanalysis of monoalphabetic substitution ciphers. It uses an enhanced frequency analysis technique in a three-level hierarchical approach: to decipher a monoalphabetic substitution cipher, monogram frequencies, keyword rules and a dictionary are used one by one. The approach was tested on two short cryptograms, and both achieved successful deciphering results in good computational time. It is observed that the enhanced frequency analysis approach performs faster decryption than Hart's approach.

Knight et al. discussed [KNI and YAM 1999, KNI et al. 2006] a number of natural language decipherment problems that use unsupervised learning. These include letter substitution ciphers, phonetic decipherment, character code conversion and word-based ciphers, with importance to machine translation. An efficient implementation of a naive application of the Expectation Maximization (EM) algorithm to breaking a substitution cipher is presented.

Ravi and Knight introduced [RAV and KNI 2008] another method that uses a low-order letter n-gram model to solve substitution ciphers. The method is based on integer programming, which performs an optimal search over the key space and guarantees that no key is overlooked; it can be executed with standard integer programming solvers. The work studies the variation of decipherment accuracy as a function of n-gram order and cipher length.


Ravi and Knight created [RAV and KNI 2009-2] fifty ciphers each of lengths 2, 4, 8, ..., 256. These ciphers were solved with 1-gram, 2-gram and 3-gram language models, and the average percentage of cipher text decoded incorrectly was recorded. It is observed that the solution obtained by integer programming is exact, achieving better accuracy; for short cipher lengths, a much higher improvement is observed when the integer programming method is used. The unigram model works badly in this scenario, which is in line with Bauer's observation for short texts. The work mainly focuses on letter substitution ciphers that also include spaces. A comparison is made between decipherment of ciphers with and without spaces using different n-gram English models.

2.4 INFORMATION THEORETIC APPROACH

Entropy is a statistical parameter that measures how much information is produced on average for each letter of a text in the language. Redundancy measures the amount of constraint imposed on a text in the language because of its statistical nature. Shannon proposed a new method to estimate the entropy and redundancy of a language. This method uses the knowledge of the language statistics possessed by speakers of the language, and depends on predicting the next letter when the preceding text is known. Some properties of an ideal predictor are developed.


An approach for finding the n-gram entropy is developed [SHA 1951]. For this purpose a study is done on 26-letter English, where spaces and punctuation are ignored. The n-gram entropies are calculated from letter, bigram and trigram frequencies; the estimated entropy values for n = 1, 2, 3 are 4.14, 3.56 and 3.3 bits respectively. Based on the frequencies of symbols in the reduced text, it is possible to set bounds on the n-gram entropy of the original language.
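The single-letter figure can be checked in a few lines. This sketch only evaluates the definition H = -Σ p log2 p; reproducing Shannon's 4.14 exactly would require his letter-frequency table:

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: H = -sum(p * log2 p) over the nonzero p."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A uniform 26-letter alphabet carries log2(26) ≈ 4.70 bits per letter;
# the skewed frequencies of real English pull the single-letter estimate
# down to Shannon's reported H_1 = 4.14 bits.
uniform_entropy = entropy([1 / 26] * 26)
```

The gap between log2(26) and the n-gram entropies is the per-letter redundancy that the unicity-distance results below depend on.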

The approach proposed [SHA 1948, SHA 1949] by Shannon deals with the basic mathematical structure of secrecy systems. Shannon's work defines theoretical secrecy as the immunity of a system against cryptanalysis when the cryptanalyst has unlimited time and computing power for the analysis of cryptograms. This is related to communication systems in which noise is present, and the notions of entropy and equivocation are applied to cryptography.

The work is also concerned with practical secrecy, defined as the level of security necessary to make the system secure against an enemy with a limited amount of time and limited computational resources for attacking the intercepted cryptogram. This leads to methods for constructing systems that require a large amount of work to solve. An analysis of the basic weakness of secrecy systems is made. H. Yamamoto presented [YAM 1991] a survey of different information theoretic approaches in cryptology. The survey addresses Shannon's cipher system, Simmons' authentication


approach, the wiretap channel, and secret sharing communication system approaches.

Diffie and Hellman introduced [HEL 1977, DIF and HEL 1979] another approach to achieving practical security, based on computational complexity. Trapdoor functions and one-way functions are used, and authenticity is introduced to prevent active attacks. Several authentication mechanisms have been developed in the literature; Simmons proposed an authentication mechanism applicable to any type of system. The conclusion drawn from this work is that the information theoretic approach is as important as the computational complexity approach.

Shannon, in his information theoretic approach to cryptography, assumes that computational abilities are unlimited. The work proposed by Hellman is an extension of Shannon's theory. The concept of matching a cipher to a language and the trade-off between global and local uncertainty is developed. Hellman defined a model in which the messages are divided into two subsets: one set of all meaningful messages, each with the same a priori probability, and the other of meaningless messages, which are assigned a priori probabilities of zero.

Borissov and Lee computed [BOR and LEE 2007] bounds on the

theoretical measure for the strength of a system under known plain


text attack. Dunham proposed [DUN 1980] the key equivocation, the conditional entropy of the key given the cipher text and corresponding plain text, as a measure of the strength of the system. For the simple substitution cipher, lower and upper bounds were found. This work concluded that key recovery under a known-plaintext attack on substitution ciphers is more difficult when the cipher has many fixed points.

A study is carried out [MAU 1993, MAU 1999, RIV 1991, VER 1998, ZHA 2005] on estimating the key equivocation of secrecy systems. In general, when the length of the block is large, it is difficult to find the key equivocation. A simplified method for computing the key equivocation for two classes of secrecy systems with Additive-Like Instantaneous Block (ALIB) ciphers is developed [ZHA 2005] by Zhang; the criterion here is the key equivocation rather than the error probability. For simple substitution ciphers, bounds are derived for the message equivocation in terms of the key equivocation. The message equivocation approaches the key equivocation quickly. It is also observed that the exponential behavior of the message equivocation is not determined by the redundancy of the message source.

Maurer presents [MAU 1993] a review of the relation between information theory and cryptography. Shannon's approach fixes a lower bound on the size of the key needed to achieve a particular level of security. Recent models contrast with Shannon's approach, wherein

Page 22: CHAPTER 2 LITERATURE REVIEW - Shodhganga : a …shodhganga.inflibnet.ac.in/bitstream/10603/2193/11/11_chapter 2.pdf · CHAPTER 2 LITERATURE REVIEW . 11 2. ... Ryabko et al. suggested

31

with a short key also it is possible to provide perfect secrecy. Models

like wire type and broad cost channels, privacy amplification are

considered for illustration.

A parametric evaluation and analysis of the behaviour of algorithms with respect to cryptographic strength is presented [PRI and TOM 1987] by Prieto. The unicity distance is generally taken as the parameter for evaluating the strength of a cipher; Prieto proposed two further factors for evaluating the quality of an algorithm, the invulnerability factor and the quality factor.

According to Shannon's information-theoretic approach, the unicity distance is the minimum length of cipher text required to determine the key uniquely. When the cipher text is shorter than the unicity distance, the predicted key has a non-zero error probability, for which Jabri proposed an upper bound [JAB 1996]. This probability is observed to be inversely proportional to the logarithm of the key-space size and directly proportional to the redundancy of the source.
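Jabri's observation builds on Shannon's classical estimate U = H(K)/D. The following is a minimal sketch of that estimate, assuming a simple substitution cipher and an illustrative English redundancy of about 3.2 bits per character (both are assumptions for illustration, not figures from the thesis):

```python
import math

def unicity_distance(key_space_size: int, redundancy_bits_per_char: float) -> float:
    """Shannon's estimate U = H(K) / D: the cipher-text length (in characters)
    beyond which a unique key is expected to be determined."""
    key_entropy = math.log2(key_space_size)          # H(K) in bits
    return key_entropy / redundancy_bits_per_char    # D in bits per character

# Simple substitution over a 26-letter alphabet: |K| = 26!, so H(K) ~ 88.4 bits.
# D ~ 3.2 bits/char is a commonly quoted redundancy figure for English.
U = unicity_distance(math.factorial(26), 3.2)
print(round(U))  # about 28 characters under these assumptions
```

The formula makes Jabri's proportionalities visible directly: a larger key space (larger H(K)) lengthens the cipher text needed, while higher source redundancy D shortens it.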

Bauer computed [BAU 2007] the unicity distance for different substitution and transposition techniques using different n-gram language models; the estimated values for n = 1, 2, 3 are 167, 74 and 59 respectively. The decipherment becomes less uncertain as the length of the cryptotext grows, and at some length near the unicity distance the solution becomes unproblematic, provided sufficient effort can be expended.
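The dependence of these figures on the language model follows from the same formula, U_n = H(K) / (log2 26 - H_n), where H_n is the per-letter entropy of the n-gram model: a richer model captures more redundancy, so U_n shrinks. The per-letter entropies below are back-solved illustrative assumptions chosen to land near figures of the order Bauer reports; they are not his measured values:

```python
import math

H_KEY = math.log2(math.factorial(26))   # key entropy of simple substitution, ~88.4 bits

def unicity_for_model(per_letter_entropy: float) -> float:
    """U_n = H(K) / D_n, with D_n = log2(26) - H_n the per-letter
    redundancy of English as seen by an n-gram model."""
    redundancy = math.log2(26) - per_letter_entropy
    return H_KEY / redundancy

# Illustrative per-letter entropies (bits/char) for 1-, 2- and 3-gram models.
for n, H_n in [(1, 4.17), (2, 3.51), (3, 3.20)]:
    print(n, round(unicity_for_model(H_n)))
```

This sketch prints roughly 167, 74 and 59, showing how modest drops in modelled entropy translate into large drops in the cipher-text length needed for a unique solution.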


Sujith Ravi and K. Knight carried out [RAN and KNI 2009-2] an empirical test of Shannon's information-theoretic treatment of decipherment uncertainty, including the unicity distance. For n-gram language models with n = 1, 2, 3, their estimated unicity distances are 173, 74 and 50 respectively, similar to Bauer's results. For real ciphers, the observed unicity points do not match the predicted numbers; this difference is attributed to assumptions Shannon made in computing the unicity distance for random ciphers. The results confirm that the unicity distance is a function of the language statistics used to attack the cipher: the more language statistics are incorporated into the analysis, the lower the unicity distance becomes.

Cryptanalysis is useful in finding the strength of a cryptosystem. In a practical setting, one can test a block cipher against the different known attacks and assign a certain security level to it; quantifying the security of a block cipher precisely, and proving that it satisfies specific security requirements, remains a difficult task. A parametric evaluation in the form of the unicity distance can be incorporated effectively into the analysis to provide information about the number of spurious solutions and the point at which a single solution to the given cipher emerges. This helps in identifying the strength of a cryptosystem.
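The "number of spurious solutions" can be made concrete with Shannon's random-cipher estimate: after n cipher-text characters, roughly 2^(H(K) - nD) - 1 incorrect keys still decrypt to plausible text, and this count falls to zero near the unicity distance. A sketch under the same illustrative assumptions as before (simple substitution key space, an assumed redundancy of 3.2 bits per character):

```python
import math

def expected_spurious_keys(key_entropy_bits: float, redundancy: float, n: int) -> float:
    """Shannon's random-cipher estimate: after n cipher-text characters,
    about 2^(H(K) - n*D) - 1 wrong keys still yield plausible plain text."""
    return max(2 ** (key_entropy_bits - n * redundancy) - 1, 0.0)

H_K = math.log2(math.factorial(26))   # simple substitution, ~88.4 bits
D = 3.2                               # assumed redundancy, bits/char
for n in (10, 20, 28, 40):
    print(n, expected_spurious_keys(H_K, D, n))
```

The exponential decay explains why a cipher text only moderately longer than the unicity distance is expected to leave a single surviving solution.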


2.5 PROBLEM STATEMENT

Shannon's model of perfect secrecy is the target for any researcher in the present-day context, though real-world computational constraints impose an upper bound on this model. Selecting an algorithm on the basis of the requirements of the application is the central activity in the field of security systems, and Shannon's unicity distance is an ideal measure in this context. Message text and the associated language drive the design of cryptosystems, given their widespread use among diverse linguistic populations. Language complexity versus the strength of the algorithm therefore needs to be evaluated, as an effective selection criterion, before a cryptosystem is designed. The present work is aimed at addressing these issues.

Various language models have been proposed, and most of them are closely associated with Roman-script-based languages. Indic scripts possess a different character: their primitive unit is the syllable, and the machine representation of a syllable in these scripts has a variable block size. It is therefore highly difficult to adopt block ciphers while addressing Shannon's model. An algorithm is proposed for the encipherment and decipherment of Indian-language message text. The statistical behaviour of language units, and their significance in the light of language redundancy, constitute important a priori knowledge when addressing decipherment problems. A complete study of the above parameters across several languages, with specific reference to Indian scripts, is undertaken in the present work.

2.6 METHODOLOGY

Four languages, viz. English, Telugu, Kannada and Hindi, are considered while addressing the issue of unicity distance versus language redundancy. Corpora of 10,00,000 characters for English and 32,00,000, 17,00,000 and 9,00,000 code points for Telugu, Kannada and Hindi respectively are created for the purpose of evaluation. The adequacy of the model is evaluated using a decipherment approach; the cipher-text-only attack is the main attack adopted for the present evaluation. The conditional and unconditional probability distributions of the language units, namely unigrams, bigrams and trigrams, are computed to build the a priori knowledge. Test samples of sizes varying from 6,000 to 1,10,000 are used for evaluation in the decipherment approach. Retrieval efficiency is treated as a measure equivalent to the unicity distance when concluding on the strength of the algorithm in relation to language complexity.
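The a priori statistics described above can be built along these lines. This is a minimal sketch in Python; the toy corpus string and the treatment of the text as a flat character sequence are simplifying assumptions (Indic text would first be segmented into syllables or code points):

```python
from collections import Counter

def ngram_distributions(text: str, n: int):
    """Unconditional P(g) over all n-grams g, and conditional
    P(last unit | preceding n-1 units), as plain dictionaries."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    unconditional = {g: c / total for g, c in counts.items()}
    context = Counter(g[:-1] for g in grams)        # counts of (n-1)-gram prefixes
    conditional = {g: c / context[g[:-1]] for g, c in counts.items()}
    return unconditional, conditional

uncond, cond = ngram_distributions("abracadabra", 2)
# P('b' | 'a') = 2/4 = 0.5: 'a' begins four bigrams, two of which are "ab"
# (the final 'a' of the string has no successor).
print(cond["ab"])
```

In the actual evaluation, the same counting would be performed over the language corpora, with the unconditional tables driving the unigram/bigram/trigram attacks and the conditional tables supplying the context-dependent a priori knowledge.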

2.7 ORGANISATION OF THE THESIS

Chapter 1 introduces issues related to cryptanalysis, including recent trends. The limitations of key size and the necessity for the algorithm to be made public are also discussed, and the information-theoretic approach to evaluating the strength of a cryptosystem is introduced.


Chapter 2, the current chapter, has reviewed different decipherment issues and their relative merits and demerits. A detailed description of cryptanalysis using language models is presented, together with a review of the information-theoretic approach proposed by Shannon and its reintroduction in recent work.

Chapter 3 introduces the information-theoretic approach and its applicability to the decipherment process. Shannon's concept of an ideal secrecy system is discussed, and the roles of entropy and redundancy, and their impact on estimating the strength of an algorithm, are explored. A parametric evaluation using the unicity distance is presented for the four languages, viz. English, Telugu, Kannada and Hindi, for varying key sizes.

Chapter 4 describes the adequacy of the proposed model. The cryptographic model for the encryption and decryption of Indic scripts is proposed along with the decipherment model. Evaluation is carried out using unconditional and conditional probability distribution approaches for English, Telugu, Kannada and Hindi. The text retrieval efficiency is compared for unigram, bigram and trigram unconditional probability distributions, and the significance of the conditional probability distribution and its impact on text retrieval is emphasised. Supporting evaluation results are presented in this chapter.


Chapter 5 provides a detailed summary of the work and its salient features, and explores open problems for future enhancement.