security in computing chapter 12, cryptography explained part 1

1

Security in Computing, Chapter 12, Cryptography Explained

Part 1

Summary created by Kirk Scott

2

• This set of overheads corresponds to the first portion of section 12.1 in the book

• The overheads for Chapter 12 roughly track the topics in the chapter

• Keep this in mind though:

• On some topics I simply go over the book’s material

• On other topics I expand on the book’s material in a significant way

• You are responsible not just for what’s in the book, but also for what’s in the overheads that’s not in the book

3

1. Overview

• There are several sources of tension in cryptography

• A system should be easy to use

• It should be difficult to break

• Cryptography is the most important security tool

• It is only of value within a context of other protocols and technologies

4

• Development of cryptographic systems should be left to the experts

• It’s too difficult for amateurs

• Improperly implemented, cryptography only gives a false sense of security

• Application of cryptographic systems should be within the reach of the average user

• If they’re too difficult, they will be avoided

5

2. Hard Problems

• Cryptographic systems are based on so-called hard problems

• Sound systems have their foundation in advanced math and theoretical computer science

• The goal of this chapter is to explain, generally, the mathematical approaches involved

• The mechanics of how these systems are incorporated into the Web, for example, will not be covered

6

3. Mathematics for Cryptography

• These are some of the topics covered in this part of the book

• Complexity

• NP-Completeness

• Examples of NP-Complete Problems

• P, NP, and EXP Problems

7

• These topics relate to the question, “What is a hard problem?”

• In cryptography, “hard” has an additional characteristic

• Not only should the system be based on a hard problem

• The hard problem should not be susceptible to an easy solution.

8

4. Complexity

• The basis for this discussion comes from theoretical computer science

• What does the phrase “NP-complete” signify?

• The book gives an intuitive explanation

• I will just follow along

• Since this isn’t a class on theory, it’s not necessary to master these concepts in order to continue considering cryptographic systems

9

5. Three Examples of NP-Completeness

• The satisfiability problem

• The knapsack problem

• The clique problem

• “Easy to state”

• “Not hard to understand”

• “Straightforward to solve” (using brute force to check all possible solutions)

• No other (apparent) solutions

10

6. Satisfiability

• Let logical expressions be created using these rules:

• They contain logical variables v1, v2, …, vn, which can take on a value of T or F

• The variables can be negated

• The variables are combined into clauses using logical OR

• The clauses are combined into an expression using logical AND

11

• The expression is satisfiable if there exists a set of T/F values for the vi that cause the expression to evaluate to T overall

• The brute force approach:

• Test all different possible combinations of T/F for the vi, checking to see whether any cause the expression to evaluate to true
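The brute force approach to satisfiability can be sketched in a few lines of Python. This is only an illustration, not something from the book. The representation is an assumption: each clause is a list of nonzero integers, where i stands for vi and -i stands for NOT vi. The function names are made up for this example.

```python
from itertools import product

def evaluate(clauses, assignment):
    """True if every clause (an OR of literals) contains at least one true literal."""
    return all(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def brute_force_sat(clauses, n):
    """Try all 2**n T/F assignments; return the first satisfying one, or None."""
    for assignment in product([True, False], repeat=n):
        if evaluate(clauses, assignment):
            return assignment
    return None

# Hypothetical expression: (v1 OR NOT v2) AND (v2 OR v3) AND (NOT v1 OR NOT v3)
example = [[1, -2], [2, 3], [-1, -3]]
print(brute_force_sat(example, 3))    # (True, True, False)
print(brute_force_sat([[1], [-1]], 1))  # None: v1 AND NOT v1 is unsatisfiable
```

The outer loop is the expensive part: it may run 2^n times, while each call to evaluate only walks the expression once.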

12

7. Knapsack

• Let a set of non-negative integers, S = {a1, a2, …, an} be given

• Let a target value, integer T, be given

• Is there a subset of S such that the sum of its elements = T?

• I.e., if vi can only be 1 or 0, is there a set of vi such that:

$\sum_{i=1}^{n} a_i v_i = T$

13

• In other words, T is the knapsack

• The ai are objects to be put into it

• Is there some set of objects that will fill the knapsack exactly?

• The brute force approach:

• Test all different possible combinations of 1 or 0 for the vi, checking to see whether the dot product of a and v equals T
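A minimal Python sketch of the brute force knapsack search follows. The function name and the sample numbers are assumptions chosen for illustration. It tries every 0/1 vector v and compares the dot product of a and v with T.

```python
from itertools import product

def brute_force_knapsack(a, target):
    """Try all 2**n selection vectors v; return the first with dot(a, v) == target."""
    for v in product([0, 1], repeat=len(a)):
        if sum(ai * vi for ai, vi in zip(a, v)) == target:
            return v
    return None

a = [3, 5, 8, 13]                    # hypothetical set S
print(brute_force_knapsack(a, 16))   # (1, 0, 0, 1), since 3 + 13 = 16
print(brute_force_knapsack(a, 2))    # None: no subset sums to 2
```

With n items there are 2^n vectors to try, so adding one item doubles the worst-case work.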

14

8. Clique

• Let a graph G of p nodes be given

• For this problem, assume each node is connected to at least one other node

• Let n <= p be given

• Is there a subset of n fully interconnected nodes (a clique) in G?

• On the following overhead, a clique of size 4, (v1, v2, v7, v8), is shown

16

• The brute force approach:

• Test all different possible subsets of G of size n, checking to see whether any are fully interconnected

• This problem is not exactly like the other ones

• This will be pursued shortly
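The brute force clique search can be sketched the same way. The graph below is a small made-up example, not the one from the overhead, and the function names are assumptions. Edges are stored as unordered pairs so that (1, 2) and (2, 1) count as the same connection.

```python
from itertools import combinations

def is_clique(nodes, edge_set):
    """True if every pair of the given nodes is joined by an edge."""
    return all(frozenset(pair) in edge_set for pair in combinations(nodes, 2))

def brute_force_clique(nodes, edges, n):
    """Test every subset of size n; return the first fully interconnected one, or None."""
    edge_set = {frozenset(e) for e in edges}
    for subset in combinations(nodes, n):
        if is_clique(subset, edge_set):
            return subset
    return None

nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]   # hypothetical graph
print(brute_force_clique(nodes, edges, 3))   # (1, 2, 3)
print(brute_force_clique(nodes, edges, 4))   # None
```

Note that the outer loop runs over subsets of a fixed size n, not over all 2^p subsets, which is why the count for this problem looks different from the other two.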

17

9. Characteristics of NP-Complete Problems

• 1. Solvable by checking all possibilities

• Either you find a solution or you find that one doesn’t exist

• 2. There are 2^n possibilities

• Each can be checked in some bounded time quantum, so the time complexity overall is 2^n, exponential in the size of the problem

• 3. Observe that problems come from different areas: logic, math, graph theory

• However, in a sense they are all the same

18

• 4. If you could guess perfectly, checking the proposed solution is quick

• The book makes this more specific: checking one possibility is of polynomial complexity

• On the surface the idea of guessing may seem irrelevant

• It can be related to cryptography as follows:

• Knowing the algorithm/key means the system can be used in polynomial time

• It’s the attacker that’s condemned to exponential time
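To make the "checking a guess is quick" idea concrete, here is a small Python sketch that checks one proposed knapsack selection. The function name is an assumption made for this example. The check is a single dot product, so its cost grows linearly with n, in contrast to the 2^n selections an attacker without the guess would have to try.

```python
def check_knapsack_guess(a, v, target):
    """Check one proposed 0/1 selection v in a single pass over the n items."""
    if len(v) != len(a) or any(vi not in (0, 1) for vi in v):
        return False
    return sum(ai * vi for ai, vi in zip(a, v)) == target

a = [3, 5, 8, 13]                               # hypothetical set S
print(check_knapsack_guess(a, (1, 0, 0, 1), 16))  # True: 3 + 13 = 16
print(check_knapsack_guess(a, (1, 1, 0, 0), 16))  # False: 3 + 5 = 8
```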

19

10. Side Note on the Clique Problem

• It seems clear that solving satisfiability and the knapsack are 2^n problems

• The clique problem is different

• It depends first on how many different ways there are to choose n distinct nodes out of p total

• It then depends on checking for every possible connection among the n

20

• Choosing n from p, “p choose n”, is the binomial coefficient:

$\binom{p}{n} = \frac{p!}{n!\,(p - n)!}$

• The number of edges in a fully interconnected graph with n nodes is n(n - 1)/2

21

• Then an expression for the complexity would be the product of these two:

$\binom{p}{n} \cdot \frac{n(n - 1)}{2}$

• Strictly speaking, this is factorial, not exponential

• However, from a computational perspective, factorial is as impractical as exponential
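A quick way to get a feel for this count is to compute it. The sketch below is an illustration only, with a made-up function name. It multiplies "p choose n" by the n(n - 1)/2 connections to check in each subset, and prints 2^p alongside for comparison.

```python
from math import comb

def clique_checks(p, n):
    """Number of size-n subsets of p nodes, times the connections to check in each."""
    return comb(p, n) * (n * (n - 1) // 2)

print(clique_checks(10, 5))   # 252 subsets * 10 connections = 2520

# Growth for n = p/2, compared with 2**p
for p in (10, 20, 40):
    print(p, clique_checks(p, p // 2), 2 ** p)
```

Either way, the counts quickly get out of hand as p grows, which is the practical point.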

22

11. A Mathematical Interlude on Binomial Coefficients and Fully Interconnected Graphs

• In order to understand the general discussion so far, it would not be necessary to go further into the math

• However, serious math will be coming, and this is a good time to start getting used to it

• Derivations of the formulas for binomial coefficients and fully interconnected graphs will be presented

23

12. Binomial Coefficients

• Consider the question of how many different ways there are of choosing k elements out of a set of n without repetition:

• There are n choices for the first, (n – 1) remaining for the second, (n – 2) for the third, down to (n – k + 1) for the kth

• The choices are independent of each other, so the total number of choices for all k is the product of these factors:

• n(n – 1)(n – 2)…(n – k + 1)

24

• There is no repetition among the individual k elements of the set

• But some element x could be chosen first, second, third, …, or last

• In other words, we are interested in the number of different arrangements of a set of k elements

• This would be k!

25

• The binomial coefficient, the number of different ways of choosing k elements from n, is the total number of ways of choosing elements without repetition, divided by the number of different possible orderings of the k:

$\binom{n}{k} = \frac{n(n - 1)(n - 2)\cdots(n - k + 1)}{k!}$

26

• The numerator can be expressed differently by also changing the denominator

• Or put another way, the appearance of the formula can be changed by multiplying both the numerator and the denominator by (n – k)!

$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$
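The two forms of the formula can be compared numerically. This is just a sanity check, not a proof, and the function names are made up for the example. Python's math.comb is used as an independent reference value.

```python
from math import comb, factorial

def falling_product(n, k):
    """n(n - 1)(n - 2)...(n - k + 1): ordered choices of k elements out of n."""
    result = 1
    for i in range(k):
        result *= n - i
    return result

def binomial_by_orderings(n, k):
    """Ordered choices divided by the k! arrangements of each chosen set."""
    return falling_product(n, k) // factorial(k)

def binomial_by_factorials(n, k):
    """The same value written as n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

print(binomial_by_orderings(5, 2), binomial_by_factorials(5, 2))  # 10 10

# Both forms agree with math.comb for a range of small n and k
for n in range(12):
    for k in range(n + 1):
        assert binomial_by_orderings(n, k) == binomial_by_factorials(n, k) == comb(n, k)
```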

27

• Note that for n and k integer we expect the binomial coefficient to be an integer

• In other words, k!(n – k)! should go into n! evenly

• This could be proven, but even the mathematicians tend to wave their hands at this

• It may be intuitively apparent

28

• It’s also possible to give a rather circular verbal argument

• For each of the subsets of size k that we’re trying to find, n(n – 1)(n – 2)…(n – k + 1) has to include all of the arrangements of the k elements

• Therefore, it should be divisible by k!, the number of arrangements

29

• In other words, if (n choose k) is what we say it is

• And k! is what we say it is, the circular argument which shows that n(n – 1)(n – 2)…(n – k + 1) is an integer can be expressed this way:

$n(n - 1)(n - 2)\cdots(n - k + 1) = \binom{n}{k} \cdot k!$

• The product of two integers should give an integer

30

13. Fully Interconnected Networks

• You can draw a set of nodes and start drawing the connections between them and by counting, come to the following conclusion for n nodes:

• The number of connections for the first node is (n – 1), the number for the second node is (n – 2), for the third is (n – 3), and so on down, with the (n – 1)st node having 1 connection

31

• Reversing the pairing of node numbers and connection counts gives the same total, only in a more convenient form:

• 1st node, 1 connection; 2nd node, 2 connections; …; (n – 1)st node, (n – 1) connections

• Then the total number of connections is:

$1 + 2 + \cdots + (n - 1) = \sum_{i=1}^{n-1} i$

32

• The point is to derive a closed-form algebraic expression for this summation

• This will be done by an inductive proof

• The reason for doing this is to introduce you to or refresh your memory of inductive proofs

• They will come up again later when explaining some of the math for encryption

• It’s nice to get a relatively simple, understandable example early on

33

• It is easier to deal with this expression:

$1 + 2 + \cdots + n = \sum_{i=1}^{n} i$

• Inductive proofs are like recursion for mathematicians

• You have a base case which can clearly be shown

• You also have a hypothesis about what the result should be

34

• If you can show that assuming the hypothesis applied to the (k – 1)st case implies that the kth case works, then the hypothesis works for all cases from the base case on up

• The hypothesis will be:

$\sum_{i=1}^{n} i = \frac{n(n + 1)}{2}$

• This is what we want to show

35

• For n = 1, the base case, this is easy:

• n(n + 1)/2 = 1(1 + 1)/2 = 2/2 = 1

• The sum of the integers from 1 to 1 is clearly 1

• Now suppose that this holds for (k – 1)

36

• The sum of the first k would be the previous expression plus k:

$\sum_{i=1}^{k} i = \sum_{i=1}^{k-1} i + k$

• = (k – 1)k/2 + 2k/2

• = [(k – 1)k + 2k]/2

• = (k^2 – k + 2k)/2

• = (k^2 + k)/2

• = k(k + 1)/2, QED

37

• We showed that it worked for n = 1

• We then showed that if it worked for k – 1, it worked for k

• This means that if it worked for 1, it works for 2

• If it works for 2, it works for 3

• If it works for 3, it works for 4

• Ad infinitum
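The closed form can also be spot-checked by counting. The sketch below is a numerical check only and does not replace the inductive proof. The function names are assumptions. It counts the connections in a fully interconnected graph by listing every pair of nodes, and compares the result with n(n - 1)/2 and with the sum of the first n - 1 integers.

```python
from itertools import combinations

def count_connections(n):
    """Count the connections among n fully interconnected nodes by listing every pair."""
    return sum(1 for _ in combinations(range(n), 2))

def sum_first(n):
    """1 + 2 + ... + n, added up term by term."""
    return sum(range(1, n + 1))

for n in range(1, 50):
    assert sum_first(n) == n * (n + 1) // 2
    assert count_connections(n) == n * (n - 1) // 2 == sum_first(n - 1)

print(count_connections(4), sum_first(3))   # 6 6
```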

38

14. Where Were We?

• The interlude just showed that some of the math that we’re using can be derived

• The derivations in this case were only important as an introduction to the math we’ll be doing later on

• The topic at hand is still the question of complexity and the meaning of NP-Completeness

39

15. The Classes P, NP, and EXP

• Let P stand for the set of all problems that can be solved by an algorithm with complexity bounded by a polynomial

• Without having a supercomputer or a network of parallel, cooperating computers, polynomial algorithms tend to be at the limit of practicality for implementation

40

• Formally, NP stands for “non-deterministic polynomial”, that is, the set of problems that can be solved in polynomial time by a non-deterministic algorithm

• It is understandable, but misleading to think of NP as meaning simply “not polynomial”

41

• An NP problem is one that would have a polynomial solution if you could guess perfectly

• You could restate the meaning of P by saying that such problems/algorithms are deterministically polynomial

• NP problems are not known to be bounded by polynomial complexity in the way that P problems are

42

• EXP stands for the set of all problems that can be solved by an algorithm with complexity bounded by an exponential

• NP problems are not exactly the same as EXP

• On the other hand, judging from the examples given earlier, if you’re reduced to guessing, in the worst case you’ll have to guess an exponential (or factorial) number of times

43

• Practically speaking, for our purposes, an NP problem can be thought of as one with an exponential solution

• In any case, NP or EXP problems are hard problems

• They might serve as the basis for a cryptosystem

44

16. NP-Completeness

• There is one more theoretical/terminological thing to consider

• An NP-complete problem is an NP problem that captures the difficulty of all other NP problems: any NP problem can be transformed into it in polynomial time

• The NP-complete problems are those that really do break down into (exponential) guessing, where perfect guessing is polynomial

45

• It doesn’t matter what domain the problem comes from, logic, math, graph theory…

• The problems are equivalent

• If one of them is NP, they all are

• If by chance one could be shown to have a polynomial solution, then they could all be solved using the same algorithm

46

• In a sense, NP problems are in a gray area

• It is hypothesized that no better solutions exist than exponential guessing

• Whether or not they have an easier solution is of interest if you build a cryptosystem on them

47

17. Karmarkar’s Algorithm

• This is just a side note

• The logicians and mathematicians claim that they have established various things

• Progress in science sometimes shows previous certainties to be false

• Up until 1984, no practical polynomial algorithm had been found to solve linear programming problems

48

• Linear programming refers to a set of m linear constraints (equations or inequalities) in n unknowns where you would like to optimize the value of a linear objective by picking the right values for the unknowns

• Optimization can be difficult

49

• At first glance, this looks like a problem where you would try to find an answer by checking all possibilities

• In a way, it’s worse

• With numerical variables, testing all of the possibilities doesn’t make much sense when there is no limit on the value an unknown can take on or it can take on a range of values within the reals

50

• In 1984, building on previous research in the area, Karmarkar devised an optimization algorithm that is categorized as “weakly polynomial”

• Anyone who had previously believed that optimization in linear programming was not polynomial was shown to be wrong

• Since that time the search has been on for an algorithm that is “strongly polynomial”

51

18. Other Inherently Hard Problems

• Back to reality:

• In a sense, the whole discussion of NP problems was tangential

• Real systems, like RSA, have been built on problems that are not known to be NP-complete

• They are built on problems involving modular arithmetic and factorization of large numbers

52

• Even though existing algorithms may have complexity less than exponential, their complexity is high—and well documented

• Factoring large numbers is sufficiently difficult and time-consuming that this can be the basis for a sound cryptosystem
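The asymmetry between multiplying and factoring can be illustrated with the most naive factoring method, trial division. This is a sketch for intuition only; it is not how serious factoring attacks work, and the function name and sample numbers are assumptions.

```python
def trial_division(n):
    """Return (d, n // d) for the smallest divisor d > 1 of n, or None if n is prime."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return None

print(101 * 103)               # multiplying is one step: 10403
print(trial_division(10403))   # (101, 103), found only after trying 2, 3, ..., 101
print(trial_division(97))      # None: 97 is prime
```

The loop runs up to about the square root of n, so each extra pair of digits in n multiplies the worst-case work by roughly ten. For numbers hundreds of digits long this approach is hopeless, and the better known algorithms, while much faster, still take very large amounts of computing time.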

53

• At this time it is not practically possible to break them without large amounts of computing time

• Researchers continue to look for better algorithms

• And computing power continues to grow

54

The End
