On the Query Complexity of Clique Size and Maximum Satisfiability



Journal of Computer and System Sciences 53, 298-313 (1996)


Richard Chang*

Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, Maryland 21228, USA

Received January 10, 1996

This paper explores the bounded query complexity of approximating the size of the maximum clique in a graph (Clique Size) and the number of simultaneously satisfiable clauses in a 3CNF formula (MaxSat). The results in the paper show that for certain approximation factors, approximating Clique Size and MaxSat are complete for corresponding bounded query classes under metric reductions. The completeness result is important because it shows that queries and approximation are interchangeable: NP queries can be used to solve NP-approximation problems and solutions to NP-approximation problems answer queries to NP oracles. Completeness also shows the existence of approximation preserving reductions from many NP-approximation problems to approximating Clique Size and MaxSat (e.g., from approximating Chromatic Number to approximating Clique Size). Since query complexity is a quantitative complexity measure, these results also provide a framework for comparing the complexities of approximating Clique Size and approximating MaxSat. In addition, this paper examines the query complexity of the minimization version of the satisfiability problem, MinUnsat, and shows that the complexity of approximating MinUnsat is very similar to the complexity of approximating Clique Size. Since MaxSat and MinUnsat share the same solution space, the "approximability" of MaxSat is not due to the intrinsic complexity of satisfiability, but is an artifact of viewing the approximation version of satisfiability as a maximization problem. © 1996 Academic Press, Inc.

1. INTRODUCTION

The study of NP-approximation problems is a well-established field of theoretical computer science. In the past, this area has been explored primarily from an algorithmic point of view. The main objective of this approach is to determine the extent to which an NP-optimization problem can be approximated by an efficient algorithm. For example, an NP-optimization problem is generally considered "approximable" if there exists a constant factor polynomial time approximation algorithm for the problem. The questions asked in this line of research tend to be qualitative in nature, hence the approximation classes such as MAXNP, MAXSNP and MAX $\Pi_1$ were defined in an attempt to separate "approximable" problems from "nonapproximable" ones [PY91, PR90]. Furthermore, these classes are syntax based rather than machine based. In contrast, the traditional complexity classes, such as time and space bounded classes, are resource-bounded complexity classes which allow quantitative comparisons between problems in those classes. In their seminal paper which defined MAXNP, Papadimitriou and Yannakakis [PY91] argue against using a machine model to classify approximation problems, claiming that "computation is an inherently unstable, non-robust mathematical object, in the sense that it can be turned from non-accepting to accepting by changes that would be insignificant in any metric."

It is difficult to argue against such conventional wisdom. However, we would like a model which allows us to address some basic questions about the complexity of approximation problems. For example:

• What computational resources are sufficient to solve the approximation problem?

• Could fewer resources be used?

• Is it easier to find the approximate solution than to find the exact solution?

• Are coarser approximations easier to find than finer approximations?

• How does the complexity of approximating different problems compare to each other?

In summary, we want a computational model which allows us to quantitatively measure the complexity of approximation problems.

In this paper, we argue that bounded query complexity is a natural way to measure the complexity of NP-approximation problems. The bounded query class $\mathrm{PF}^{\mathrm{SAT}[q(n)]}$ is the set of functions computed by polynomial time Turing machines which ask at most $q(n)$ queries to the SAT oracle. Bounded query classes have been well studied, beginning with Krentel's work which used bounded query classes to classify the difficulty of NP-optimization problems [Kre88]. Subsequent studies showed that for many query bounds $q(n)$ the class $\mathrm{PF}^{\mathrm{SAT}[q(n)]}$ is strictly larger than the class $\mathrm{PF}^{\mathrm{SAT}[q(n)-1]}$ [ABG90, AG88, HN93] unless some intractability assumptions are violated. Thus, bounded query complexity is a fine-grained complexity measure and, in our opinion, well suited for measuring the complexity of approximation problems.

* Supported in part by NSF Research Grant CCR-9309137 and the University of Maryland Institute for Advanced Computer Studies. E-mail: chang@cs.umbc.edu.

Article no. 0070. Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

In previous work, Chang, Gasarch and Lund [CG93, CGL94] gave upper bounds and relative lower bounds on the number of queries needed to compute an approximation of the maximum clique size of a graph. They showed that for a "nice" approximation factor $k(n) \ge 2$, $\lceil \log \lceil \log_{k(n)} n \rceil \rceil$ queries to SAT can be used to find an approximation of the maximum clique size within a factor of $k(n)$. Moreover, no polynomial time computable function can find such an approximation using a constant number of queries fewer, to any oracle $X$, unless P = NP. This constant does not depend on the factor $k(n)$.¹ These results depend upon breakthroughs on the non-approximability of clique size and on probabilistically checkable proofs for NP languages [ALM+92]. Chang, Gasarch and Lund also obtained similar upper and lower bounds for approximating the chromatic number of a graph and explored the query complexity of finding the size of the minimum set cover.

In this paper, we show a tighter connection between bounded queries and the complexity of NP-approximation problems by showing that approximating Clique Size within a factor of $k(n)$ is hard for a corresponding bounded query class under metric reductions. To do this, we first show that the function $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}$ is complete for the bounded query classes. For several approximation factors, this hardness result can be restated as a completeness result as well. In the following, let $a$ and $k$ be constants such that $a \ge 1$ and $k \ge 1$.

• $(1+n^{-a})$-approximating Clique Size is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[O(\log n)]}$.

• $(1+1/\log^{k} n)$-approximating Clique Size is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[(k+1)\log\log n+O(1)]}$.

• $k$-approximating Clique Size is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[\log\log n+O(1)]}$.

• $(\log^{k} n)$-approximating Clique Size is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[\log\log n-\log\log\log n+O(1)]}$.
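As a rough sanity check on these four query bounds, one can evaluate the quantity $\log\log_{k(n)} n$ from the upper bound quoted earlier for each approximation factor. The following is only a back-of-the-envelope calculation (logarithms base 2, ceilings dropped, lower-order terms absorbed, constant $k > 1$ assumed in the third row so that $\log_k$ is defined); it is not the formal argument.

```latex
\begin{align*}
k(n) = 1+n^{-a}: \quad
  & \log_{k(n)} n = \frac{\ln n}{\ln(1+n^{-a})} \approx n^{a}\ln n,
  & \log\log_{k(n)} n &\approx a\log n + \log\ln n = O(\log n),\\
k(n) = 1+\frac{1}{\log^{k} n}: \quad
  & \log_{k(n)} n \approx \ln n \cdot \log^{k} n,
  & \log\log_{k(n)} n &\approx (k+1)\log\log n + O(1),\\
k(n) = k: \quad
  & \log_{k} n = \frac{\log n}{\log k},
  & \log\log_{k} n &= \log\log n - \log\log k,\\
k(n) = \log^{k} n: \quad
  & \log_{k(n)} n = \frac{\log n}{k\log\log n},
  & \log\log_{k(n)} n &= \log\log n - \log\log\log n - \log k.
\end{align*}
```

The first two rows use $\ln(1+\varepsilon) \approx \varepsilon$ for small $\varepsilon$.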

These completeness results show that functions which approximate Clique Size are natural representatives of the functions in the bounded query classes. Completeness is important because it shows that bounded queries and approximations are inseparable. Finding better approximations produces answers to more queries to SAT, and answering more queries to SAT generates better approximations. The completeness result also shows that many other NP-approximation problems can be reduced to approximating Clique Size. Moreover, these reductions are approximation preserving in the sense that a better approximation of Clique Size will result in a better approximation of the original problem.

In this paper, we also explore the query complexity of approximating the maximum number of simultaneously satisfiable clauses in a 3CNF formula (MAX3SAT). The MAX3SAT problem is different from the Clique Size problem because there exists a polynomial time algorithm which approximates MAX3SAT within a factor of 4/3. Due to results by Khanna, Motwani, Sudan and Vazirani [KMSV94], we can show that approximating MAX3SAT within a factor of $k(n)$ is hard for the class $\mathrm{PF}^{\mathrm{SAT}[\log\log_{k(n)}(1+\delta')]}$, where $\delta'$ is a constant that does not depend on $k(n)$. Again this lower bound differs from the upper bound of $\log\log_{k(n)}(4/3)$ by only a constant number of queries.

Finally, we address the following anomaly. Considering that the decision problems 3SAT and Clique are both NP-complete, why can MAX3SAT be approximated within a constant factor and not Clique Size? We could say that 3SAT is "easier" than Clique. Instead, we argue informally that MAX3SAT is approximable only because it contains padding. We show that by adding a linear amount of padding to Clique Size, the resulting approximation problem has a complexity that is very similar to MAX3SAT. Our intuitive interpretation of this observation is that MAX3SAT contains a lot of "fluff" that is approximable in polynomial time. We then argue that the "non-approximable" part of MAX3SAT requires just as many queries to approximate as Clique Size does. Hence, at its core, MAX3SAT is just as difficult as Clique Size. Note that such comparisons can only be made after we have established a uniform quantitative complexity measure to gauge the difficulty of approximating Clique Size and MAX3SAT. Furthermore, we show that by viewing the approximation version of 3SAT as a minimization problem, which we call MIN3UNSAT, the complexity of the resulting approximation problem is very similar to that of approximating Clique Size. Therefore, we conclude that the "approximability" of MAX3SAT is not due to 3SAT being easier than Clique, but is an artifact of treating the approximation version of 3SAT as a maximization problem.

The rest of this paper is organized as follows. Section 2 explains the terminology and notation used in the paper. Section 3 shows the hardness and completeness of approximating Clique Size. In Section 4, we construct an approximation preserving reduction from Chromatic Number to Clique Size. Section 5 establishes the query complexity of the MAX3SAT problem. Section 6 shows a connection between the structure of bounded query classes and the self-improvability of Clique Size. Section 7 compares the complexities of approximating Clique Size and MAX3SAT. Section 8 shows that by viewing the approximation version of 3SAT as a minimization problem instead of a maximization problem, the resulting NP-approximation problem has the same complexity as approximating Clique Size. Finally, in Section 9, we provide some updated references to results from more recent works.

¹ Under the assumption that RP ≠ NP, this constant is 6.

2. PRELIMINARIES

We are interested in two NP-optimization problems: Clique and MAX3SAT. The Clique problem is identifying the vertices of a maximum clique in a graph and the MAX3SAT problem is finding an assignment to the variables in a 3CNF formula which satisfies the largest number of clauses. We will work with a variation of Clique and MAX3SAT, Clique Size and MaxSat, which merely asks for the size of the optimum solution. We now define precisely what it means for a function to approximate Clique Size and MaxSat.

Definition 1. Given a graph $G$ with $n$ vertices, we use $|G|$ to denote the number of vertices in $G$ (rather than the length of the encoding of $G$) and we write $\omega(G)$ for the size of the maximum clique in the graph. We say that a number $x$ is an approximation of $\omega(G)$ within a factor of $k(n)$ if $x \le \omega(G) < k(n) \cdot x$. To shorten the terminology, we will also say that the number $x$ $k(n)$-approximates $\omega(G)$. Let $f$ be a function such that for all graphs $G$, $f(G)$ $k(|G|)$-approximates $\omega(G)$. Then we say that the function $f$ $k(\cdot)$-approximates Clique Size.

Definition 2. Let $F$ be a 3CNF formula with $n$ clauses. We use $|F|$ to denote the number of clauses in the formula $F$ (rather than the length of the encoding of $F$), and we define $\mathrm{MaxSat}(F)$ to be the maximum number of simultaneously satisfiable clauses in $F$. Then, a number $x$ is an approximation of $\mathrm{MaxSat}(F)$ within a factor of $k(n)$ if $x \le \mathrm{MaxSat}(F) < k(n) \cdot x$. In this case, we also say that the number $x$ $k(n)$-approximates $\mathrm{MaxSat}(F)$. A function $f$ $k(\cdot)$-approximates MaxSat if for all formulas $F$, $f(F)$ $k(|F|)$-approximates $\mathrm{MaxSat}(F)$.

Traditionally, approximation factors for maximum clause satisfiability are less than 1. E.g., Yannakakis's approximation algorithm for MAX3SAT [Yan92] is usually cited as a 3/4 approximation rather than a 4/3 approximation. In this paper, we use consistent terminology and always state approximation factors that are greater than 1.

Since approximation problems are solved by functions rather than languages, we have to use metric reductions between functions instead of many-one reductions. The following definition of metric reduction is due to Krentel [Kre88]:

Definition 3. Let $f$ and $g$ be two functions. A metric reduction from $f$ to $g$, written $f \le^{P}_{mt} g$, is a pair of polynomial time computable functions $T_1$ and $T_2$ such that for all $x$,

$$f(x) = T_2(x, g(T_1(x))).$$

Intuitively, $T_1$ transforms the input $x$ into the domain of the function $g$, then $T_2$ takes the answer given by $g(T_1(x))$ and computes the value of $f(x)$. So, $T_1$ and $T_2$ allow you to solve $f(x)$ if you have a way of computing $g$.
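To make the shape of a metric reduction concrete, here is a small illustrative sketch. The choice of functions is an assumption made for the example and does not come from the paper: $g$ is the maximum clique size (computed by brute force, so only suitable for tiny graphs), $f$ is the maximum independent set size, $T_1$ complements the graph, and $T_2$ passes the answer through unchanged. The names `max_clique_size`, `t1`, and `t2` are hypothetical.

```python
from itertools import combinations

def max_clique_size(n, edges):
    """g: brute-force size of the largest clique in a graph on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for group in combinations(range(n), size):
            if all(frozenset(p) in edge_set for p in combinations(group, 2)):
                return size
    return 0

def t1(graph):
    """T1: map the input graph to its complement, which is an input for g."""
    n, edges = graph
    present = {frozenset(e) for e in edges}
    complement = [p for p in combinations(range(n), 2) if frozenset(p) not in present]
    return n, complement

def t2(graph, answer):
    """T2: recover f(x) from x and g(T1(x)); here the answer carries over unchanged."""
    return answer

def max_independent_set_size(graph):
    """f(x) = T2(x, g(T1(x)))."""
    return t2(graph, max_clique_size(*t1(graph)))

# A path on four vertices: 0-1-2-3. Its largest independent sets have two vertices.
path = (4, [(0, 1), (1, 2), (2, 3)])
print(max_independent_set_size(path))
```

Both `t1` and `t2` run in polynomial time; all of the hard work is delegated to the single call to `g`.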

The complexity measure we use in this paper is the number of oracle queries that a polynomial time Turing machine needs to compute a function.

Definition 4. Let $\mathrm{PF}^{X[q(n)]}$ be the class of functions computed by polynomial time oracle Turing machines which ask at most $q(n)$ queries to the oracle $X$. Since the queries are adaptive, the query strings may depend on answers to previous queries.

Note that for constant $k$, $\mathrm{PF}^{X[k]}$ is closed under metric reductions, but for growing functions $q(n)$, $\mathrm{PF}^{X[q(n)]}$ might not be closed under metric reductions. For example, suppose that $f \le^{P}_{mt} g$ and $g \in \mathrm{PF}^{\mathrm{SAT}[q(n)]}$. Then, it is possible that $f \notin \mathrm{PF}^{\mathrm{SAT}[q(n)]}$, because the reduction from $f$ to $g$ can stretch the input to $g$ by a polynomial factor. Hence, computing $f(x)$ may require $q(n^{O(1)})$ queries to SAT. As a result, our completeness results are stated for classes like $\mathrm{PF}^{\mathrm{SAT}[\log\log n+O(1)]}$ which are closed under metric reductions. We will return to this topic in Section 6 where we observe a connection between closure under metric reductions and the self-improvability of NP-approximation problems.

In the next section, we will prove that some problems involving sequences of Boolean formulas are $\le^{P}_{mt}$-complete for corresponding bounded query classes.

Definition 5. For a sequence $F_1, \dots, F_r$ of Boolean formulas, we define $\#^{\mathrm{SAT}}_{r}(F_1, \dots, F_r)$ to be the number of formulas in $\{F_1, \dots, F_r\}$ which are satisfiable and $\mathrm{RM}^{\mathrm{SAT}}_{r}(F_1, \dots, F_r)$ to be the index of the rightmost satisfiable formula. That is,

$$\#^{\mathrm{SAT}}_{r}(F_1, \dots, F_r) = |\{F_1, \dots, F_r\} \cap \mathrm{SAT}|,$$

$$\mathrm{RM}^{\mathrm{SAT}}_{r}(F_1, \dots, F_r) = \max_{1 \le i \le r} \{\, i \mid F_i \in \mathrm{SAT} \,\}.$$

If none of $F_1, \dots, F_r$ are satisfiable, then we let $\mathrm{RM}^{\mathrm{SAT}}_{r}(F_1, \dots, F_r) = 0$.
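The two functions of Definition 5 can be illustrated directly. The sketch below is only a toy: it decides satisfiability by exhaustive search over assignments, and the clause encoding (signed integers for literals) and the names `satisfiable`, `num_sat`, and `rm_sat` are assumptions chosen for the example.

```python
from itertools import product

def satisfiable(clauses, num_vars):
    """Brute-force SAT test. A clause is a list of nonzero ints: v means x_v, -v means NOT x_v."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses):
            return True
    return False

def num_sat(formulas, num_vars):
    """Number of satisfiable formulas in the sequence."""
    return sum(1 for f in formulas if satisfiable(f, num_vars))

def rm_sat(formulas, num_vars):
    """1-based index of the rightmost satisfiable formula, or 0 if none is satisfiable."""
    best = 0
    for i, f in enumerate(formulas, start=1):
        if satisfiable(f, num_vars):
            best = i
    return best

sat = [[1, 2]]          # (x1 OR x2)
unsat = [[1], [-1]]     # x1 AND (NOT x1)
seq = [sat, unsat, sat, unsat]
print(num_sat(seq, 2), rm_sat(seq, 2))  # prints: 2 3
```

On this sequence the two values differ (2 versus 3) because the sequence is not nested.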

In this paper, we will also need to consider sets of sequences of Boolean formulas with growing "arity". Without loss of generality, we can assume that all of the formulas in a sequence have the same number of clauses $n$ and we will let the number of formulas $r(n)$ vary as a function of $n$. This notational device makes it easier to discuss the complexity of functions like $\#^{\mathrm{SAT}}_{\log n}$. In this case, the function expects to see inputs of the form $F_1, \dots, F_r$ where $r \le \log |F_i|$. Strictly speaking, the size of $F_1, \dots, F_{r(n)}$ is $n \cdot r(n)$. However, the query complexity of $\#^{\mathrm{SAT}}_{r(n)}$ and $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}$ depends on $r(n)$, the arity of the tuple, rather than the length of the tuple. Thus, we count the queries as a function of $n$ or $r(n)$ rather than the length of the encoding of $F_1, \dots, F_{r(n)}$. Running times, however, are still expressed in terms of the length of the input. We need this convention to state the upper and lower bounds on the number of queries needed to compute $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}$.

For polynomial time computable $r(n)$, both $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}$ and $\#^{\mathrm{SAT}}_{r(n)}$ can be computed using $\lceil \log(r(n)+1) \rceil$ queries to SAT by binary search. In previous work, Chang, Gasarch and Lund [CGL94] showed a relative lower bound on the number of queries needed to solve a certain "promise problem" using standard tree pruning techniques [ABG90, HN93]. This result provides the following corollary on the number of queries needed to compute $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}$.

Corollary 6. Let $r(n)$ be a logarithmically bounded polynomial time computable function. If there exists an oracle $X$ such that some polynomial time function computes $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}$ using fewer than $\lceil \log(r(n)+1) \rceil$ queries to $X$, then P = NP.
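The matching upper bound by binary search, mentioned just before the corollary, can be sketched as follows. This is a simulation under stated assumptions: the SAT oracle is replaced by a hypothetical `CountingOracle` that holds the hidden truth about which formulas are satisfiable, and each call to `some_sat_from(i)` stands for one SAT query on the disjunction of $F_i, \dots, F_r$ (equivalently, "is the rightmost satisfiable index at least $i$?").

```python
import math

class CountingOracle:
    """Stand-in for the SAT oracle; counts how many queries are asked."""
    def __init__(self, sat_flags):
        self._flags = sat_flags  # hidden truth: sat_flags[i-1] is True iff F_i is satisfiable
        self.queries = 0

    def some_sat_from(self, i):
        """One query: is at least one of F_i, ..., F_r satisfiable?"""
        self.queries += 1
        return any(self._flags[i - 1:])

def rm_sat_binary_search(r, oracle):
    """Find the rightmost satisfiable index in 0..r by halving the candidate range."""
    lo, hi = 0, r
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if oracle.some_sat_from(mid):
            lo = mid          # the answer is at least mid
        else:
            hi = mid - 1      # the answer is below mid
    return lo

flags = [True, False, True, True, False, False, False]  # r = 7, rightmost satisfiable index 4
oracle = CountingOracle(flags)
answer = rm_sat_binary_search(len(flags), oracle)
bound = math.ceil(math.log2(len(flags) + 1))
print(answer, oracle.queries, bound)  # prints: 4 3 3
```

Each query at least halves the range of $r+1$ candidate answers, which is where the $\lceil \log(r(n)+1) \rceil$ bound comes from.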

In this paper, we need lower bounds on $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}$ for cases where $r(n) \in n^{O(1)}$. We could obtain some lower bounds by translating the terseness results by Amir, Beigel and Gasarch [ABG90] and Beigel [Bei88]. However, the lower bounds would not be optimal, because there would be some difficulty translating the terminology between the papers regarding the "size" of the input.² Thus, we prove the following theorem using the tree-pruning techniques from Amir, Beigel and Gasarch.

Theorem 7. Let $r(n) \in n^{O(1)}$ be a polynomial time computable function. If there exists an oracle $X$ such that some polynomial time function computes $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}$ using fewer than $\lceil \log(r(n)+1) \rceil$ queries to $X$, then P = NP.

Proof. This is a straightforward tree-pruning argument. Let $M$ be a polynomial time Turing machine that computes $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}$ using fewer than $\lceil \log(r(n)+1) \rceil$ queries to some oracle $X$. Given an input sequence $F_1, \dots, F_{r(n)}$ to $M$, we can search the entire oracle query tree in polynomial time and enumerate a list of outputs from $M$ for each sequence of oracle replies. This list will have fewer than $r(n)+1$ values, one of which is $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}(F_1, \dots, F_{r(n)})$. Since $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}(F_1, \dots, F_{r(n)})$ can range in value from 0 to $r(n)$, one of the possible values, call it $z$, is not in the list. We know that $z \ne \mathrm{RM}^{\mathrm{SAT}}_{r(n)}(F_1, \dots, F_{r(n)})$ and we found $z$ in deterministic polynomial time without using any oracle queries.

Now, given a Boolean formula $F$ with $n$ clauses, we will prune the disjunctive self-reduction tree of $F$ to try to find a satisfying assignment for $F$. We expand the root of the tree until we have at least $r(n)$ nodes in the deepest level. Let $F_1, \dots, F_{r(n)}$ be the first $r(n)$ nodes in the deepest level. Use the algorithm outlined above to find an index $z$ such that $z \ne \mathrm{RM}^{\mathrm{SAT}}_{r(n)}(F_1, \dots, F_{r(n)})$. If $z = 0$, then we know that at least one of $F_1, \dots, F_{r(n)}$ is satisfiable, which implies that $F \in \mathrm{SAT}$. So, we can accept $F$ outright. If $z \ne 0$, then we remove the node $F_z$ from the tree. We can safely remove $F_z$ because we know that $F_z$ is not the rightmost satisfiable formula. So, if $F_1, \dots, F_{r(n)}$ contains satisfiable formulas, then at least one remains after removing $F_z$. We repeat this process until there are only $r(n)-1$ nodes in the deepest level. Then, we expand the tree and repeat the pruning process again. When the leaves of the tree are reached, there are only polynomially many leaves left in the tree. We can check each one and accept if and only if a satisfying assignment is found. Thus, SAT $\in$ P and P = NP. ∎
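The first step of the proof, finding a value $z$ that the machine can never output, can be sketched in a few lines. This is a toy model, not the construction from the paper: the machine is represented by a hypothetical function from a tuple of oracle replies to an output, so enumerating all reply sequences plays the role of searching the whole query tree. With fewer than $\lceil \log(r+1) \rceil$ queries there are at most $r$ reply sequences, so some value in $0..r$ must be missing.

```python
from itertools import product

def excluded_value(machine, num_queries, r):
    """Return some z in 0..r that the machine never outputs, whatever the oracle replies."""
    outputs = {machine(replies) for replies in product([False, True], repeat=num_queries)}
    for z in range(r + 1):
        if z not in outputs:
            return z
    raise ValueError("no excluded value: need 2**num_queries < r + 1")

# Hypothetical machine asking 2 queries about a sequence of r = 4 formulas.
# Its four possible outputs are 1, 2, 3, 4, so it can never answer 0.
def toy_machine(replies):
    first, second = replies
    return 2 * int(first) + int(second) + 1

print(excluded_value(toy_machine, 2, 4))  # prints: 0
```

In the proof, an excluded value of 0 certifies that some formula in the sequence is satisfiable, while an excluded value $z \ne 0$ names a node $F_z$ that can be pruned.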

In many parts of the paper, using nested sequences of formulas will greatly simplify our calculations. The corollary below adapts Theorem 7 for nested sequences.

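Definition 8. We say that the sequence $F_1, \dots, F_r$ of Boolean formulas is a nested sequence if for all $i$, $1 \le i < r$, $F_{i+1} \in \mathrm{SAT} \implies F_i \in \mathrm{SAT}$. For a nested sequence $F_1, \dots, F_r$,

$$\#^{\mathrm{SAT}}_{r}(F_1, \dots, F_r) = \mathrm{RM}^{\mathrm{SAT}}_{r}(F_1, \dots, F_r).$$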

Corollary 9. Let $r(n) \in n^{O(1)}$ be a polynomial time computable function and let $X$ be any oracle. If there exists a polynomial time Turing machine which can determine the value of $\#^{\mathrm{SAT}}_{r(n)}(F_1, \dots, F_{r(n)})$ using fewer than $\lceil \log(r(n)+1) \rceil$ queries to $X$ for every nested sequence $F_1, \dots, F_{r(n)}$, then P = NP.

Proof. We will show that under the hypothesis we can also compute $\mathrm{RM}^{\mathrm{SAT}}_{r'(n)}$ using fewer than $\lceil \log(r'(n)+1) \rceil$ queries to $X$ for some $r'(n) \in n^{O(1)}$. Hence, P = NP by Theorem 7. Now, let $F'_1, \dots, F'_{r'(m)}$ be a sequence of Boolean formulas, not necessarily nested, such that $|F'_i| = m$. For each $F'_i$, we construct a formula $F_i$ with $n$ clauses such that

$$F_i \in \mathrm{SAT} \iff \exists j \ge i,\ F'_j \in \mathrm{SAT}.$$

We can simply let $F_i$ be the disjunction of $F'_i, \dots, F'_{r'(m)}$, so that the resulting sequence is nested in the sense of Definition 8. (If it is necessary for $F_i$ to be a formula in conjunctive normal form, we can resort to Karp's reduction.) By padding, we can assume without loss of generality that every $F_i$ has exactly $n = p(m)$ clauses for some polynomially bounded $p(\cdot)$. By defining $r'(m) = r(p(m)) = r(n)$, it is syntactically correct to say that

$$\mathrm{RM}^{\mathrm{SAT}}_{r'(m)}(F'_1, \dots, F'_{r'(m)}) = \#^{\mathrm{SAT}}_{r(n)}(F_1, \dots, F_{r(n)}).$$

(Recall that the domain of the function $\mathrm{RM}^{\mathrm{SAT}}_{r'(m)}$ is the set of sequences $F'_1, \dots, F'_l$ such that $l \le r'(|F'_i|)$.) Then, by hypothesis, $\#^{\mathrm{SAT}}_{r(n)}(F_1, \dots, F_{r(n)})$ can be computed using fewer than $\lceil \log(r(n)+1) \rceil$ queries to $X$. In terms of $m$, the same machine can be used to compute $\mathrm{RM}^{\mathrm{SAT}}_{r'(m)}(F'_1, \dots, F'_{r'(m)})$ using fewer than $\lceil \log(r'(m)+1) \rceil$ queries to $X$. By Theorem 7, this implies that P = NP. ∎

² Previous versions of this paper [Cha94a, Cha95] stated that Corollary 6 can be used to obtain terseness results. While this is true, the statement of Corollary 9 in those versions is incorrect. In particular, the terseness results we could obtain would not supersede the theorems shown by Beigel [Bei88], Amir-Beigel-Gasarch [ABG90] or Krentel [Kre88]. The difficulty lies in the fact that in this paper we do not count queries based on the length of the input. Even though we can show that $\mathrm{PF}^{\mathrm{SAT}[q(n)]}$ can solve $\mathrm{RM}^{\mathrm{SAT}}_{2^{q(n)}-1}$ but $\mathrm{PF}^{\mathrm{SAT}[q(n)-1]}$ cannot (unless P = NP), the $n$ here is not the length of the input.
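The nesting step of this proof can be checked on small examples. In the sketch below, each nested formula is the disjunction of the original formulas from position $i$ to the end, so that it is satisfiable exactly when some original formula with index $j \ge i$ is. As a simplifying assumption, the disjunction is kept as a list of alternatives rather than converted to conjunctive normal form, and satisfiability is decided by brute force; the names `nest` and `num_sat_nested` are hypothetical.

```python
from itertools import product

def satisfiable(clauses, num_vars):
    """Brute-force SAT test; a clause is a list of ints (v for x_v, -v for NOT x_v)."""
    return any(
        all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses)
        for bits in product([False, True], repeat=num_vars)
    )

def rm_sat(formulas, num_vars):
    """1-based index of the rightmost satisfiable formula, or 0 if none is satisfiable."""
    return max((i for i, f in enumerate(formulas, 1) if satisfiable(f, num_vars)), default=0)

def nest(formulas):
    """Element i of the result lists the formulas from position i onward; it stands for their disjunction."""
    return [formulas[i:] for i in range(len(formulas))]

def num_sat_nested(nested, num_vars):
    """Count the satisfiable disjunctions (a disjunction is satisfiable iff some disjunct is)."""
    return sum(1 for alts in nested if any(satisfiable(f, num_vars) for f in alts))

sat, unsat = [[1, 2]], [[1], [-1]]
original = [sat, unsat, sat, unsat]
print(rm_sat(original, 2), num_sat_nested(nest(original), 2))  # prints: 3 3
```

The count over the nested sequence agrees with the rightmost index of the original sequence, which is the identity the proof relies on.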

3. HARDNESS AND COMPLETENESS

We start this section by showing that every function in $\mathrm{PF}^{\mathrm{SAT}[q(n)]}$ can be $\le^{P}_{mt}$-reduced to $\mathrm{RM}^{\mathrm{SAT}}_{2^{q(n)}-1}$, when $q(n)$ is bounded above by $O(\log n)$. Using this reduction, we show that any function which $k(\cdot)$-approximates Clique Size is $\le^{P}_{mt}$-hard for a corresponding bounded query class. Then, combined with the upper bounds on the complexity of approximating Clique Size, we translate the hardness results into completeness results for certain approximation factors.

Theorem 10. Let $q(n) \in O(\log n)$ be a nondecreasing polynomial time computable function and let $r(n) = 2^{q(n)} - 1$. The function $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}$ is complete for $\mathrm{PF}^{\mathrm{SAT}[q(n)]}$ under metric reductions. In particular, there exist polynomial time computable functions $T_1$ and $T_2$ such that for all functions $f \in \mathrm{PF}^{\mathrm{SAT}[q(n)]}$ and for all $x$,

$$T_1(x) = (F_1, \dots, F_{r(|x|)}),$$

$$f(x) = T_2(x, \mathrm{RM}^{\mathrm{SAT}}_{r(n)}(F_1, \dots, F_{r(|x|)})).$$

Proof. First, we need to show that $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}$ can be solved by a function in $\mathrm{PF}^{\mathrm{SAT}[q(n)]}$. Without loss of generality, we can assume that $q(n)$ is a whole number since a $\mathrm{PF}^{\mathrm{SAT}[q(n)]}$ machine will make at most $\lfloor q(n) \rfloor$ queries. Using binary search and the SAT oracle, we can determine the rightmost formula in a sequence $F_1, \dots, F_{r(n)}$ that is satisfiable. Note that the answer will range from 0 to $r(n)$, so there are $r(n)+1$ possible answers. Thus, binary search would use $\log(r(n)+1) = q(n)$ queries to SAT.

To show that every function in $\mathrm{PF}^{\mathrm{SAT}[q(n)]}$ can be reduced to $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}$, look at the query tree with $q(n)$ serial queries. Each node of this tree represents a query to SAT. The left subtree of a node represents the computation of the $\mathrm{PF}^{\mathrm{SAT}[q(n)]}$ machine if the answer to the query is "no". The right subtree corresponds to a "yes" answer to the query. Each path in the query tree (from the root to a leaf) represents a full computation of the machine asking $q(n)$ queries to some oracle. The goal of the reduction is to determine which path is taken when the oracle is SAT. We call this path the true path.

Given any path in the query tree, we say that a query on the path is positive if the path goes through the right subtree of the query node. Otherwise, the query is negative. For example in Fig. 1, on path $p_5$, queries $q_1$ and $q_6$ are positive queries and $q_3$ is a negative query. Note that being positive or negative is relative to the path. Now, for each of the $r(n)$ paths $p_1, \dots, p_{r(n)}$, we construct a Boolean formula. For path $p_i$, the corresponding formula $F_i$ is the conjunction of all the positive queries on the path. In our example, $F_5 = q_1 \wedge q_6$ and $F_3 = q_2 \wedge q_5$. We ignore the path $p_0$ because there are no positive queries on this path.

Let $p_z$ be the true path in the query tree. We make two observations. First, $F_z \in \mathrm{SAT}$, since all the positive queries in path $p_z$ are satisfiable formulas. Second, for all $i > z$, $F_i \notin \mathrm{SAT}$. To see this, note that $i > z$ implies that the path $p_i$ is to the right of the path $p_z$. Hence some positive query $q_t$ on path $p_i$ must be a negative query on path $p_z$. Since $p_z$ is the true path, $q_t$ must be unsatisfiable. Hence, $F_i$ is unsatisfiable. On the other hand, if $i < z$, then $F_i$ may be satisfiable or unsatisfiable. For example, if $p_5$ is the true path in Fig. 1, then $F_2$ is satisfiable iff $q_2$ is satisfiable, which is independent of the queries on the true path. The result is that we have produced a sequence $F_1, \dots, F_{r(n)}$ such that $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}(F_1, \dots, F_{r(n)})$ is exactly the index of the true path.

We may assume that all of the $F_i$ have the same number of clauses $m$. Since $m \ge n$ without loss of generality, using $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}(F_1, \dots, F_{r(n)})$ is an abuse of notation (it should be $\mathrm{RM}^{\mathrm{SAT}}_{r(m)}(F_1, \dots, F_{r(m)})$). However, $r(\cdot)$ is a non-decreasing function, so $r(n) \le r(m)$. Thus, $F_1, \dots, F_{r(n)}$ is a properly formatted input to the function $\mathrm{RM}^{\mathrm{SAT}}_{r(m)}$.

Finally, let $f$ be a function in $\mathrm{PF}^{\mathrm{SAT}[q(n)]}$. We define a metric reduction to reduce the function $f$ and an input string $x$ to the function $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}$ as follows. We let $T_1$ be the function which prints out the sequence $F_1, \dots, F_{r(n)}$ as described above. Now, if we are given $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}(F_1, \dots, F_{r(n)})$, then we have the index of the true path $p_z$. Thus, we can compute the value $f(x)$ by simulating the computation of $f(x)$ along the true path. This simulation does not require any oracle queries. ∎
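The query-tree construction in this proof can be simulated on the 3-query tree of Fig. 1. The sketch below makes two simplifying assumptions: queries are opaque labels whose satisfiability is given by a predicate `is_sat`, and the conjunction of the positive queries on a path is treated as satisfiable exactly when every conjunct is (as would hold if the queries use disjoint variables). The layout of labels in `FIG1` follows the numbering used in the examples above, and all function names are hypothetical.

```python
def path_replies(i, q):
    """Oracle replies along path p_i: the q-bit binary expansion of i, most significant bit first."""
    return tuple(bool((i >> (q - 1 - d)) & 1) for d in range(q))

def positive_queries(i, q, query_at):
    """Queries on path p_i that are answered 'yes' along that path."""
    replies = path_replies(i, q)
    return [query_at(replies[:d]) for d in range(q) if replies[d]]

def true_path(q, query_at, is_sat):
    """Follow the machine with the real oracle answers and return the index of the path taken."""
    replies = ()
    for _ in range(q):
        replies += (is_sat(query_at(replies)),)
    return sum(int(b) << (q - 1 - d) for d, b in enumerate(replies))

def rm_sat_of_paths(q, query_at, is_sat):
    """Rightmost i in 1..2^q - 1 whose formula F_i (conjunction of positive queries) is satisfiable."""
    best = 0
    for i in range(1, 2 ** q):
        if all(is_sat(query) for query in positive_queries(i, q, query_at)):
            best = i
    return best

# Query asked after each sequence of earlier replies (False = "no", True = "yes").
FIG1 = {(): "q1", (False,): "q2", (True,): "q3",
        (False, False): "q4", (False, True): "q5",
        (True, False): "q6", (True, True): "q7"}

def fig1_query_at(prefix):
    return FIG1[prefix]

satisfiable_queries = {"q1", "q2", "q6"}
print(positive_queries(5, 3, fig1_query_at))  # prints: ['q1', 'q6']
print(true_path(3, fig1_query_at, satisfiable_queries.__contains__),
      rm_sat_of_paths(3, fig1_query_at, satisfiable_queries.__contains__))  # prints: 5 5
```

Here $q_2$ is satisfiable but lies off the true path, so $F_2$ and $F_3$ have no influence on the rightmost satisfiable index.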

The metric reduction in Theorem 10 is query preserving in the following sense. The transducer $T_1$ stretches the size of the input $x$; however, $\mathrm{RM}^{\mathrm{SAT}}_{2^{q(n)}-1}(F_1, \dots, F_{2^{q(n)}-1})$ can always be solved with $q(n)$ serial queries to SAT. So, even when $f$ is a function that requires $q(n)$ serial queries, the output of $T_1$ can be solved with $q(n)$ serial queries.

FIG. 1. A query tree for 3 queries.

As in Corollary 9, since we can convert a sequence $F_1, \ldots, F_{r(n)}$ into a nested sequence $F'_1, \ldots, F'_{r(n)}$ where

$$\mathrm{RM}^{\mathrm{SAT}}_{r(n)}(F_1, \ldots, F_{r(n)}) = \#^{\mathrm{SAT}}_{r(n)}(F'_1, \ldots, F'_{r(n)}),$$

Theorem 10 also shows that $\#^{\mathrm{SAT}}_{r(n)}$ is complete.

Corollary 11. Let $q(n) \in O(\log n)$ be nondecreasing and polynomial time computable. Then $\#^{\mathrm{SAT}}_{r(n)}$ is complete for $\mathrm{PF}^{\mathrm{SAT}[q(n)]}$ where $r(n) = 2^{q(n)} - 1$.

The following corollary is a special case of Corollary 11 when $q(n)$ is a constant $k$. This corollary also shows an interesting connection between bounded query function classes and bounded query language classes, because the canonical $\le^{P}_{m}$-complete language for the language class $\mathrm{P}^{\mathrm{SAT}[k]}$ is the language $\mathrm{ODD}^{\mathrm{SAT}}_{2^k-1} \oplus \mathrm{EVEN}^{\mathrm{SAT}}_{2^k-1}$ [Bei91, CGH+88]. Here, $\mathrm{ODD}^{\mathrm{SAT}}_{r}$ (respectively, $\mathrm{EVEN}^{\mathrm{SAT}}_{r}$) is the set of nested sequences $F_1, \ldots, F_r$ such that $\#^{\mathrm{SAT}}_{r}(F_1, \ldots, F_r)$ is odd (even).

Corollary 12. For all constants $k$, the function $\#^{\mathrm{SAT}}_{2^k-1}$ is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[k]}$.

We are now ready to show the completeness of approximating Clique Size. The following upper bound is from Chang, Gasarch and Lund [CG93, CGL94]:

Lemma 13. Let $\Pi$ be an NP-optimization problem and let $\mathrm{OPT}_\Pi(x)$ be the cost of the optimum solution for an instance $x$. Suppose that the best polynomial time approximation algorithm can approximate $\mathrm{OPT}_\Pi$ within a factor of $A(n)$. Then, for $k(n) \le A(n)$, there exists a polynomial time algorithm which $k(\cdot)$-approximates $\mathrm{OPT}_\Pi$ using $\lceil \log \lceil \log_{k(n)} A(n) \rceil \rceil$ queries to SAT.

Proof. First, assume that $\Pi$ is a maximization problem. Given an instance $x$ of $\Pi$, we first use the polynomial time algorithm to approximate $\mathrm{OPT}_\Pi(x)$ within a factor of $A(n)$, where $n = |x|$. This algorithm will produce a number $z$ and we know that $z \le \mathrm{OPT}_\Pi(x) < A(n) \cdot z$. Now, we divide the number line between $z$ and $A(n) \cdot z$ into intervals of the form $[z\,k(n)^i,\ z\,k(n)^{i+1})$. There are at most $\log_{k(n)} A(n)$ such intervals. Since $\Pi$ is an NP-optimization problem, a query to SAT can determine whether $\mathrm{OPT}_\Pi(x) \ge y$ for some number $y$. Thus, using binary search, $\lceil \log \lceil \log_{k(n)} A(n) \rceil \rceil$ queries are sufficient to find an interval that contains $\mathrm{OPT}_\Pi(x)$. We can then report the lower endpoint of the interval as a $k(n)$-approximation of $\mathrm{OPT}_\Pi(x)$. Minimization problems are handled analogously. ∎
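The binary search in this proof can be sketched in Python as follows. This is a minimal illustration, not the paper's algorithm verbatim: the callback `opt_at_least` is a hypothetical stand-in for one query to SAT ("is the optimum at least y?"), and `z`, `A`, and `k` are supplied as plain numbers rather than computed from an instance.

```python
def count_intervals(k, A):
    """Smallest N with k**N >= A, i.e. the number of intervals
    [z*k**i, z*k**(i+1)) needed to cover [z, A*z)."""
    n_intervals, power = 0, 1
    while power < A:
        power *= k
        n_intervals += 1
    return n_intervals


def approximate_max(z, A, k, opt_at_least):
    """Return (estimate, queries_used) with estimate <= OPT < k * estimate,
    assuming z <= OPT < A * z.

    opt_at_least(y) is the assumed oracle call: True iff OPT >= y.
    """
    lo, hi = 0, count_intervals(k, A) - 1
    queries = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the range always shrinks
        queries += 1
        if opt_at_least(z * k ** mid):
            lo = mid
        else:
            hi = mid - 1
    return z * k ** lo, queries
```

Each loop iteration halves the number of candidate intervals, so the number of oracle calls is at most the ceiling of the logarithm of the interval count, matching the bound in Lemma 13.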

For Clique Size, we know that $\omega(G)$ ranges from 2 to $n$ for a graph of size $n$ (by first checking if the graph has any edges). Thus we have the following corollary.

Corollary 14. Let $k(n)$ be a polynomial time computable function such that $1 < k(n) < n$. Then there exists a function in $\mathrm{PF}^{\mathrm{SAT}[\lceil \log \lceil \log_{k(n)} n \rceil \rceil]}$ which $k(\cdot)$-approximates Clique Size.

The upper bound on the complexity of approximating Clique Size is achieved by binary search. The difficult part is to show that every function computable using a constant fewer queries $\le^{P}_{mt}$-reduces to $k(\cdot)$-approximating Clique Size. To do so, we need to review some consequences of the existence of probabilistically checkable proofs for NP.

Lemma 15 [ALM+92]. There exist constants $s$, $b$ and $d$, $1 < s < b < d$, such that given a Boolean formula $F$ with $t$ clauses, we can construct in polynomial time a graph $G$ with $t^d$ vertices, where

$$F \in \mathrm{SAT} \implies \omega(G) = t^b$$

$$F \notin \mathrm{SAT} \implies \omega(G) = t^s$$

The lemma as cited only shows that $\omega(G) < t^s$ in the case that $F \notin \mathrm{SAT}$. A simple padding argument will give equality in this case. We make this modification because it will greatly simplify our calculations.

Lemma 15 can be rephrased as a statement on the approximability of $\omega(G)$. Let $\varepsilon = (b-s)/d$. Then the ratio of the size of the big clique, $t^b$, to the size of the small clique, $t^s$, is $(t^d)^{\varepsilon}$. Since $|G| = t^d$, Lemma 15 shows that an $n^{\varepsilon}$-approximation of $\omega(G)$ will determine whether the original formula $F$ is satisfiable. Throughout the rest of this paper, we will fix the values of $\varepsilon$, $b$, $s$ and $d$ to be these constants.

We would like to state our theorem about the hardness of approximating $\omega(G)$ in the most general terms possible. Since the number of queries required to solve the approximation problem increases as the approximation factor becomes finer (smaller), the statement of our results depends on the approximation factor. In general, this factor does not have to be a constant or even an increasing function. Our theorem is restricted to the classes $\mathrm{PF}^{\mathrm{SAT}[q(n)]}$ where $q(n)$ is non-decreasing.

Theorem 16 (Main Theorem). Let $c = 1 + \log(1 + 1/\varepsilon)$ and let $q(n) \in O(\log n)$ be a non-decreasing polynomial time computable function. Define the function $k(n)$ such that $q(n) = \log \log_{k(n)} n - c$ (i.e., $k(n) = n^{2^{-q(n)-c}}$). If $h$ is a function which $k(\cdot)$-approximates Clique Size and $f \in \mathrm{PF}^{\mathrm{SAT}[q(n)]}$, then $f \le^{P}_{mt} h$.

The proof of our main theorem modifies the construction of Chang, Gasarch and Lund [CGL94]. However, in this proof, we are able to remove several restrictions on the approximation factor $k(n)$. In particular, we can handle cases where $k(n)$ is a fractional value between 1 and 2. Also, the calculations are simplified in this proof. Our main construction is encapsulated in the following lemma, which will also be used to prove Theorem 18. This lemma constructs a graph $H$ with $m$ vertices such that a $k(m)$-approximation of $\omega(H)$ will tell us the number of satisfiable formulas in a nested sequence $F_1, \ldots, F_{r(t)}$. In what follows, the constants $b$, $s$, $d$ and $\varepsilon = (b-s)/d$ are from Lemma 15.

Lemma 17 (Construction Lemma). Let $k(\cdot)$ be a polynomial time computable function such that $k(n) \ge 1 + \sqrt{2}/n^{\delta}$ for $\delta = \varepsilon/(4+4\varepsilon)$. Let $F_1, \ldots, F_{r(t)}$ be a nested sequence with $|F_i| = t$, $m = t^{b-s+d}$ and

$$r(t) \le \lfloor \log_{k(m)} t^{(b-s)/2} \rfloor.$$

Then, in polynomial time we can construct a graph $H$ with $m$ vertices and whole numbers $y_0 < \cdots < y_{r(t)}$ such that $k(m)\,y_i < y_{i+1}$ for $0 \le i \le r(t)-1$, and

$$\omega(H) = y_z \iff \#^{\mathrm{SAT}}_{r(t)}(F_1, \ldots, F_{r(t)}) = z.$$

Proof. We will first construct a graph $H'$ with fewer than $m$ vertices; the graph $H$ is then produced from $H'$ by simply adding $m - |H'|$ unconnected vertices. To construct $H'$, we take each $F_i$ and produce a graph $G_i$ with $t^d$ vertices according to Lemma 15. We want to define two operations on graphs: addition $\oplus$ and scalar multiplication $\otimes$. These operations commute with $\omega(\cdot)$:

$$\omega(G_1 \oplus G_2) = \omega(G_1) + \omega(G_2)$$

$$\omega(a \otimes G) = a \cdot \omega(G).$$

The vertex set of the graph $G' = G_1 \oplus G_2$ is the disjoint union of the vertices of $G_1$ and $G_2$. The edges in $G'$ are all the edges in $G_1$ and $G_2$ plus the edges $(u, v)$ for each $u \in G_1$ and $v \in G_2$. Then, scalar multiplication for a whole number $a$ is simply repeated addition.
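The two graph operations can be sketched in Python to confirm on small cases that they commute with the clique number. The representation below (a vertex count plus a set of edges over vertices `0..n-1`) and the brute-force `clique_number` are assumptions made for illustration; the brute-force search is only usable on tiny graphs.

```python
from itertools import combinations


def graph_sum(g1, g2):
    """G1 (+) G2: disjoint union of the vertices, all original edges,
    plus an edge between every vertex of G1 and every vertex of G2."""
    (n1, e1), (n2, e2) = g1, g2
    shifted = {(u + n1, v + n1) for (u, v) in e2}
    cross = {(u, v + n1) for u in range(n1) for v in range(n2)}
    return (n1 + n2, set(e1) | shifted | cross)


def scalar_mult(a, g):
    """a (x) G for a whole number a >= 1, as repeated addition."""
    result = g
    for _ in range(a - 1):
        result = graph_sum(result, g)
    return result


def clique_number(g):
    """Brute-force omega(G); exponential time, for tiny examples only."""
    n, edges = g
    adjacent = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for cand in combinations(range(n), size):
            if all(frozenset(p) in adjacent for p in combinations(cand, 2)):
                return size
    return 0
```

For example, joining a three-vertex path (clique number 2) with a triangle (clique number 3) gives a graph with clique number 5, because any clique on one side can be combined with any clique on the other.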

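To simplify our notation, let $g = k(m)$ and $r = r(t)$. Note that $g$ does not have to be a whole number. We define a sequence of whole numbers $a_1, \ldots, a_r$ inductively:

$$a_i = \begin{cases} 1 & \text{if } i = 1 \\ \lceil g\,a_{i-1} \rceil & \text{for } 2 \le i \le r \end{cases}$$

We define the graph $H'$ to be $(a_1 \otimes G_1) \oplus (a_2 \otimes G_2) \oplus \cdots \oplus (a_r \otimes G_r)$. Clearly, $H'$ has $\sum_{i=1}^{r} a_i t^d$ vertices. To estimate the value of $\sum_{i=1}^{r} a_i$, note that for $i \ge 2$, we have $a_i < g \cdot a_{i-1} + 1$, so

$$a_i \le g^{i-1} + g^{i-2} + \cdots + 1 = \frac{g^i - 1}{g - 1}. \tag{1}$$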

Thus, we have the estimate

$$\sum_{i=1}^{r} a_i \le \sum_{i=1}^{r} \frac{g^i - 1}{g - 1} < \frac{g}{g-1} \sum_{i=0}^{r-1} g^i = \frac{g(g^r - 1)}{(g-1)^2}. \tag{2}$$
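The bounds (1) and (2) on the multipliers can be checked numerically. The sketch below uses exact rational arithmetic; the function names and the sample value of $g$ used in any call are illustrative assumptions, not values from the paper.

```python
from fractions import Fraction
from math import ceil


def multipliers(g, r):
    """a_1 = 1 and a_i = ceil(g * a_{i-1}) for 2 <= i <= r."""
    a = [1]
    for _ in range(2, r + 1):
        a.append(ceil(g * a[-1]))
    return a


def bound_eq1(g, i):
    """Right-hand side of (1): (g**i - 1) / (g - 1)."""
    return (g ** i - 1) / (g - 1)


def bound_eq2(g, r):
    """Right-hand side of (2): g * (g**r - 1) / (g - 1)**2."""
    return g * (g ** r - 1) / (g - 1) ** 2
```

With `g = Fraction(3, 2)` and `r = 10`, for instance, every `a_i` stays at or below `bound_eq1(g, i)` and the sum of the `a_i` stays below `bound_eq2(g, r)`, even though `g` is not a whole number.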

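By the restriction on $r$, we know that

$$\frac{g(g^r - 1)}{(g-1)^2} < \frac{g}{(g-1)^2} \cdot t^{(b-s)/2}.$$

If $g \ge 2$, we know that $g/(g-1)^2 \le 2$. If $g \le 2$, then from the lower bound on $k(m)$ and the fact that $t^{b-s} = m^{\varepsilon/(1+\varepsilon)}$ we can deduce that

$$\frac{g}{(g-1)^2} \le t^{(b-s)/2}.$$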

In either case, $H'$ has at most $t^d\,t^{b-s}$ vertices. Since $m = t^{b-s+d}$, we also have $|H'| < m$. Then, we can construct $H$ by simply adding dummy vertices to $H'$. The result is that $|H| = m$ and $\omega(H) = \sum_{i=1}^{r} a_i\,\omega(G_i)$. Also, when $z = \#^{\mathrm{SAT}}_{r}(F_1, \ldots, F_r)$, by Lemma 15 we have

$$\omega(H) = \sum_{j=1}^{z} a_j t^b + \sum_{j=z+1}^{r} a_j t^s.$$

So, we can define the numbers $y_0, \ldots, y_r$ as:

$$y_i = \sum_{j=1}^{i} a_j t^b + \sum_{j=i+1}^{r} a_j t^s.$$

It is easy to see that $\omega(H) \in \{y_0, \ldots, y_r\}$ and that $z = \#^{\mathrm{SAT}}_{r}(F_1, \ldots, F_r) \implies \omega(H) = y_z$. We still need to show that $y_{i+1} > g\,y_i$. From the definition of $y_i$ and the fact that $g\,a_i \le a_{i+1}$, it follows that

$$y_{i+1} - g\,y_i \ge t^b - g\,a_r t^s.$$

By Equation 1, it suffices to show that

$$\frac{g(g^r - 1)}{g - 1} < t^{b-s}. \tag{3}$$

From the restriction on $r$, we know that

$$\frac{g(g^r - 1)}{g - 1} < \frac{g}{g-1}\,t^{(b-s)/2}.$$

As before, when $g \ge 2$, we have $g/(g-1) \le 2$. When $g \le 2$, the lower bound on $k(m)$ provides us with the bound:

$$\frac{g}{g-1} \le \sqrt{2} \cdot t^{(b-s)/4}.$$


In either case, we have satisfied the inequality in Equation 3 for large enough $t$. Since $g = k(m)$, we can finally conclude that $k(m) \cdot y_i < y_{i+1}$. Thus, $y_0 < y_1 < \cdots < y_r$ and

$$\omega(H) = y_z \iff \#^{\mathrm{SAT}}_{r}(F_1, \ldots, F_r) = z. \qquad ∎$$
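To make the separation property of the Construction Lemma concrete, the following Python sketch computes the candidate clique sizes $y_0, \ldots, y_r$ from a list of multipliers. The toy parameters mentioned afterwards ($t = 10$, $b = 3$, $s = 1$, $g = 2$) are assumptions chosen only so the numbers are easy to read; the real constants from Lemma 15 are not known explicitly.

```python
def clique_targets(a, t, b, s):
    """y_i = sum_{j<=i} a_j * t**b + sum_{j>i} a_j * t**s for i = 0..r,
    where a = [a_1, ..., a_r] are the multipliers of the construction."""
    total = sum(a)
    ys = []
    for i in range(len(a) + 1):
        big = sum(a[:i])  # copies whose formula is satisfiable
        ys.append(big * t ** b + (total - big) * t ** s)
    return ys


def separated(ys, g):
    """True iff g * y_i < y_{i+1} for every consecutive pair."""
    return all(g * lo < hi for lo, hi in zip(ys, ys[1:]))
```

With multipliers `[1, 2, 4]` (the sequence for `g = 2`, `r = 3`), `clique_targets([1, 2, 4], 10, 3, 1)` yields values that are each more than twice the previous one, so any 2-approximation of the clique number identifies the index $z$.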

Proof of Main Theorem. Let $k(n)$ be as defined in the hypothesis. First, we will assume that $k(n) \ge 1 + \sqrt{2}/n^{\delta}$ for some $\delta < \varepsilon/(4+4\varepsilon)$. Let $f$ be any function in $\mathrm{PF}^{\mathrm{SAT}[q(n)]}$. Since $q(n) \in O(\log n)$, we can use Theorem 10 to reduce $f$ to $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}$ via $T_1$ and $T_2$ where $r(n) = 2^{\lfloor q(n) \rfloor} - 1$. Thus, given an input $x$ to the function $f$, with $|x| = n$, we can produce $F'_1, \ldots, F'_{r(n)}$ such that

$$f(x) = T_2(x, \mathrm{RM}^{\mathrm{SAT}}_{r(n)}(F'_1, \ldots, F'_{r(n)})).$$

In addition, we transform $F'_1, \ldots, F'_{r(n)}$ into a nested sequence $F_1, \ldots, F_{r(n)}$ as described in Corollary 9. Without loss of generality, assume that every $F_i$ has the same length $t$, which is greater than $n$. The function $k(n)$ was defined so that $\log \log_{k(n)} n - c = q(n)$. Using the definition of $r(n)$, we can conclude that

$$\log(r(n)+1) = \lfloor q(n) \rfloor \le q(n) = \log \log_{k(n)} n - c.$$

Thus,

$$r(n) + 1 < 2^{-c} \cdot \log_{k(n)} n.$$

Now, $q(n)$ is nondecreasing and $c$ is a constant, so the function $\log_{k(n)} n$ must also be nondecreasing. Let $m = t^{b-s+d}$. Then, $n < t < m$, so

$$r(n) + 1 < 2^{-c} \cdot \log_{k(n)} n \le 2^{-c} \cdot \log_{k(m)} m.$$

From the definition of $c$, we obtain:

$$2^{-c} = \frac{\varepsilon}{2(1+\varepsilon)} = \frac{b-s}{2(b-s+d)}.$$

Thus, we can rewrite $2^{-c} \log_{k(m)} m$ as

$$2^{-c} \log_{k(m)} m = \frac{b-s}{2(b-s+d)} \log_{k(m)} t^{b-s+d} = \log_{k(m)} t^{(b-s)/2}.$$

This provides the desired restriction on $r$, since

$$r(n) < \log_{k(m)} t^{(b-s)/2} - 1 < \lfloor \log_{k(m)} t^{(b-s)/2} \rfloor,$$

which satisfies the hypothesis of the Construction Lemma. Thus, we obtain a graph $H$ and numbers $y_0, \ldots, y_{r(n)}$ such that

$$\omega(H) = y_z \iff \#^{\mathrm{SAT}}_{r(n)}(F_1, \ldots, F_{r(n)}) = z.$$

Since $F_1, \ldots, F_{r(n)}$ is nested, we also know that $\mathrm{RM}^{\mathrm{SAT}}_{r(n)}(F'_1, \ldots, F'_{r(n)}) = \#^{\mathrm{SAT}}_{r(n)}(F_1, \ldots, F_{r(n)})$. The Construction Lemma also guarantees that the $y_i$'s are separated by a factor of $k(m)$, where $m = |H|$. So, if a function $h$ $k(\cdot)$-approximates $\omega(\cdot)$, then given $y = h(H)$, we can find the unique $z$ such that $y_z \le y < k(m) \cdot y_z$. Then, $f(x) = T_2(x, z)$ and we have reduced $f$ to $h$ by a metric reduction.

Now, if $k(n) < 1 + \sqrt{2}/n^{\delta}$, then we cannot use the Construction Lemma directly. Instead, let $k'(n) = 1 + \sqrt{2}/n^{\delta}$ and $q'(n) = \log \log_{k'(n)} n - c$. Then, we can reduce any function $f'$ in $\mathrm{PF}^{\mathrm{SAT}[q'(n)]}$ to $k'(\cdot)$-approximating $\omega(\cdot)$. Since $k(n) < k'(n)$, $f'$ also reduces to $k(\cdot)$-approximating $\omega(\cdot)$. Now, $q'(n)$ is $\Theta(\log n)$, so every function in $\mathrm{PF}^{\mathrm{SAT}[O(\log n)]}$ reduces to some $f' \in \mathrm{PF}^{\mathrm{SAT}[q'(n)]}$ by a simple padding argument. Thus, every function in $\mathrm{PF}^{\mathrm{SAT}[q(n)]}$ reduces to $k(\cdot)$-approximating $\omega(\cdot)$.

To see that $q'(n)$ is indeed $\Theta(\log n)$, note that the Taylor series for $\ln(1 + 1/N)$ allows us to approximate $\ln(1 + 1/N)$ as $1/N$ for large values of $N$. Thus, up to an additive constant,

$$q'(n) = \log \ln n - \log \ln(1 + \sqrt{2}/n^{\delta}) - c \approx \log \ln n + \delta \log n - c.$$

For large $n$, the lower order terms of the Taylor series would not contribute significantly, so the $\delta \log n$ term dominates the value of $q'(n)$. So, $q'(n)$ is indeed $\Theta(\log n)$. ∎

Remark. If we are willing to settle for a randomized reduction with exponentially small error, then combining the techniques of Bellare, Goldwasser, Lund and Russell [BGLR93] with those of Zuckerman [Zuc93] as described by Chang, Gasarch and Lund [CGL94, Section 6], we have $\varepsilon = 1/31$ and $c = 1 + \log(1 + 31) = 6$.
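As a small numeric illustration of how $c$, $q(n)$ and $k(n)$ in the Main Theorem relate, the following Python sketch evaluates $c = 1 + \log(1 + 1/\varepsilon)$ and $k(n) = n^{2^{-q(n)-c}}$. The function names and the sample inputs are assumptions for illustration, and floating-point arithmetic makes the results approximate.

```python
import math


def query_constant(eps_inverse):
    """c = 1 + log2(1 + 1/eps), with 1/eps passed in directly."""
    return 1 + math.log2(1 + eps_inverse)


def approximation_factor(n, q, c):
    """k(n) = n ** (2 ** -(q + c)), obtained by solving
    q = log2(log_k(n)) - c for k."""
    return n ** (2.0 ** -(q + c))


def queries_for_factor(n, k, c):
    """q = log2(log_k(n)) - c, the inverse of approximation_factor."""
    return math.log2(math.log(n) / math.log(k)) - c
```

For $1/\varepsilon = 31$ this gives $c = 6$, as in the remark. Under the same assumption, a graph with $n = 2^{64}$ vertices and $q = 3$ queries corresponds to an approximation factor of $2^{1/8}$.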

Using the same Construction Lemma, we can obtain a relative lower bound on the number of queries needed to compute a $k(n)$-approximation of $\omega(G)$. The improvement over the results in Chang, Gasarch and Lund [CGL94] is twofold. First, we no longer need to make any "niceness" assumptions on $k(n)$. Also, the proof works for $k(n) < 2$.

Theorem 18. Let $k(n)$ be polynomial time computable such that $1 + \sqrt{2}/n^{\delta} \le k(n) \le n$, for some $\delta < \varepsilon/(4+4\varepsilon)$. If there exists a polynomial time computable function which $k(\cdot)$-approximates $\omega(G)$ using $\log \log_{k(n)} n - c$ queries to any oracle $X$ for all graphs $G$ with $n$ vertices and $c = 1 + \log(1 + 1/\varepsilon)$, then P = NP.


Proof. Suppose there exists a function $h$ which $k(n)$-approximates $\omega(G)$ using no more than $\log \log_{k(n)} n - c$ queries to $X$. We will show that for some $r(t) \in n^{O(1)}$, $\#^{\mathrm{SAT}}_{r(t)}$ for nested sequences can be computed using fewer than $\lceil \log(r(t)+1) \rceil$ queries to $X$. This in turn implies that P = NP by Corollary 9.

As required by the Construction Lemma, let $m = t^{b-s+d}$ and

$$r(t) = \lfloor \log_{k(m)} t^{(b-s)/2} \rfloor.$$

Let $F_1, \ldots, F_{r(t)}$ be a nested sequence, where $|F_i| = t$. We build the graph $H$ with $|H| = m$ according to the Construction Lemma. We then compute a number $y$ which is a $k(m)$-approximation of $\omega(H)$ using the function $h$ and $\log \log_{k(m)} m - c$ queries to $X$. As before, the unique $z$ such that $y_z \le y < k(m) \cdot y_z$ allows us to compute $\#^{\mathrm{SAT}}_{r(t)}(F_1, \ldots, F_{r(t)})$. As in the proof of Theorem 16,

$$2^{-c} \log_{k(m)} m = \log_{k(m)} t^{(b-s)/2}.$$

So, $\log \log_{k(m)} m - c < \log(r(t)+1) \le \lceil \log(r(t)+1) \rceil$. Since $k(m) \ge 1 + \sqrt{2}/n^{\delta}$, the function $r(t)$ is polynomially bounded. Thus, the number of queries we used to compute $\#^{\mathrm{SAT}}_{r(t)}(F_1, \ldots, F_{r(t)})$ violates the relative lower bound of Corollary 9 and P = NP. ∎

Remark. Slightly better bounds can be achieved for the Main Theorem and Theorem 18 in certain cases. For example, when $k(n) \ge 1 + 1/n^{o(1)}$ (e.g., $k(n) = 1 + 1/\log n$ and $k(n) = 2$), these theorems hold for all $c > \log(1 + 1/\varepsilon)$. This is accomplished by restricting $r(t)$ in the Construction Lemma by

$$r(t) \le \left\lfloor \log_{k(m)} \frac{t^{b-s}(k(m)-1)^2}{k(m)^2} \right\rfloor$$

and showing that for large enough $t$,

$$2^{-c} \cdot \log_{k(m)} m < \log_{k(m)} \frac{t^{b-s}(k(m)-1)^2}{k(m)^2}.$$

The calculations are somewhat involved and are omitted in this paper.

For several approximation factors, we can restate the hardness results as completeness results. Note that the bounded query classes in the following corollary are closed under metric reductions and are different unless P = NP.

Corollary 19. For all constants $k \ge 1$ and all constants $a$ with $0 < a \le 1$:

• $(1 + n^{-a})$-approximating Clique Size is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[O(\log n)]}$.$^{3}$

• $(1 + 1/\log^k n)$-approximating Clique Size is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[(k+1) \log\log n + O(1)]}$.

• $k$-approximating Clique Size is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[\log\log n + O(1)]}$.

• $(\log^k n)$-approximating Clique Size is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[\log\log n - \log\log\log n + O(1)]}$.

Proof. Consider the case of a constant approximation factor $k$. Corollary 14 shows that we can $k$-approximate Clique Size using $\log \log_k n + O(1)$ queries to SAT.

Let $h$ be a function which $k$-approximates Clique Size and take any $f$ in $\mathrm{PF}^{\mathrm{SAT}[\log\log n + q]}$ for some constant $q$. We want to show that $f \le^{P}_{mt} h$, but Theorem 16 only shows that every function in $\mathrm{PF}^{\mathrm{SAT}[\log \log_k n - c]}$ $\le^{P}_{mt}$-reduces to $h$. To overcome the deficit of $c + q$ queries, we use a simple padding trick and define a new function $f'$ to be:

$$f'(x \# 1^{|x|^{c+q}}) = f(x).$$

Now, $f'$ is a function in $\mathrm{PF}^{\mathrm{SAT}[\log\log n - c]}$, so $f' \le^{P}_{mt} h$. Since $f \le^{P}_{mt} f'$, it follows that $f \le^{P}_{mt} h$. Thus, approximating Clique Size within a constant factor is complete under metric reductions for $\mathrm{PF}^{\mathrm{SAT}[\log\log n + O(1)]}$.

The other cases are proven using a similar padding argument and the Taylor series approximation $\ln(1 + 1/N) \approx 1/N$ for large $N$. ∎

The results in this section show that bounded queries and approximating Clique Size are inseparable. That is, you can obtain good approximations of Clique Size if and only if you can answer oracle queries to SAT. Moreover, if you want a closer approximation of Clique Size, then you need more queries to the oracle. For example, we have shown that finding $\log n$-approximations of Clique Size is equivalent to answering $\log\log n - \log\log\log n + O(1)$ queries to SAT and that finding constant factor approximations of Clique Size is equivalent to answering $\log\log n + O(1)$ queries to SAT.

Completeness, as opposed to lower bounds, is important here because completeness implies that any complexity class that is closed under metric reductions and contains these two problems must also contain the two bounded query classes. Thus, any measure of complexity which can distinguish between constant factor approximations and $\log n$ factor approximations must be fine enough to distinguish between the numbers of oracle queries. We view this as evidence that using bounded queries is a natural complexity measure for approximation problems.

These results can also be extended to some other NP-approximation problems, most notably to Chromatic Number. The hardness results follow from the work of Lund and Yannakakis [LY94], which provides a reduction from Clique to Coloring. Thus, for example, we can claim that 2-approximating Chromatic Number is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[\log\log n + O(1)]}$ under metric reductions. In the next section, we show that the hardness of these approximation problems provides another benefit. We can now reduce many NP-approximation problems to approximating Clique Size in an approximation preserving manner.

3 This statement can also be derived directly from the self-improvability of the Clique problem.

4. REDUCTIONS TO CLIQUE SIZE

The results of the preceding section showed that every function $f \in \mathrm{PF}^{\mathrm{SAT}[O(\log n)]}$ $\le^{P}_{mt}$-reduces to any function $h$ that $k(n)$-approximates Clique Size, for some approximation factor $k(n)$. The more queries it takes to compute $f$, the smaller the approximation factor $k(n)$. However, when $f$ itself is also solving an NP-approximation problem, this reduction takes on some approximation preserving characteristics. We demonstrate these characteristics by way of an example.

Suppose that we are given a graph $G$ with $n$ vertices and we want to 2-approximate $\chi(G)$, the chromatic number of $G$. This approximation can be found with $\lceil \log \lceil \log n \rceil \rceil$ queries to SAT using binary search. By Corollary 19, 2-approximating Chromatic Number $\le^{P}_{mt}$-reduces to 2-approximating Clique Size, but this reduction has some additional properties.

Let us examine this reduction more closely. First, the binary search routine which approximates $\chi(G)$ does so by asking questions of the form:

Is $\chi(G) \le n/2^i$?

There are $\lceil \log n \rceil$ questions of this form. Turn each question into a Boolean formula, $F_i$, so that $F_i \in \mathrm{SAT}$ if and only if $\chi(G) \le n/2^i$. Since this results in a nested sequence, we can reduce it directly to 2-approximating Clique Size. To do this, we build a graph $H$ as described in Theorem 16 and compute $y_0, \ldots, y_r$, such that $2y_i < y_{i+1}$ and $\omega(H) = y_i \iff \#^{\mathrm{SAT}}_{r}(F_1, \ldots, F_r) = i$. Now, suppose we are given a number $x$ and are told that it is a 2-approximation of $\omega(H)$. Then, let $y_z$ be the unique $y_i$ such that $x \le y_z < 2x$ and we have determined that $\omega(H) = y_z$. So, $n/2^z$ is a 2-approximation of $\chi(G)$. On the other hand, if we are told that $x$ is a 4-approximation of $\omega(H)$, then we cannot determine $\omega(H)$ exactly. Instead, we could have the situation that $x \le y_z < y_{z+1} < 4x$. Again, we use $n/2^z$ as an approximation of $\chi(G)$, but this time it is a 4-approximation. In general we do not have to be given a guarantee that $x$ is within a certain factor of $\omega(H)$. We just find the unique $y_z$ such that $x \le y_z < 2x$ and use $n/2^z$ as an approximation of $\chi(G)$. If $x$ is a $k$-approximation of $\omega(H)$, then $n/2^z$ is an approximation of $\chi(G)$ within $2^{\lceil \log k \rceil} < 2k$. Thus, we have the following theorem:

Theorem 20. There exist polynomial time computable functions $T_1$ and $T_2$ such that for all graphs $G$ and all numbers $x$, if $x$ is a $k$-approximation of $\omega(T_1(G))$, then $T_2(G, x)$ is a $2k$-approximation of $\chi(G)$.

Of course, we can replace Chromatic Number with any NP-approximation problem where the solution is between 1 and $n^a$. Thus, approximating Clique Size is complete in the sense that all these NP-approximation problems reduce to approximating Clique Size in an approximation preserving manner.

5. THE COMPLEXITY OF MAX3SAT

In this section, we examine the complexity of the NP-approximation problem MAX3SAT. As we have mentioned before, the MAX3SAT problem does have polynomial time algorithms which achieve a constant factor approximation. For example, Yannakakis [Yan92] presents a 4/3 factor approximation. A consequence of this approximation algorithm is that the number of queries needed to $k(n)$-approximate MaxSat is lower than the number of queries needed to $k(n)$-approximate Clique Size. Using Lemma 13, we have:

Corollary 21. Let $k(n)$ be a polynomial time computable function such that $1 < k(n) < n$. Then there exists a function in $\mathrm{PF}^{\mathrm{SAT}[\lceil \log \log_{k(n)}(4/3) \rceil]}$ which $k(n)$-approximates MaxSat.

To prove the lower bounds, we need a result from probabilistically checkable proofs [ALM+92] which states that there exists a constant $k$ such that no polynomial time algorithm can $k$-approximate MaxSat unless P = NP. As pointed out by Khanna et al. [KMSV94], this result can be interpreted as follows:

Lemma 22 [KMSV94]. There exist constants $\delta$, $c_1$, $c_2$ and a polynomial time computable function $f$ such that $0 < \delta < 1$, $1 \le c_1$, $1 \le c_2$ and given a 3CNF formula $F$ with $t$ clauses, $f$ produces a 3CNF formula $F'$ with $l(t) = c_1 t^{c_2}$ clauses, where

$$F \in \mathrm{SAT} \implies F' \in \mathrm{SAT} \implies \mathrm{MaxSat}(F') = l(t)$$

$$F \notin \mathrm{SAT} \implies \mathrm{MaxSat}(F') = l(t)/(1+\delta)$$

Proof Sketch. This proof was originally reported by Khanna et al. [KMSV94], but we include a sketch of this proof for the sake of completeness. We start with the verifier for a probabilistically checkable proof for SAT. The verifier $V$ uses $c \log n$ random bits and looks at $O(1)$ bits of the proof. If the given formula $F$ is satisfiable then there exists a proof that causes $V$ to accept with probability 1. If the formula is not satisfiable, then we modify the standard verifier to accept some proof with probability exactly 1/2. Note that in this case, no proof can cause the verifier to accept with probability greater than 1/2.


Now, the computation of the verifier after each random string $z$ has been chosen depends only on the $O(1)$ bits of the proof. This computation can be described by a formula $F_z$ using Karp's reduction. Since the size of the input to this part of the verifier's computation is constant, $|F_z|$ is also bounded by a constant, even as the length of the original formula $F$ grows.

Let $c_1$ be the number of clauses in $F_z$. A closer examination of Karp's reduction will show that if the verifier $V$ using the random string $z$ rejected, then $\mathrm{MaxSat}(F_z) = c_1 - 1$. (Essentially, the only unsatisfiable clause is the one that forces the final state to be an accepting state. This condition holds even when Karp's reduction is modified to give 3CNF formulas.) Now, let $F'$ be the conjunction of every $F_z$ for each random string $z$. Since there are $t^{c_2}$ random strings, $F'$ has $l(t) = c_1 t^{c_2}$ clauses. If $F$ is satisfiable, then $F'$ is satisfiable and $\mathrm{MaxSat}(F') = l(t)$. On the other hand, if $F$ is not satisfiable, then we can satisfy exactly half of the $F_z$'s. Thus,

$$\mathrm{MaxSat}(F') = \frac{c_1 t^{c_2}}{2} + \frac{(c_1 - 1)\,t^{c_2}}{2} = l(t) - \frac{l(t)}{2c_1} = \frac{l(t)}{1+\delta}$$

by choosing $\delta = 1/(2c_1 - 1)$. ∎
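The arithmetic behind the choice $\delta = 1/(2c_1 - 1)$ can be verified with exact rationals. In the sketch below, `c1` and `num_strings` (the number $t^{c_2}$ of random strings) are passed in as hypothetical sample values, since the actual constants depend on the verifier.

```python
from fractions import Fraction


def unsat_case_maxsat(c1, num_strings):
    """MaxSat(F') when F is unsatisfiable: half of the F_z are fully
    satisfied (c1 clauses each) and half miss exactly one clause.

    Returns (l, maxsat, delta) with l = c1 * num_strings.
    """
    l = c1 * num_strings
    half = Fraction(num_strings, 2)
    maxsat = c1 * half + (c1 - 1) * half
    delta = Fraction(1, 2 * c1 - 1)
    return l, maxsat, delta
```

For any `c1 >= 1`, the returned `maxsat` equals both `l - l / (2 * c1)` and `l / (1 + delta)`, which is the chain of equalities displayed above.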

Using this lemma, Khanna et al. [KMSV94] showed that every NP-optimization problem which has a constant factor polynomial time approximation algorithm can be reduced to MAX3SAT by what they call an L-reduction with scaling. Our results show a quantitative relationship between the complexity of approximating MaxSat and bounded queries. Our results also allow a direct comparison between the complexity of approximating MaxSat and approximating Clique Size.

Theorem 23. Let $q(n) \in O(\log n)$ be a non-decreasing polynomial time computable function. Define the function $k(n)$ such that $q(n) = \log \log_{k(n)}(1 + \delta')$ for some constant $0 < \delta' < \delta$. If $h$ is a function which $k(\cdot)$-approximates MaxSat and $f \in \mathrm{PF}^{\mathrm{SAT}[q(n)]}$, then $f \le^{P}_{mt} h$.

As in Theorem 16, the proof of Theorem 23 is straightforward after we prove the following Construction Lemma.

Lemma 24 (Construction Lemma for MAX3SAT). Let $k(n)$ be a polynomial time computable function such that $1 + n^{-1/3} \le k(n) \le 1 + \delta'$, for some constant $\delta'$ with $0 < \delta' < \delta$. Let $F_1, \ldots, F_{r(t)}$ be a nested sequence with $|F_i| = t$, $m = l(t)^2$ and

$$r(t) \le \lfloor \log_{k(m)}(1 + \delta') \rfloor.$$

Then, in polynomial time we can construct a 3CNF formula $H$ with $m$ clauses and whole numbers $y_0 < \cdots < y_{r(t)}$ such that $k(m)\,y_i < y_{i+1}$ for $0 \le i \le r(t)-1$, and

$$\mathrm{MaxSat}(H) = y_z \iff \#^{\mathrm{SAT}}_{r(t)}(F_1, \ldots, F_{r(t)}) = z.$$

Proof. Let $F_0$ be a known unsatisfiable formula with $t$ clauses. Let $F'_0, F'_1, \ldots, F'_{r(t)}$ be formulas with $l(t)$ clauses constructed by applying Lemma 22 to $F_0, F_1, \ldots, F_{r(t)}$. We will use $F'_0$ for padding. We construct a formula $H$ which is the conjunction of $l(t)$ formulas taken from $\{F'_0, \ldots, F'_{r(t)}\}$. So, we know that $|H| = l(t)^2 = m$.

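To simplify our notation, let $l = l(t)$, $r = r(t)$ and $g = k(m)$. We now define a sequence of whole numbers $a_0, \ldots, a_r$ as follows:

$$a_i = \begin{cases} 0 & \text{if } i = 0 \\ \lceil g\,a_{i-1} + l(g-1)/\delta \rceil + 1 & \text{for } 1 \le i \le r \end{cases}$$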

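The formula $H$ is the conjunction of $a_i - a_{i-1}$ copies of $F'_i$, for $1 \le i \le r$, and $l - a_r$ copies of $F'_0$. We will show below that $l \ge a_r$. For now, let $\alpha = \delta/(1+\delta)$ and $\beta = 1/(1+\delta)$. That is, $\delta = \alpha/\beta$ and for each $F_i$,

$$F_i \in \mathrm{SAT} \implies \mathrm{MaxSat}(F'_i) = \alpha l + \beta l$$

$$F_i \notin \mathrm{SAT} \implies \mathrm{MaxSat}(F'_i) = \beta l.$$

When $\#^{\mathrm{SAT}}_{r}(F_1, \ldots, F_r) = 0$, $\mathrm{MaxSat}(H) = \beta l^2$ since $H$ is composed of $l$ formulas from $\{F'_0, \ldots, F'_r\}$, each with $l$ clauses. In general, if $\#^{\mathrm{SAT}}_{r}(F_1, \ldots, F_r) = z$, then

$$\mathrm{MaxSat}(H) = \alpha a_z l + \beta l^2.$$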

Thus, we define $y_i = \alpha a_i l + \beta l^2$. The fact that $k(m)\,y_i < y_{i+1}$ follows easily from the definition of $a_i$ because $y_{i+1} - g\,y_i \ge 1$.
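The sequence $a_i$ and the targets $y_i$ for the MAX3SAT construction can be computed exactly to see the separation on a concrete instance. The sketch below uses rational arithmetic; the sample parameters mentioned afterwards ($\delta = 1/7$, $g = 1.01$, $l = 10^6$, $r = 11$) are assumptions picked to satisfy the lemma's hypotheses, not constants from the paper.

```python
from fractions import Fraction
from math import ceil


def maxsat_targets(g, delta, l, r):
    """Return (a, ys) where a_0 = 0,
    a_i = ceil(g * a_{i-1} + l * (g - 1) / delta) + 1 for 1 <= i <= r,
    and y_i = alpha * a_i * l + beta * l**2 with
    alpha = delta / (1 + delta), beta = 1 / (1 + delta)."""
    a = [0]
    for _ in range(r):
        a.append(ceil(g * a[-1] + l * (g - 1) / delta) + 1)
    alpha = delta / (1 + delta)
    beta = 1 / (1 + delta)
    ys = [alpha * ai * l + beta * l * l for ai in a]
    return a, ys
```

With `g = 1 + Fraction(1, 100)`, `delta = Fraction(1, 7)`, `l = 10**6` and `r = 11`, the last multiplier stays below `l` (so the padding copies of $F'_0$ fit) and each `y` exceeds `g` times its predecessor.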

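Proving that $a_r \le l$ requires a bit more work. We use the relationship

$$a_i \le g\,a_{i-1} + \frac{l(g-1)}{\delta} + 2$$

to show that

$$a_i \le \left( \frac{l(g-1)}{\delta} + 2 \right) \cdot \sum_{j=0}^{i-1} g^j = \left( \frac{l(g-1)}{\delta} + 2 \right) \cdot \left( \frac{g^i - 1}{g - 1} \right).$$

Thus, $a_r \le l$ when

$$g^r \le 1 + \delta - \frac{2\delta^2}{l(g-1) + 2\delta}. \tag{4}$$

Since $k(m) \ge 1 + m^{-1/3}$ and $l = \sqrt{m}$, it follows that $l(g-1) \ge m^{1/6}$. Therefore, the last term on the right hand side of Equation 4 approaches zero as $t$ gets larger. Thus, Equation 4 is satisfied by the requirement that for some $\delta' < \delta$,

$$r(t) \le \lfloor \log_{k(m)}(1 + \delta') \rfloor.$$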


Corollary 25. For all constants $k \ge 1$ and all constants $a$ with $0 < a \le 1$:

• $(1 + n^{-a})$-approximating MaxSat is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[O(\log n)]}$.

• $(1 + 1/\log^k n)$-approximating MaxSat is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[k \log\log n + O(1)]}$.

As in Theorem 18, we can use the Construction Lemma for MAX3SAT to derive a lower bound on the number of queries needed to approximate MaxSat. Note that the upper bound of $\log \log_{k(n)}(4/3)$ queries and the lower bound of $\log \log_{k(n)}(1 + \delta')$ queries given below differ only by a constant: $\log \log_{1+\delta'}(4/3)$. This difference represents the fact that for $k$ between $(1 + \delta)$ and $4/3$, we do not know if there exist polynomial time algorithms that $k$-approximate MaxSat. If such algorithms exist then we can reduce the upper bound.

Theorem 26. Let $k(n)$ be polynomial time computable such that $1 + n^{-1/3} \le k(n) \le 1 + \delta'$ for some constant $\delta'$ with $0 < \delta' < \delta$. If there exists a polynomial time computable function which $k(\cdot)$-approximates MaxSat using $\log \log_{k(n)}(1 + \delta')$ or fewer queries to any oracle $X$, then P = NP.

Proof. Analogous to the proof of Theorem 18.

The results in this section can be extended to cover any MAXNP-complete and MAXSNP-complete problem, since every MAXNP problem has a constant factor polynomial time approximation algorithm and MAX3SAT reduces to these problems by L-reductions. In the next section, we discuss the relationship between bounded queries and self-improvability.

6. QUERIES AND SELF-IMPROVEMENT

In this section, we explore some connections between the structure of bounded query classes and the self-improvability of NP-approximation problems. An NP-approximation problem is self-improvable if for all constants $k_1$ and $k_2$, with $k_1 < k_2$, $k_1$-approximating the problem can be reduced to any function that $k_2$-approximates the problem. Intuitively, when a function is self-improvable, we can reduce a finer approximation problem to a coarser approximation problem for constant factor approximations.

The Clique Size problem is known to be self-improvable [GJ79]. For example, we can reduce 2-approximating Clique Size to 4-approximating Clique Size as follows. Given a graph $G$ with $n$ vertices, construct a graph $G'$ with $n^2$ vertices such that $\omega(G') = \omega(G)^2$. Now if a number $x$ 4-approximates $\omega(G')$, $\sqrt{x}$ would 2-approximate $\omega(G)$.

Our first observation is that the self-improving reduction above does not really reduce the number of queries needed to 2-approximate $\omega(G)$. This is because it takes roughly $\log \log_4(n^2)$ queries to SAT to 4-approximate $\omega(G')$. However, $\log \log_4(n^2) = \log\log n$ and using $\log\log n$ queries we can 2-approximate $\omega(G)$ directly. So, self-improving reductions are not query saving reductions.

Secondly, we point out that the self-improvement property of Clique Size can be deduced from the hardness of Clique Size shown in Theorem 16. Suppose that $k_1$ and $k_2$ are two constant approximation factors where $k_1 < k_2$. Let $\log\log n + q_1$ be the number of queries needed to $k_1$-approximate Clique Size using binary search. Also, let $q_2$ be the constant from Theorem 16 such that every function in $\mathrm{PF}^{\mathrm{SAT}[\log\log n - q_2]}$ $\le^{P}_{mt}$-reduces to any function that $k_2$-approximates Clique Size. By a simple padding argument, every function in $\mathrm{PF}^{\mathrm{SAT}[\log\log n + q_1]}$ $\le^{P}_{mt}$-reduces to a function in $\mathrm{PF}^{\mathrm{SAT}[\log\log n - q_2]}$. Thus, we can reduce $k_1$-approximating Clique Size to any function which $k_2$-approximates Clique Size.

In contrast, if MaxSat were self-improvable, then we could use Yannakakis's algorithm to $k$-approximate MaxSat for every constant $k$. By the lower bounds in Section 5, this would imply that P = NP. Again, the fact that MaxSat is not self-improvable can be deduced from its query complexity. The padding argument for the self-improvability of Clique Size works because stretching the input size increases the number of queries available to a $\mathrm{PF}^{\mathrm{SAT}[\log\log n]}$ machine. For example, squaring the input size makes 1 more query available. On the other hand, we only use a constant number of queries to $k$-approximate MaxSat. Since the number of queries does not depend on the input size, padding the input will not increase the number of queries available. This is also the reason why we only have a hardness result, and not a completeness result, on the query complexity of $n^{1/a}$-approximating Clique Size.
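The effect of padding on a $\log\log n$ query budget can be illustrated numerically. The helper names below are hypothetical; the sketch only evaluates the logarithms and is not part of any reduction.

```python
import math


def loglog_budget(n):
    """Number of queries log2(log2(n)) available on inputs of length n."""
    return math.log2(math.log2(n))


def padded_length(n, extra_queries):
    """Input length n ** (2 ** j) at which the log log budget has grown
    by j = extra_queries."""
    return n ** (2 ** extra_queries)
```

For `n = 2**16` the budget is 4 queries; squaring the length (`padded_length(n, 1)`) raises it to 5, and `padded_length(n, 3)` raises it to 7. A constant query budget, as in the MaxSat case, is unchanged by any amount of padding.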

Thus, we can often deduce the self-improvability of an NP-approximation problem after we have discovered its query complexity. For example, we can now show that Chromatic Number and similar problems have the self-improvement property.$^{4}$

Lemma 27. The NP-approximation problems Chromatic Number, Clique Partition, Clique Cover, and Biclique Cover have the self-improvement property. That is, given constants $k_1$ and $k_2$, with $k_1 < k_2$, and a function $h$ that $k_2$-approximates the given problem, there exists a function $f$ which $k_1$-approximates the problem such that $f \le^{P}_{mt} h$.

Proof. To prove this lemma, we use the results of Lund and Yannakakis [LY94], which showed an approximation preserving reduction from Clique Size to each of the given problems. Thus, for constant $k$, $k$-approximating each of these problems is complete for $\mathrm{PF}^{\mathrm{SAT}[\log\log n + O(1)]}$. Hence, by the preceding discussion, these problems are self-improvable. ∎

4 These self-improvability results can be derived from previous work [PR90, LY94, KMSV94]. Our main point here is the relationship between bounded query classes and structural properties of NP-approximation problems.

7. CLIQUE SIZE VERSUS MAXSAT

In this section, we compare the complexity of approximating Clique Size and approximating MaxSat using bounded queries as a quantitative complexity measure. We use these comparisons to argue that the approximability of MaxSat is not an indication that 3SAT is "easier" than Clique. Instead, it is an indication that MaxSat has a lot of "fluff" that is approximable. We argue this case by showing that if we add a linear amount of padding to Clique Size, then the resulting problem has a similar complexity to MaxSat. We also show that finding good approximations to the "non-approximable" part of MaxSat requires just as many queries as approximating Clique Size.

In our first comparison, we compare the query complexities of approximating Clique Size and MaxSat directly. By Corollaries 19 and 25, both 2-approximating Clique Size and $(1+1/\log n)$-approximating MaxSat are complete for $\mathrm{PF}^{\mathrm{SAT}[\log\log n+O(1)]}$. Thus these two problems are inter-reducible. For finer approximations, we note that for both problems, obtaining a $(1+n^{-a})$-approximation for constant $a$, $0 < a < 1$, is complete for $\mathrm{PF}^{\mathrm{SAT}[O(\log n)]}$. Thus, in either case, approximating within a factor of $1+n^{-a}$ is equivalent under $\le^{P}_{mt}$-reductions to finding the optimal solution. For coarser approximations, note that approximating MaxSat within a constant factor requires only a constant number of queries. On the other hand, $k$-approximating Clique Size is complete for $\log\log n + O(1)$ queries. Thus, for constant factor approximations, the complexities of approximating Clique Size and approximating MaxSat diverge. This can be explained by examining the upper bound given in Lemma 13. The number of queries used to find a $k(n)$-approximation of $\mathrm{OPT}_\Pi$ is the following (assuming there is a polynomial time algorithm which $A(n)$-approximates $\mathrm{OPT}_\Pi$):

$$\log\log A(n) - \log\log k(n).$$

When $k(n) \ge 2$, the $\log\log A(n)$ term is significant. However, when $k(n)$ diminishes to 1 (e.g., $k(n) = 1 + 1/\log n$), the second term tends to dominate. For example, if we write $k(n)$ as $1 + 1/f(n)$ for some unbounded $f(n)$, we can use the Taylor series approximation $\ln(1+1/N) \approx 1/N$ for large $N$ and rewrite the number of queries to be approximately:

$$\log\ln A(n) + \log f(n).$$

For large $f(n)$, the $\log f(n)$ term dominates the sum. This relationship explains why computing $\omega(G)$ and $\mathrm{MaxSat}(F)$ exactly have the same complexity, while finding constant factor approximations of these two quantities have different complexities.
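The two query-count expressions above can be compared numerically. This is an illustrative sketch only: the function names are invented here, logarithms are taken base 2 (with `ln` the natural logarithm), and the values of $A(n)$ and $f(n)$ are arbitrary sample inputs.

```python
import math

def queries_exact(A: float, k: float) -> float:
    """log2 log2 A - log2 log2 k, the query count read off the upper bound."""
    return math.log2(math.log2(A)) - math.log2(math.log2(k))

def queries_taylor(A: float, f: float) -> float:
    """log2 ln A + log2 f, obtained from ln(1 + 1/f) ~ 1/f with k = 1 + 1/f."""
    return math.log2(math.log(A)) + math.log2(f)

A = 2.0**20  # hypothetical approximation factor A(n) of a polynomial time algorithm
for f in (10.0, 1000.0, 1e6):
    exact = queries_exact(A, 1.0 + 1.0 / f)
    approx = queries_taylor(A, f)
    print(f"f = {f:g}: exact {exact:.4f}, Taylor estimate {approx:.4f}")
```

For a constant factor such as $k = 2$ the second term vanishes and only $\log\log A(n)$ remains, which is the regime where the two problems diverge.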

Another method of comparing the complexities of MaxSat and Clique Size is to alter the Clique Size problem so we can match the upper and lower bounds of MaxSat. To do this, we define an admittedly contrived problem: Padded Clique Size. An instance of this problem is a padded graph, that is, one that has a large obvious clique. To produce a padded graph from a general graph $G$ with $n$ vertices, we can attach a $3n$-clique to $G$. The resulting graph $G'$ will have $4n$ vertices and $\omega(G') = \omega(G) + 3n$. So, the number $3n$ is always a $4/3$-approximation of $\omega(G')$. Then, by altering Lemma 15, we can find constants $\epsilon_1$, $\epsilon_2$, with $3/4 < \epsilon_1 < \epsilon_2 < 1$, and a polynomial time computable function $f$ such that given any 3CNF formula $F$, $f$ produces a padded graph $G'$ with $n$ vertices with the property that:⁵

$$F \in \mathrm{SAT} \implies \omega(G') = \epsilon_2 n \quad\text{and}\quad F \notin \mathrm{SAT} \implies \omega(G') = \epsilon_1 n.$$

Thus, no polynomial time algorithm can $(\epsilon_2/\epsilon_1)$-approximate Padded Clique Size, unless P = NP. In fact, every function in $\mathrm{PF}^{\mathrm{SAT}[\log a]}$ $\le^{P}_{mt}$-reduces to a function $h$ that $(\epsilon_2/\epsilon_1)^{1/a}$-approximates Padded Clique Size. Therefore, the complexities of approximating MaxSat and Padded Clique Size are very similar.
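The padding step can be illustrated on a tiny graph. The sketch below is an assumption-laden illustration: it reads "attach a $3n$-clique" as joining every new vertex to all original vertices (which is what makes $\omega(G') = \omega(G) + 3n$ hold), and it uses a brute-force clique search that is only feasible for very small graphs. The function names are hypothetical.

```python
from itertools import combinations

def clique_number(n, edges):
    """Brute-force clique number of a graph on vertices 0..n-1 (tiny graphs only)."""
    adjacent = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for candidate in combinations(range(n), size):
            if all(frozenset(pair) in adjacent for pair in combinations(candidate, 2)):
                return size
    return 0

def pad_graph(n, edges):
    """Attach a 3n-clique on new vertices n..4n-1, joined to every original vertex."""
    new_vertices = range(n, 4 * n)
    padded = list(edges)
    padded += list(combinations(new_vertices, 2))                # the 3n-clique itself
    padded += [(u, v) for u in range(n) for v in new_vertices]   # join it to all of G
    return 4 * n, padded

# Example: a path on 4 vertices has clique number 2.
n, path = 4, [(0, 1), (1, 2), (2, 3)]
omega = clique_number(n, path)
omega_padded = clique_number(*pad_graph(n, path))
print(omega, omega_padded)  # the padded value is omega + 3n
```

Since $3n \le \omega(G') \le 4n$, the trivial answer $3n$ is within a factor of $4/3$ of the optimum regardless of $G$.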

The preceding discussion shows that the traditional approach of using approximation factors to measure the quality of an approximation is highly sensitive to padding. We wonder if, at its core, the complexity of approximating MaxSat is equivalent to the complexity of approximating Clique Size. For example, let $F'$ be a formula produced by the reduction in Lemma 22. This formula has $m$ clauses, but we know a priori that at least $m/(1+\delta)$ clauses are simultaneously satisfiable. Thus, if we want a function to guarantee good approximations, it would make sense to ask for the approximation $x$ to guarantee that

$$\mathrm{MaxSat}(F') - \frac{m}{1+\delta} < k(m)\cdot\left(x - \frac{m}{1+\delta}\right),$$

instead of simply asking for $\mathrm{MaxSat}(F') < k(m)\,x$. That is, we would consider only the number of clauses in excess of the number we already know to be satisfiable. Finding an approximation $x$ with such a guarantee would require roughly $\log\log_{k(m)} m$ queries, essentially the same as approximating Clique Size. Note that the value of $\delta$ in Lemma 22 is constructible and can be deduced from the size of the verifier in the probabilistically checkable proofs for SAT. The difficulty of generalizing this approach is that it would be difficult, if not impossible, to find the value of "$\delta$" for general 3CNF formulas.⁶ So, this measure of the quality of an approximation of $\mathrm{MaxSat}(F)$ would only make sense for the formulas produced by the reduction in Lemma 22. Nevertheless, this example suggests that approximating the "hard part" of $\mathrm{MaxSat}(F)$ is just as difficult as approximating Clique Size. In the next section, we show that if we treat the approximation version of 3SAT as a minimization problem instead of a maximization problem, then approximating 3SAT has the same complexity as approximating Clique Size.

⁵ Here we do not want to use the graphs generated by the function in Lemma 15. These graphs have $t^d$ vertices and cliques of size $t^b$ or $t^s$, so the ratio $t^b/t^d$ diminishes to zero. Instead, we will take the graph constructed from the probabilistically checkable proof for SAT which uses $O(\log n)$ random bits and a constant number of query bits. When $F \in \mathrm{SAT}$, the graphs constructed from these probabilistically checkable proofs have cliques containing a constant fraction of the vertices.

8. MINIMIZING UNSATISFIABILITY

The optimization version of the satisfiability problem is usually defined as a maximization problem: if we cannot find an assignment that satisfies all of the clauses, then we should at least try to find one that satisfies the largest number of clauses. This optimization problem has an equivalent minimization analog: finding an assignment that minimizes the number of unsatisfied clauses. We call this problem MIN3UNSAT. Note that the solution spaces of MAX3SAT and MIN3UNSAT are identical: they are truth assignments to the variables of the given Boolean formula. In fact, a better solution to the MIN3UNSAT problem is a better solution to the MAX3SAT problem. The only difference between the two problems is the cost which we associate with each feasible solution. For example, take a formula $F$ with 120 clauses, let solution A be an assignment which satisfies 100 clauses and solution B be an assignment which satisfies 110 clauses. It would be typical to say that solution B is a 10 percent improvement over solution A. However, from the point of view of minimizing the number of unsatisfied clauses, solution B leaves only 10 clauses unsatisfied while solution A leaves 20 clauses unsatisfied. In this measure, solution B is twice as good as solution A. Both measures regard solution B as the better solution; only the quantitative ratio between the solutions changes from one measure to the other.
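The arithmetic of the 120-clause example, and the fact that the two objectives share one optimum, can be reproduced in a few lines. This is a sketch for illustration: the clause encoding (signed integers, as in the DIMACS convention) and the function names are choices made here, not notation from the paper, and the brute-force search is only meant for tiny formulas.

```python
from fractions import Fraction
from itertools import product

def count_satisfied(clauses, assignment):
    """Number of clauses with at least one true literal (literal v or -v, v >= 1)."""
    return sum(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

def max_sat_and_min_unsat(clauses):
    """Brute-force (MaxSat, MinUnsat) over all truth assignments."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    best = max(count_satisfied(clauses, dict(zip(variables, bits)))
               for bits in product([False, True], repeat=len(variables)))
    return best, len(clauses) - best

# The worked example from the text: 120 clauses, A satisfies 100, B satisfies 110.
total, sat_a, sat_b = 120, 100, 110
max_ratio = Fraction(sat_b, sat_a)                   # quality ratio as a maximization problem
min_ratio = Fraction(total - sat_a, total - sat_b)   # quality ratio as a minimization problem
print(max_ratio, min_ratio)

# A tiny formula: (x1) and (not x1) and (x1 or x2 or x3).
print(max_sat_and_min_unsat([(1,), (-1,), (1, 2, 3)]))
```

The same assignment attains both optima, since MinUnsat is always the clause count minus MaxSat; only the ratio between two candidate solutions depends on which cost is used.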

In this section we show that the complexity of approximating MIN3UNSAT is equivalent to the complexity of approximating Clique Size. As a result, we can conclude that the approximability of MAX3SAT is not due to a lower intrinsic complexity of 3SAT, but is an artifact of our choice of viewing satisfiability as a maximization problem.

Definition 28. Let $F$ be a 3CNF formula with $n$ clauses. We define $\mathrm{MinUnsat}(F)$ to be the minimum number of unsatisfied clauses resulting from any assignment of truth values to the variables of $F$. A number $x$ $k(n)$-approximates $\mathrm{MinUnsat}(F)$ if $x/k(n) < \mathrm{MinUnsat}(F) < x$. In the special case where $\mathrm{MinUnsat}(F) = 0$, we will allow 0 to be a $k(n)$-approximation of $\mathrm{MinUnsat}(F)$. A function $f$ $k(\cdot)$-approximates MIN3UNSAT if $f(F)$ $k(|F|)$-approximates $\mathrm{MinUnsat}(F)$ for all formulas $F$.

For a 3CNF formula $F$ with $n$ clauses, we know that $\mathrm{MinUnsat}(F) \le n/2$, because given any assignment of truth values to the variables in $F$, either the assignment or its complement will satisfy at least half of the clauses in $F$. So, we know that $\mathrm{MinUnsat}(F)$ ranges from 0 to $n/2$. We can handle the special case where $\mathrm{MinUnsat}(F) = 0$ by making 0 a singleton interval. Thus, using Lemma 13 we can produce an upper bound on the number of queries needed to approximate MinUnsat.
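The bound $\mathrm{MinUnsat}(F) \le n/2$ rests on a simple observation: a clause falsified by an assignment has all of its literals false, so the complementary assignment makes all of them true. The sketch below checks this on random formulas; the encoding, the helper names, and the random generator are assumptions made for this illustration.

```python
import random

def count_satisfied(clauses, assignment):
    """Number of clauses with at least one true literal (literal v or -v, v >= 1)."""
    return sum(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

def better_of_assignment_and_complement(clauses, assignment):
    """Clauses satisfied by the better of an assignment and its bitwise complement."""
    complement = {var: not value for var, value in assignment.items()}
    return max(count_satisfied(clauses, assignment),
               count_satisfied(clauses, complement))

def random_3cnf(num_vars, num_clauses, rng):
    """Random 3CNF formula: three distinct variables per clause, random signs."""
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses

rng = random.Random(0)
formula = random_3cnf(6, 15, rng)
assignment = {v: rng.random() < 0.5 for v in range(1, 7)}
print(better_of_assignment_and_complement(formula, assignment), "of", len(formula))
```

Every clause is satisfied by at least one of the two assignments, so the better of the two always satisfies at least half of the clauses.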

Corollary 29. Let $k(n)$ be polynomial time computable such that $1 < k(n) < n/2$. Then there exists a function in $\mathrm{PF}^{\mathrm{SAT}[q(n)]}$ which $k(\cdot)$-approximates MinUnsat, where $q(n) = \lceil \log \lceil \log_{k(n)}(n/2+1) + 1 \rceil \rceil$.
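The idea behind this kind of upper bound can be sketched as a binary search over geometrically growing candidate values, with 0 treated as its own candidate. The code below is an illustration in that spirit, not the construction of Lemma 13 itself: `oracle(t)` is a hypothetical stand-in for an NP query asking whether $\mathrm{MinUnsat}(F) \le t$, and the guarantee it delivers is $x/k \le \mathrm{MinUnsat}(F) < x$ (or $x = 0$ when the formula is satisfiable), which may differ at the boundaries from the definition in the text.

```python
import math

def approximate_min_unsat(n, k, oracle):
    """Binary search over candidates 0, k, k^2, ... covering the range 0..n/2.

    oracle(t) is a hypothetical query returning True iff MinUnsat(F) <= t.
    Requires k > 1. Returns (x, queries_used).
    """
    bounds = [0.0, float(k)]
    while bounds[-1] <= n / 2:            # extend until the last bound exceeds n/2
        bounds.append(bounds[-1] * k)

    def fits(i):
        # Candidate 0 fits iff MinUnsat == 0; candidate k^i fits iff MinUnsat < k^i.
        threshold = 0 if i == 0 else math.ceil(bounds[i]) - 1
        return oracle(threshold)

    lo, hi, used = 0, len(bounds) - 1, 0  # the last candidate always fits
    while lo < hi:
        mid = (lo + hi) // 2
        used += 1
        if fits(mid):
            hi = mid
        else:
            lo = mid + 1
    return bounds[lo], used

# Simulated oracle for a formula with 64 clauses whose true MinUnsat value is 5.
true_value = 5
x, used = approximate_min_unsat(64, 2, lambda t: true_value <= t)
print(x, used)
```

With $c$ candidate values the search issues at most $\lceil \log c \rceil$ queries, and $c$ grows like $\log_{k}(n/2)$, which is where a doubly logarithmic query count of this form comes from.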

Theorem 30. Let $q(n) \in O(\log n)$ be a non-decreasing polynomial time computable function. Define $k(n)$ such that $q(n) = \log\log_{k(n)} n - 2$. If $h$ is a function which $k(\cdot)$-approximates MinUnsat and $f \in \mathrm{PF}^{\mathrm{SAT}[q(n)]}$, then $f \le^{P}_{mt} h$.

As before, the key to the theorem is a construction lemma. For this lemma, we need to translate Lemma 22 to the context of minimizing unsatisfied clauses.

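Corollary 31. There exist constants $\alpha$, $c_1$, $c_2$ and a polynomial time computable function $f$ such that $0 < \alpha < 1$, $1 \le c_1$, $1 \le c_2$ and, given a 3CNF formula $F$ with $t$ clauses, $f$ produces a 3CNF formula $F'$ with $l(t) = c_1 t^{c_2}$ clauses, where

$$F \in \mathrm{SAT} \implies F' \in \mathrm{SAT} \implies \mathrm{MinUnsat}(F') = 0,$$

$$F \notin \mathrm{SAT} \implies \mathrm{MinUnsat}(F') = \alpha\, l(t).$$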

Remark. The proof of Theorem 30 exploits the fact that $\mathrm{MinUnsat}(F')$ may be 0 to simplify the combinatorial analysis. However, this convenient feature is not an essential element of the proof. Similar theorems can be proven even if we restrict $F'$ to be unsatisfiable or modify the objective function so that $\mathrm{MinUnsat}(F') \ge 1$.


⁶ In general, if $F$ is a 3CNF formula with $n$ clauses, we know that at least $n/2$ of the clauses are simultaneously satisfiable.


Lemma 32 (Construction Lemma for MIN3UNSAT). Let $k(n)$ be polynomial time computable such that $1 + 2n^{-1/4} \le k(n) \le n/2$, and let $F_1, \ldots, F_{r(t)}$ be a nested sequence where $|F_i| = t$, $m = l(t)^2$ and

$$r(t) \le \lfloor (\log_{k(m)} m)/4 \rfloor.$$

Then, in polynomial time we can construct a 3CNF formula $H$ with $m$ clauses and whole numbers $y_0 > y_1 > \cdots > y_{r(t)}$ such that $y_i > k(m)\, y_{i+1}$ for $0 \le i \le r(t)-1$, and

$$\mathrm{MinUnsat}(H) = y_z \iff \#^{\mathrm{SAT}}_{r(t)}(F_1, \ldots, F_{r(t)}) = z.$$

Proof. This proof is analogous to the construction in Lemma 24. One main difference is that MIN3UNSAT is a minimization problem. To construct $H$, let $F_0$ be an obviously satisfiable formula with $t$ clauses. We take each $F_i$ and use Corollary 31 to construct a formula $F'_i$ with $l(t)$ clauses. As usual we let $l = l(t)$, $r = r(t)$ and $g = k(m)$. We define

$$a_i = \begin{cases} 0 & \text{if } i = 0, \\ \lceil g\, a_{i-1} \rceil + 1 & \text{for } 1 \le i \le r. \end{cases}$$

The combined formula $H$ is the conjunction of $l - a_r$ copies of $F'_0$, $a_r - a_{r-1}$ copies of $F'_1$, $a_{r-1} - a_{r-2}$ copies of $F'_2$, ..., and one copy of $F'_r$. Now, if all of the formulas in $F_1, \ldots, F_r$ are satisfiable, then $\mathrm{MinUnsat}(H) = 0$. If $F_r$ is the only unsatisfiable formula, then we know that $\mathrm{MinUnsat}(H) = \alpha l$. In general, if $\#^{\mathrm{SAT}}_{r}(F_1, \ldots, F_r) = r - i$, then $\mathrm{MinUnsat}(H) = \alpha a_i l$. So, we can define $y_{r-i}$ to be $\alpha a_i l$, for $0 \le i \le r$. The requirement that $y_i > g\, y_{i+1}$ follows directly from the definition of $a_{i+1}$. Furthermore, the restriction that $k(m) \ge 1 + 2n^{-1/4}$ and $r \le \lfloor (\log_{k(m)} m)/4 \rfloor$ guarantees that $a_r \le l$. We omit the calculations here. $\square$
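The bookkeeping in this construction can be traced on small numbers. The following sketch computes the sequence $a_i$, the number of copies of each $F'_j$, and the values $y_z$; the function names and the sample parameters ($l = 64$, $r = 3$, $g = 2$, $\alpha = 1/4$) are chosen here purely for illustration and are not taken from the paper.

```python
import math
from fractions import Fraction

def copy_counts(l, r, g):
    """Return (a, copies): a_0 = 0, a_i = ceil(g * a_{i-1}) + 1, and copies[j]
    is the number of copies of F'_j placed in H, for j = 0..r."""
    a = [0]
    for _ in range(r):
        a.append(math.ceil(g * a[-1]) + 1)
    if a[r] > l:
        raise ValueError("a_r exceeds l; the construction needs a_r <= l")
    # l - a_r copies of F'_0, then a_{r-j+1} - a_{r-j} copies of F'_j for j = 1..r.
    copies = [l - a[r]] + [a[r - j + 1] - a[r - j] for j in range(1, r + 1)]
    return a, copies

def y_values(l, r, g, alpha):
    """y_z = alpha * a_{r-z} * l, the value of MinUnsat(H) when exactly z of the
    nested formulas F_1..F_r are satisfiable."""
    a, _ = copy_counts(l, r, g)
    return [alpha * a[r - z] * l for z in range(r + 1)]

l, r, g, alpha = 64, 3, 2, Fraction(1, 4)
a, copies = copy_counts(l, r, g)
ys = y_values(l, r, g, alpha)
print(a, copies, ys)
```

The copies sum to $l$, so $H$ has $l \cdot l = m$ clauses, and consecutive $y$ values are separated by more than a factor of $g$ because $a_{i+1} > g\, a_i$.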

Corollary 33. Let $k(n)$ be polynomial time computable such that $1 + 2n^{-1/4} \le k(n) \le n$. If there exists a polynomial time function which $k(\cdot)$-approximates $\mathrm{MinUnsat}(F)$ using $\log\log_{k(n)} n - 2$ queries to an oracle $X$ for all 3CNF formulas $F$ with $n$ clauses, then P = NP.

The results in this section show that the complexity of approximating MinUnsat is very close to the complexity of approximating Clique Size. The numbers of queries differ only by a constant. For classes closed under $\le^{P}_{mt}$-reductions, approximating MinUnsat has the same complexity as approximating Clique Size.

Corollary 34. For all constants $k \ge 1$ and all constants $a$ with $0 < a \le 1$:

• $(1+n^{-a})$-approximating MinUnsat is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[O(\log n)]}$.

• $(1+1/\log^{k} n)$-approximating MinUnsat is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[(k+1)\log\log n+O(1)]}$.

• $k$-approximating MinUnsat is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[\log\log n+O(1)]}$.

• $(\log^{k} n)$-approximating MinUnsat is $\le^{P}_{mt}$-complete for $\mathrm{PF}^{\mathrm{SAT}[\log\log n-\log\log\log n+O(1)]}$.
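As a sanity check on the four query counts, one can substitute each approximation factor into the quantity $\log\log_{k(n)} n$ that governs the bounds in this section. The derivation below is a heuristic sketch, not a proof: it ignores additive constants and rounding, takes $\log$ base 2 and $\ln$ natural, uses $\ln(1+\epsilon) \approx \epsilon$ for small $\epsilon$, and assumes the constant factor in the third case is greater than 1.

```latex
\begin{align*}
\log\log_{k(n)} n &= \log\frac{\ln n}{\ln k(n)} = \log\ln n - \log\ln k(n), \\[4pt]
k(n) = 1 + n^{-a}: \quad
  & \log\ln n - \log n^{-a} = \log\ln n + a\log n = O(\log n), \\
k(n) = 1 + 1/\log^{k} n: \quad
  & \log\ln n + k\log\log n = (k+1)\log\log n + O(1), \\
k(n) = k: \quad
  & \log\ln n - \log\ln k = \log\log n + O(1), \\
k(n) = \log^{k} n: \quad
  & \log\ln n - \log(k\ln\log n) = \log\log n - \log\log\log n + O(1).
\end{align*}
```

In each case the dominant terms agree with the query bound stated in the corresponding item.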

Thus, approximating Clique Size within a constant factor reduces to approximating MinUnsat within a constant factor, and vice versa. Furthermore, by the discussion in Section 4, these reductions are "approximation preserving." Hence, we can argue that approximating Clique Size and approximating MinUnsat really have the same complexity. Thus, the "approximability" of MAX3SAT is not an indication that 3SAT has an intrinsically lower complexity than Clique. It is merely an artifact of viewing MAX3SAT as a maximization problem instead of a minimization problem. We prefer to think of MAX3SAT and MIN3UNSAT as complementary views of the same approximation problem.

There are other examples of such complementary NP-approximation problems. For a graph $G = (V, E)$, $V'$ is a vertex cover if and only if $V - V'$ is an independent set [GJ79]. An approximate solution to the vertex cover problem leads immediately to an approximate solution to independent set (which is equivalent to Clique) and vice versa. Thus, approximating the size of the smallest vertex cover has the same complexity as approximating MaxSat, and approximating the size of the largest independent set has the same complexity as approximating Clique Size and MinUnsat. Finally, it is interesting to point out that finding the optimal solutions to these problems is, in every case, complete for $\mathrm{PF}^{\mathrm{SAT}[O(\log n)]}$; the problems are thus equivalent under $\le^{P}_{mt}$-reductions [Kre88].
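The complement relationship between vertex covers and independent sets is easy to verify exhaustively on a small graph. The sketch below uses hypothetical helper names and a brute-force enumeration of all vertex subsets, so it is only suitable for tiny examples such as the 5-cycle used here.

```python
from itertools import combinations

def is_vertex_cover(edges, cover):
    """Every edge has at least one endpoint in the cover."""
    return all(u in cover or v in cover for u, v in edges)

def is_independent_set(edges, subset):
    """No edge has both endpoints in the subset."""
    return not any(u in subset and v in subset for u, v in edges)

def complement_property_holds(vertices, edges):
    """Check: V' is a vertex cover iff V - V' is an independent set, for every V'."""
    all_vertices = set(vertices)
    for size in range(len(all_vertices) + 1):
        for chosen in combinations(sorted(all_vertices), size):
            cover = set(chosen)
            if is_vertex_cover(edges, cover) != is_independent_set(edges, all_vertices - cover):
                return False
    return True

def min_cover_and_max_independent(vertices, edges):
    """Brute-force sizes of a smallest vertex cover and a largest independent set."""
    subsets = [set(c) for size in range(len(vertices) + 1)
               for c in combinations(sorted(vertices), size)]
    min_cover = min(len(s) for s in subsets if is_vertex_cover(edges, s))
    max_independent = max(len(s) for s in subsets if is_independent_set(edges, s))
    return min_cover, max_independent

cycle_vertices = range(5)
cycle_edges = [(i, (i + 1) % 5) for i in range(5)]
print(complement_property_holds(cycle_vertices, cycle_edges))
print(min_cover_and_max_independent(cycle_vertices, cycle_edges))
```

The two optimal sizes always sum to $|V|$, which is the same kind of complementary cost relationship that MAX3SAT and MIN3UNSAT have with respect to the number of clauses.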

9. DISCUSSION

The results in this paper along with the results in Chang, Gasarch and Lund [CGL94] show that counting queries is a good measure of the complexity of NP-approximation problems. We have derived close upper bounds and relative lower bounds on the number of queries needed to solve various NP-approximation problems. Furthermore, the completeness results show that queries to SAT and finding approximate solutions to these NP-approximation problems are equivalent. That is, with more queries we can find better solutions, and if we can find better solutions then we can answer more queries. In addition, we can quantify the relationship between the approximation factor and the number of queries. This quantitative relationship allows us to deduce some structural results about the NP-approximation problems, namely, self-improvability and inter-reducibility between NP-approximation problems. Finally, using a uniform and quantitative complexity measure has allowed us to compare the difficulty of approximating clique size and satisfiability. We have shown that whether one prefers to view these two problems as having equivalent complexity (or whether one prefers the conventional wisdom that maximum clique is more difficult than satisfiability) does not depend upon the intrinsic complexity of these problems. Instead, it depends on whether one prefers to treat satisfiability as a maximization problem or a minimization problem.

Further studies in this area have expanded the applicability of using bounded queries as a complexity measure for NP-approximation problems. Crescenzi, Kann, Silvestri and Trevisan [CKST95] have considered whether there exists a reduction from finding the vertices of the largest clique to a function that merely finds the vertices of a 2-approximate clique. They have shown that if such a reduction exists, then the Polynomial Hierarchy must collapse. These results must deal with not just finding an approximation of the size of the largest clique, but also producing the vertices in the approximate clique. It turns out that the quantitative results in this paper can also be extended to the domain of finding approximate solutions to an NP-optimization problem $\Pi$ instead of simply approximating the cost of the optimal solution $\mathrm{OPT}_\Pi$. For example, an NP machine with $\lceil \log \lceil \log n \rceil \rceil$ queries to SAT in its entire computation tree can find the vertices of a 2-approximate clique. Moreover, every multi-valued function computed by such NP machines can be reduced to finding the vertices of a 2-approximate clique. Furthermore, one can show that an NP machine with additional queries to SAT can compute more functions unless PH collapses. These theorems depend upon results about the Boolean Hierarchy and are beyond the scope of this paper. We encourage the reader to consult the literature for expositions of these recent developments [Cha94b].

ACKNOWLEDGMENTS

The author thanks Bill Gasarch, Carsten Lund, Madhu Sudan, Suresh Chari, Christine Piatko, and Ming Li for several fruitful discussions. In addition, many thanks go to Richard Beigel, Jacobo Torán and Luca Trevisan for suggestions in revising this paper. Finally, the author acknowledges the journal referee for several simplifications in the proof of the main theorem.

REFERENCES

[ABG90] A. Amir, R. Beigel, and W. I. Gasarch, Some connections between bounded query classes and non-uniform complexity, in "Proceedings of the 5th Structure in Complexity Theory Conference, 1990," pp. 232-243.

[AG88] A. Amir and W. I. Gasarch, Polynomial terse sets, Inform. and Comput. 77 (Apr. 1988), 37-56.

[ALM+92] S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy, Proof verification and hardness of approximation problems, in "Proceedings of the IEEE Symposium on Foundations of Computer Science, 1992," pp. 14-23.

[Bei88] R. Beigel, "NP-Hard Sets Are p-Superterse unless R = NP," Tech. Rep. 4, Department of Computer Science, The Johns Hopkins University, 1988.

[Bei91] R. Beigel, Bounded queries to SAT and the Boolean hierarchy, Theoret. Comput. Sci. 84, No. 2 (July 1991), 199-223.

[BGLR93] M. Bellare, S. Goldwasser, C. Lund, and A. Russell, Efficient probabilistically checkable proofs and applications to approximations, in "ACM Symposium on Theory of Computing, 1993," pp. 294-304.

[CG93] R. Chang and W. I. Gasarch, On bounded queries and approximation, in "Proceedings of the IEEE Symposium on Foundations of Computer Science, 1993," pp. 547-556.

[CGH+88] J. Cai, T. Gundermann, J. Hartmanis, L. Hemachandra, V. Sewelson, K. Wagner, and G. Wechsung, The Boolean hierarchy I: Structural properties, SIAM J. Comput. 17, No. 6 (Dec. 1988), 1232-1252.

[CGL94] R. Chang, W. I. Gasarch, and C. Lund, "On Bounded Queries and Approximation," Tech. Rep. TR CS-94-05, Department of Computer Science, University of Maryland Baltimore County, Apr. 1994; SIAM J. Computing, to appear.

[Cha94a] R. Chang, On the query complexity of clique size and maximum satisfiability, in "Proceedings of the 9th Structure in Complexity Theory Conference, June 1994," pp. 31-42.

[Cha94b] R. Chang, Structural complexity column: A machine model for NP-approximation problems and the revenge of the Boolean hierarchy, Bull. Europ. Assoc. Theoret. Comput. Sci. 54 (Oct. 1994), 166-182.

[Cha95] R. Chang, "On the Query Complexity of Clique Size and Maximum Satisfiability," Tech. Rep. TR CS-95-01, Department of Computer Science, University of Maryland Baltimore County, May 1995.

[CKST95] P. Crescenzi, V. Kann, R. Silvestri, and L. Trevisan, Structure in approximation classes, in "Proceedings of the 1st Computing and Combinatorics Conference, August 1995."

[GJ79] M. R. Garey and D. S. Johnson, "Computers and Intractability: A Guide to the Theory of NP-Completeness," Freeman, San Francisco, 1979.

[HN93] A. Hoene and A. Nickelsen, Counting, selecting, sorting by query-bounded machines, in "Proceedings of the 10th Symposium on Theoretical Aspects of Computer Science," Lecture Notes in Computer Science, Vol. 665, Springer-Verlag, 1993.

[KMSV94] S. Khanna, R. Motwani, M. Sudan, and U. Vazirani, On syntactic versus computational views of approximability, in "Proceedings of the IEEE Symposium on Foundations of Computer Science, 1994," pp. 819-830.

[Kre88] M. W. Krentel, The complexity of optimization problems, J. Comput. System Sci. 36, No. 3 (1988), 490-509.

[LY94] C. Lund and M. Yannakakis, On the hardness of approximating minimization problems, J. ACM 41, No. 5 (Sept. 1994), 960-981.

[PR90] A. Panconesi and D. Ranjan, Quantifiers and approximation, in "ACM Symposium on Theory of Computing, 1990," pp. 446-456.

[PY91] C. H. Papadimitriou and M. Yannakakis, Optimization, approximation, and complexity classes, J. Comput. System Sci. 43 (1991), 425-440.

[Yan92] M. Yannakakis, On the approximation of maximum satisfiability, in "Proceedings of the 3rd Symposium on Discrete Algorithms, 1992," pp. 1-9.

[Zuc93] D. Zuckerman, NP-complete problems have a version that's hard to approximate, in "Proceedings of the 8th Structure in Complexity Theory Conference, 1993," pp. 305-312.
