# Computational Lower Bounds for Statistical Estimation Problems

Post on 22-May-2020



Ilias Diakonikolas (USC)

(joint with Daniel Kane (UCSD) and Alistair Stewart (USC))

Workshop on Local Algorithms, MIT, June 2018

THIS TALK

General Technique for Statistical Query Lower Bounds: Leads to Tight Lower Bounds

for a range of High-dimensional Estimation Tasks

Concrete Applications of our Technique:

• Learning Gaussian Mixture Models (GMMs)

• Robustly Learning a Gaussian

• Robustly Testing a Gaussian

• Statistical-Computational Tradeoffs

STATISTICAL QUERIES [KEARNS’ 93]

𝑥", 𝑥$, … , 𝑥& ∼ 𝐷 over 𝑋


𝑣" − 𝐄-∼. 𝜙" 𝑥 ≤ 𝜏 𝜏 is tolerance of the query; 𝜏 = 1/ 𝑚�

The SQ algorithm issues queries φ_1, φ_2, …, φ_q, each φ_i : X → [−1, 1], to a STAT_D(τ) oracle; the oracle knows D and replies with answers v_1, v_2, …, v_q.

Problem P ∈ SQCompl(q, m): there exists an SQ algorithm that solves P using q queries to STAT_D(τ = 1/√m).
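As a concrete illustration (not from the talk), a STAT_D(τ) oracle can be simulated from i.i.d. samples of D; the name `stat_oracle` and the adversarial-noise modeling are illustrative assumptions:

```python
import random

def stat_oracle(samples, phi, tau, rng=random.Random(0)):
    """Simulate a STAT_D(tau) oracle from i.i.d. samples of D:
    answer with the empirical mean of phi, perturbed by up to tau.
    (A real oracle may return anything within tau of E_{x~D}[phi(x)].)"""
    empirical = sum(phi(x) for x in samples) / len(samples)
    return empirical + rng.uniform(-tau, tau)

# Example: D = N(0, 1), query phi(x) = sign(x), which maps X into [-1, 1].
rng = random.Random(1)
m = 10_000
samples = [rng.gauss(0.0, 1.0) for _ in range(m)]
tau = 1.0 / m ** 0.5                       # tolerance tau = 1/sqrt(m)
v = stat_oracle(samples, lambda x: 1.0 if x > 0 else -1.0, tau)
# v is within roughly tau of the true expectation E[sign(x)] = 0.
```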

POWER OF SQ ALGORITHMS

Restricted Model: Hope to prove unconditional computational lower bounds.

Powerful Model: Wide range of algorithmic techniques in ML are implementable using SQs*:

• PAC Learning: AC0, decision trees, linear separators, boosting.

• Unsupervised Learning: stochastic convex optimization, moment-based methods, k-means clustering, EM, … [Feldman-Grigorescu-Reyzin-Vempala-Xiao/JACM’17]

Only known exception: Gaussian elimination over finite fields (e.g., learning parities).

For all problems in this talk, strongest known algorithms are SQ.

METHODOLOGY FOR SQ LOWER BOUNDS

Statistical Query Dimension:

• Fixed-distribution PAC Learning [Blum-Furst-Jackson-Kearns-Mansour-Rudich’95; …]

• General Statistical Problems [Feldman-Grigorescu-Reyzin-Vempala-Xiao’13, …, Feldman’16]

Pairwise correlation between D1 and D2 with respect to D: χ_D(D1, D2) := ∫_X D1(x) D2(x)/D(x) dx − 1.

Fact: Suffices to construct a large set of distributions that are nearly uncorrelated.
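A minimal numerical sketch of this notion of correlation, using the standard definition χ_D(D1, D2) = ∫ D1 D2 / D − 1 for discrete distributions (the function name is illustrative):

```python
def pairwise_correlation(d1, d2, d):
    """chi_D(D1, D2) = sum_x D1(x) D2(x) / D(x) - 1, for discrete
    distributions given as dicts mapping outcomes to probabilities
    (D must put mass on the supports of D1 and D2)."""
    return sum(d1.get(x, 0.0) * d2.get(x, 0.0) / p
               for x, p in d.items() if p > 0) - 1.0

# Reference distribution D = uniform on {0, 1, 2, 3}.
D = {x: 0.25 for x in range(4)}
D1 = {0: 0.5, 1: 0.5}   # supported on {0, 1}
D2 = {2: 0.5, 3: 0.5}   # supported on {2, 3}
# Disjointly supported D1, D2 give chi_D(D1, D2) = -1,
# while chi_D(D1, D1) = 1 and chi_D(D, D) = 0.
```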


GAUSSIAN MIXTURE MODEL (GMM)

• GMM: Distribution on ℝⁿ with probability density function F(x) = Σ_{j=1}^k w_j N(μ_j, Σ_j)(x), where the weights w_j ≥ 0 sum to 1.

• Extensively studied in statistics and TCS

Karl Pearson (1894)
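A self-contained sketch of the definition above for the univariate case (function names and the example parameters are illustrative):

```python
import math
import random

def gmm_pdf(x, weights, mus, sigmas):
    """Density of a univariate GMM: sum_j w_j * N(mu_j, sigma_j^2)(x)."""
    return sum(w * math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, mu, s in zip(weights, mus, sigmas))

def gmm_sample(n, weights, mus, sigmas, rng=random.Random(0)):
    """Draw n samples: pick a component by weight, then sample its Gaussian."""
    out = []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        out.append(rng.gauss(mus[j], sigmas[j]))
    return out

weights, mus, sigmas = [0.3, 0.7], [-2.0, 1.0], [1.0, 0.5]
# Riemann sum of the density over [-10, 10): should be close to 1.
total = 0.01 * sum(gmm_pdf(-10 + 0.01 * i, weights, mus, sigmas) for i in range(2000))
samples = gmm_sample(20_000, weights, mus, sigmas)
```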

LEARNING GMMS - PRIOR WORK (I)

Two Related Learning Problems

Parameter Estimation: Recover model parameters.

• Separation Assumptions: Clustering-based Techniques [Dasgupta’99, Dasgupta-Schulman’00, Arora-Kannan’01, Vempala-Wang’02, Achlioptas-McSherry’05, Brubaker-Vempala’08]

Sample Complexity:
(Best Known) Runtime:

• No Separation: Moment Method [Kalai-Moitra-Valiant’10, Moitra-Valiant’10, Belkin-Sinha’10, Hardt-Price’15]

Sample Complexity:
(Best Known) Runtime:

SEPARATION ASSUMPTIONS

• Clustering is possible only when the components have very little overlap.

• Formally, we want the total variation distance between components to be close to 1.

• Algorithms for learning spherical GMMs work under this assumption.

• For non-spherical GMMs, known algorithms require stronger assumptions.
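The separation condition above can be made concrete numerically: total variation distance between two unit-variance Gaussians is close to 1 once the means are far apart, and close to 0 when they nearly coincide. A sketch (the function name and integration bounds are illustrative):

```python
import math

def tv_distance_gaussians(mu1, mu2, sigma=1.0, lo=-20.0, hi=20.0, step=1e-3):
    """Total variation distance between N(mu1, sigma^2) and N(mu2, sigma^2),
    computed by Riemann integration of |p - q| / 2 over [lo, hi]."""
    def pdf(x, mu):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return 0.5 * step * sum(abs(pdf(lo + i * step, mu1) - pdf(lo + i * step, mu2))
                            for i in range(int((hi - lo) / step)))
```

For equal variances this matches the closed form d_TV = erf(|mu1 − mu2| / (2 σ √2)), which gives a quick correctness check.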

LEARNING GMMS - PRIOR WORK (II)

Density Estimation: Recover underlying distribution (within statistical distance ε).

[Feldman-O’Donnell-Servedio’05, Moitra-Valiant’10, Suresh-Orlitsky-Acharya- Jafarpour’14, Hardt-Price’15, Li-Schmidt’15]

Sample Complexity:

(Best Known) Runtime:

Fact: For separated GMMs, density estimation and parameter estimation are equivalent.

LEARNING GMMS – OPEN QUESTION

Summary: The sample complexity of density estimation for k-GMMs is poly(n, k, 1/ε). The sample complexity of parameter estimation for separated k-GMMs is poly(n, k, 1/ε).

Question: Is there a poly(n, k, 1/ε)-time learning algorithm?

STATISTICAL QUERY LOWER BOUND FOR LEARNING GMMS

Theorem: Any SQ algorithm that learns separated k-GMMs over ℝⁿ to constant error requires either:
• SQ queries of accuracy n^{−Ω(k)}, or
• at least 2^{n^{Ω(1)}} many SQ queries.

Take-away: The computational complexity of learning GMMs is inherently exponential in the dimension of the latent space.

GENERAL RECIPE FOR (SQ) LOWER BOUNDS

Our generic technique for proving SQ Lower Bounds:

Step #1: Construct a distribution P_v that is standard Gaussian in all directions except v.

Step #2: Construct the univariate projection in the direction v so that it matches the first m moments of N(0, 1).

Step #3: Consider the family of instances {P_v : v a unit vector}.

HIDDEN DIRECTION DISTRIBUTION

Definition: For a unit vector v and a univariate distribution with density A, consider the high-dimensional distribution P_v whose one-dimensional marginal in the direction v is A, and which is a standard Gaussian in the directions orthogonal to v.
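One convenient way to sample such a hidden-direction distribution (a sketch; `sample_a` stands for any sampler of the univariate distribution A): draw x ∼ N(0, I), then replace its component along v with a draw from A.

```python
import random

def sample_hidden_direction(n, v, sample_a, rng=random.Random(0)):
    """Draw one sample from P_v: standard Gaussian in the directions
    orthogonal to the unit vector v, distributed as A along v."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    proj = sum(xi * vi for xi, vi in zip(x, v))   # current component along v
    shift = sample_a(rng) - proj                  # replace it with a draw from A
    return [xi + shift * vi for xi, vi in zip(x, v)]
```

By construction, the projection of the output onto v is exactly the draw from A, while the orthogonal components stay standard Gaussian.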


GENERIC SQ LOWER BOUND

Proposition: Suppose that:
• A matches the first m moments of N(0, 1);
• |χ(P_v, P_{v′})| is small as long as v, v′ are nearly orthogonal.

Then any SQ algorithm that learns an unknown P_v to small constant error requires either queries of accuracy n^{−Ω(m)} or 2^{n^{Ω(1)}} many queries.

WHY IS FINDING A HIDDEN DIRECTION HARD?

Observation: Low-Degree Moments do not help.

• A matches the first m moments of N(0, 1).
• The first m moments of P_v are identical to those of N(0, I).
• The degree-(m+1) moment tensor has n^{m+1} entries.

Claim: Random projections do not help.

• To distinguish between P_v and N(0, I), one would need exponentially many random projections.
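To make the moment-matching concrete, here is an illustrative choice of A (not the construction from the talk): A = ½·N(−0.8, 0.6²) + ½·N(0.8, 0.6²) has mean 0, variance 0.8² + 0.6² = 1, and vanishing odd moments, so it matches the first three moments of N(0, 1); only its fourth moment (≈ 2.18 versus 3 for the Gaussian) reveals the hidden direction.

```python
import random

def sample_pv(n_samples, dim, rng=random.Random(0)):
    """Sample from P_v with v = e_1 and A = 0.5 N(-0.8, 0.6^2) + 0.5 N(0.8, 0.6^2).
    A matches the first three moments of N(0,1), so P_v matches N(0, I)
    up to degree-3 moments."""
    data = []
    for _ in range(n_samples):
        a = rng.choice([-0.8, 0.8]) + rng.gauss(0.0, 0.6)   # draw from A
        data.append([a] + [rng.gauss(0.0, 1.0) for _ in range(dim - 1)])
    return data

data = sample_pv(50_000, 3)
mean1 = sum(x[0] for x in data) / len(data)      # matches Gaussian (0)
var1 = sum(x[0] ** 2 for x in data) / len(data)  # matches Gaussian (1)
m4 = sum(x[0] ** 4 for x in data) / len(data)    # ~2.18, not 3: the tell
```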

ONE-DIMENSIONAL PROJECTIONS ARE ALMOST GAUSSIAN

Key Lemma: Let Q be the distribution of ⟨v′, x⟩, where x ∼ P_v. Then Q is close to N(0, 1) whenever v and v′ are nearly orthogonal, with discrepancy decaying as |⟨v, v′⟩|^{m+1}.

PROOF OF KEY LEMMA (I)


PROOF OF KEY LEMMA (II)

where U_t is the Gaussian noise (Ornstein-Uhlenbeck) operator acting on functions over ℝ.

EIGENFUNCTIONS OF ORNSTEIN-UHLENBECK OPERATOR

Linear operator U_t acting on functions f: ℝ → ℝ: (U_t f)(x) = E_{z∼N(0,1)}[f(t·x + √(1 − t²)·z)].

Fact (Mehler 1866): U_t h_i = t^i · h_i.

• h_i denotes the degree-i Hermite polynomial.
• Note that the h_i are orthonormal with respect to the inner product ⟨f, g⟩ = E_{x∼N(0,1)}[f(x) g(x)].
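Mehler's eigenfunction identity can be checked numerically with Gauss-Hermite quadrature; the sketch below (function names are illustrative) verifies U_t He_3 = t³ He_3 at a sample point:

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def ornstein_uhlenbeck(f, t, x, n_nodes=40):
    """(U_t f)(x) = E_{z ~ N(0,1)}[f(t x + sqrt(1 - t^2) z)], computed with
    Gauss-HermiteE quadrature (weight exp(-z^2/2), weights sum to sqrt(2 pi))."""
    nodes, weights = He.hermegauss(n_nodes)
    vals = f(t * x + np.sqrt(1.0 - t * t) * nodes)
    return float(np.dot(weights, vals) / np.sqrt(2.0 * np.pi))

# Probabilists' Hermite polynomial He_3(z) = z^3 - 3 z.
he3 = lambda z: He.hermeval(z, [0, 0, 0, 1])
lhs = ornstein_uhlenbeck(he3, 0.5, 1.2)   # (U_{0.5} He_3)(1.2)
rhs = 0.5 ** 3 * he3(1.2)                 # t^3 * He_3(1.2)
```

The quadrature is exact for polynomials of this degree, so the two sides agree to machine precision.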


PROOF OF GENERIC SQ LOWER BOUND

• Suffices to construct a large set of distributions that are nearly uncorrelated.
• Pairwise correlation between D1 and D2 with respect to D: χ_D(D1, D2) := ∫_X D1(x) D2(x)/D(x) dx − 1.

Two Main Ingredients:

Correlation Lemma: |χ_{N(0,I)}(P_v, P_{v′})| ≤ |⟨v, v′⟩|^{m+1} · χ²(A, N(0, 1)).

Packing Argument: There exists a set S of 2^{n^{Ω(1)}} unit vectors on the sphere in ℝⁿ with pairwise inner products of magnitude n^{−Ω(1)}.
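The packing step rests on a standard high-dimensional fact: random unit vectors are nearly orthogonal with high probability. A minimal illustration with modest sizes (the counts here are far smaller than the exponential packing the argument uses):

```python
import math
import random

def random_unit_vectors(count, dim, rng=random.Random(0)):
    """Draw `count` independent uniformly random unit vectors in R^dim
    by normalizing standard Gaussian vectors."""
    vecs = []
    for _ in range(count):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        vecs.append([c / norm for c in v])
    return vecs

vecs = random_unit_vectors(40, 1000)
# Largest pairwise |inner product|: concentrates around sqrt(log(count)/dim).
max_dot = max(abs(sum(a * b for a, b in zip(u, w)))
              for i, u in enumerate(vecs) for w in vecs[i + 1:])
```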

Theorem: Any SQ algorithm that learns separated k-GMMs over ℝⁿ to constant error requires either SQ queries of accuracy n^{−Ω(k)} or at least 2^{n^{Ω(1)}} many SQ queries.

APPLICATION: SQ LOWER BOUND FOR GMMS (I)

Want to show: the GMM lower bound follows by applying our generic proposition with a suitable univariate moment-matching distribution A.


APPLICATION: SQ LOWER BOUND FOR GMMS (II)

Lemma: There exists a univariate distribution A that is a k-GMM with components A_i such that:
• A agrees with N(0, 1) on the first 2k − 1 moments.
• Each pair of components is separated.
• χ(P_v, P_{v′}) is small whenever v and v′ are nearly orthogonal.

APPLICATION: SQ LOWER BOUND FOR GMMS (III)

High-dimensional distributions P_v look like “parallel pancakes”:

Efficiently learnable for k=2. [Brubaker-Vempala’08]

FURTHER RESULTS

SQ Lower Bounds:
• Learning GMMs
• Robustly Learning a Gaussian: “The error guarantees of [DKK+16] are optimal for all polynomial-time algorithms.”
• Robust Covariance Estimation in Spectral Norm: “Any efficient SQ algorithm requires more samples than are information-theoretically necessary.”
• Robust k-Sparse Mean Estimation: “Any efficient SQ algorithm requires more samples than are information-theoretically necessary.”

Sample Complexity Lower Bounds:
• Robust Gaussian Mean Testing
• Testing Spheri