PDF uncertainties at the LHC made easy: A compression algorithm for the combination of PDF sets


Page 1: PDF uncertainties at the LHC made easy: A compression algorithm for the combination of PDF sets

Juan Rojo, STFC Rutherford Fellow
Rudolf Peierls Centre for Theoretical Physics, University of Oxford

In collaboration with S. Carrazza, J. I. Latorre and G. Watt

PDF4LHC Meeting, CERN, 21/01/2015

Page 2: Motivation

To provide a practical implementation of the PDF4LHC recommendation, easy to use by the experiments and computationally less intensive than the original prescription.

Based on the Monte Carlo statistical combination of the different PDF sets, followed by a compression algorithm to end up with a reduced number of replicas.

Similar in spirit to the Meta-PDF approach (Gao and Nadolsky 14), but with important conceptual and practical differences.

Having a single combined PDF set (even with a large number of eigenvectors/sets) would already simplify life for many users, since widely-used tools like MadGraph5_aMC@NLO, POWHEG or FEWZ provide the PDF uncertainties without any additional cost.

But this is not true for all theory tools used at the LHC, so there is still a strong motivation for a combined PDF set with a small number of eigenvectors/replicas.

Here all results are obtained at fixed alphas(MZ) = 0.118; adding the combined PDF+alphas uncertainty in quadrature (as in the updated PDF4LHC recommendation) is trivial in our approach.

In addition, the compression algorithm can also be used on native MC sets, like NNPDF, starting from a 1000-replica sample and reducing it to a smaller sample while reproducing all statistical estimators.


Page 3: Basic strategy (I)

Select the PDF sets that enter the combination. Results here are based on NNPDF3.0, CT10 and MMHT14, but any other choice is possible.

Transform the Hessian PDF sets into their Monte Carlo representation (Watt and Thorne 12).

Now combine the same number of replicas from each of the three sets (assuming equal weight in the combination). First proposed by Forte 12.

The resulting Monte Carlo ensemble has a robust statistical interpretation, and in many cases leads to similar results, with somewhat smaller uncertainties, compared to the original PDF4LHC envelope (Forte and Watt 13).
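As a rough illustration of the Hessian-to-MC conversion step, here is a minimal numpy sketch (array shapes and toy numbers are ours, not the actual implementation): each replica is the central member plus eigenvector differences weighted by independent standard-normal numbers, following the Watt-Thorne prescription.

```python
import numpy as np

def hessian_to_mc(f0, f_plus, f_minus, n_rep, seed=0):
    """Watt-Thorne conversion of a symmetric Hessian set into MC replicas:
    f_k = f0 + (1/2) * sum_j (f_j^+ - f_j^-) * R_jk, with R_jk ~ N(0, 1).

    f0      : central member on some (x, Q) grid, shape (n_grid,)
    f_plus  : plus-direction eigenvector members, shape (n_eig, n_grid)
    f_minus : minus-direction eigenvector members, shape (n_eig, n_grid)
    """
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((n_rep, f_plus.shape[0]))  # one Gaussian number per eigenvector per replica
    return f0 + 0.5 * R @ (f_plus - f_minus)           # shape (n_rep, n_grid)

# Toy usage: three grid points, two eigenvector directions
f0 = np.array([1.00, 0.50, 0.10])
f_plus = np.array([[1.10, 0.55, 0.12], [1.05, 0.48, 0.11]])
f_minus = np.array([[0.90, 0.45, 0.08], [0.95, 0.52, 0.09]])
replicas = hessian_to_mc(f0, f_plus, f_minus, n_rep=100)
print(replicas.mean(axis=0), replicas.std(axis=0))
```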

Page 4: The combined PDF set

Since there is reasonable agreement between CT10, MMHT14 and NNPDF3.0, the resulting combined distribution is in general Gaussian, but there are also important cases where the non-Gaussianity of the combined PDFs is substantial.

[Figure: left, combined MC replicas versus x at Q = 1.4142 GeV for NNPDF30_nnlo_as_0118, MMHT2014nnlo68cl and CT10nnlo random replicas together with the combined set MCcompPDFnnlo; right, histogram (probability per bin) of the gluon PDF at x = 0.1, Q = 100 GeV for NNPDF3.0, CT10, MMHT14 and the combined MC PDFs]

It is possible to obtain smoother distributions by increasing the number of replicas from each set, but this does not seem to be required by phenomenology.

For typical applications, using Nrep = 100 from each of the three PDF sets is enough.

Note that, in general, the combination of Gaussian distributions is not itself Gaussian.
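A quick numerical illustration of that last point, with toy numbers standing in for three PDF sets at a single (x, Q) point: pooling equal-sized Gaussian samples with different means and widths yields a visibly non-Gaussian combination.

```python
import numpy as np

rng = np.random.default_rng(42)
# Three Gaussian "sets" with different central values and uncertainties (toy numbers)
combined = np.concatenate([
    rng.normal(0.88, 0.010, 100),
    rng.normal(0.90, 0.015, 100),
    rng.normal(0.86, 0.008, 100),
])

z = (combined - combined.mean()) / combined.std()
print(f"skewness = {np.mean(z**3):.3f}")             # 0 for a Gaussian
print(f"excess kurtosis = {np.mean(z**4) - 3:.3f}")  # 0 for a Gaussian
```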


Page 5: Combined MC set vs PDF4LHC envelope

As already noted in the Forte-Watt study, the MC combination leads to somewhat smaller uncertainties than the PDF4LHC envelope (the same happens here and for the Meta-PDFs).

This can be understood because each PDF set now receives the same weight, while the PDF4LHC envelope effectively gives more weight to the outliers.

[Figure: NNLO cross sections at the LHC 13 TeV with alphas(MZ) = 0.118 for NNPDF3.0, MMHT14, CT10 and CMCPDF, compared with the PDF4LHC envelope: ggH with ggHiggs (sigma in pb), ttbar with top++ (sigma in pb), and W+ and Z0 with VRAP (sigma in nb)]

Page 6: Compression as a mathematical problem

The goal now is to compress the combined set of Nrep = 300 replicas down to a smaller subset, in such a way that this subset reproduces the statistical properties of the original distribution.

Mathematically, this is a well-defined problem: compression amounts to finding the subset that minimises the distance between two probability distributions.

Many equally good minimisations are possible, so the choice of minimisation algorithm is not crucial (similar to the travelling salesman problem).

This is a mathematically well-posed problem, with a number of robust distance measures available:

1. Kolmogorov distance
2. Kullback-Leibler divergence
3. ...

The optimal choice is determined by the requirements of the problem at hand, in this case LHC phenomenology.
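For concreteness, a minimal sketch of the first option, assuming the plain two-sample form of the Kolmogorov distance (the actual code may bin it differently): the largest absolute difference between the empirical CDFs of the prior and of a candidate subset.

```python
import numpy as np

def kolmogorov_distance(a, b):
    """Two-sample Kolmogorov distance: sup over x of |F_A(x) - F_B(x)|,
    where F_A, F_B are the empirical CDFs of samples a and b."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

# Toy check: a random 40-replica subset of a 300-replica prior
rng = np.random.default_rng(1)
prior = rng.normal(size=300)
subset = rng.choice(prior, size=40, replace=False)
print(kolmogorov_distance(prior, subset))
```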


Page 7: Basic strategy (II)

Now we have a single combined set, but the number of MC replicas is still too large.

Compress the original probability distribution into one with a smaller number of replicas, in such a way that all the relevant estimators (means, variances, correlations, etc.) of the PDFs are reproduced.

The compression is applied at Q = 2 GeV, though the results are robust with respect to other choices.

There are various options for how the error function to be minimised can be defined; for instance, to reproduce central values one adds a corresponding term.

The algorithm also minimises the Kolmogorov distance between the original and compressed distributions.

The same is done for variances, correlations and higher moments.

In the end, the optimal choice is decided by the resulting phenomenology.
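A highly simplified sketch of this strategy (toy error function and a greedy swap search; the actual algorithm and weights differ): pick the subset of replicas that minimises the squared differences of means, standard deviations and correlations with respect to the prior, evaluated on PDF values at Q = 2 GeV.

```python
import numpy as np

def error_function(prior, idx):
    """Toy figure of merit: squared differences of means, standard deviations
    and correlation matrices between the prior, shape (n_rep, n_grid), and
    the subset selected by the index array idx."""
    sub = prior[idx]
    e_mean = np.sum((sub.mean(0) - prior.mean(0)) ** 2)
    e_std = np.sum((sub.std(0) - prior.std(0)) ** 2)
    e_corr = np.sum((np.corrcoef(sub.T) - np.corrcoef(prior.T)) ** 2)
    return e_mean + e_std + e_corr

def compress(prior, n_out, n_iter=20000, seed=0):
    """Greedy swap search: start from a random subset and accept any
    single-replica swap that lowers the error function."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(prior.shape[0], size=n_out, replace=False)
    best = error_function(prior, idx)
    for _ in range(n_iter):
        trial = idx.copy()
        # swap one selected replica for one currently outside the subset
        trial[rng.integers(n_out)] = rng.choice(
            np.setdiff1d(np.arange(prior.shape[0]), trial))
        e = error_function(prior, trial)
        if e < best:
            idx, best = trial, e
    return idx

# Toy prior: 300 replicas sampled at 5 grid points (standing in for x values at Q = 2 GeV)
rng = np.random.default_rng(3)
prior = rng.normal(size=(300, 5))
selected = compress(prior, n_out=40)
print(sorted(selected))
```

Since, as noted above, many equally good minimisations exist, any reasonable stochastic minimiser (these greedy swaps, simulated annealing, a genetic algorithm) should land on a comparable subset.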

Page 8: Results of the compression

To gauge the improvements due to compression, compare the various contributions to the error function for the best compression and for random selections with the same number of replicas.

Substantial improvements as compared to random compressions, typically by one order of magnitude or more.

The compression is also able to successfully reproduce higher moments like skewness and kurtosis.

Similar improvements are found for the correlations and the Kolmogorov distances.

Horizontal dashed line: lower limit of the 68% CL range for random compressions with Nrep = 100.
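A sketch of how such a random baseline can be built, reusing the toy error_function and prior from the sketch on the previous page (both hypothetical): evaluate the figure of merit on many random subsets and quote the central 68% of the resulting distribution.

```python
import numpy as np

def random_band(prior, error_function, n_out, n_trials=1000, seed=0):
    """68% CL band of the figure of merit over random n_out-replica subsets."""
    rng = np.random.default_rng(seed)
    errors = [
        error_function(prior, rng.choice(prior.shape[0], n_out, replace=False))
        for _ in range(n_trials)
    ]
    return np.percentile(errors, [16, 84])

lo, hi = random_band(prior, error_function, n_out=40)
print(f"random compressions, 68% CL band: [{lo:.3f}, {hi:.3f}]")  # lo is the dashed line
```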


Page 9: Results of the compression

For example, for Nrep = 40 replicas the compressed and the original PDFs are virtually identical.

[Figure: gluon and up-quark PDFs at Q = 100 GeV, shown as ratios to the prior of 300 MC replicas, for compressed sets of 40 replicas (top) and 10 replicas (bottom)]

As expected, for a very small number of replicas (10 in this case) the agreement is much worse.

Page 10: The compressed PDF set

Since there is reasonable agreement between CT10, MMHT14 and NNPDF3.0, the resulting combined distribution typically looks Gaussian.

On average, the same number of replicas from each of the three sets is selected in the compressed set, a further demonstration that the algorithm is unbiased.

[Figure: which of the 300 input replicas enter the 25-replica CMC-PDF NLO compressed set: 9 replicas from NNPDF3.0, 8 from CT10 and 8 from MMHT14]


Page 11: Gaussian vs non-Gaussian

Even if the original PDF sets entering the combination are approximately Gaussian, their combination will in general be non-Gaussian, and linear error propagation might not be adequate.

Working in the Gaussian approximation might not be reliable: for example, the skewness is not reproduced by the compression (even though central values and variances are) unless we explicitly include it in the minimised figure of merit.

[Figure: skewness of the compressed set compared with the prior, with the skewness term included versus excluded from the figure of merit]
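In terms of the toy figure of merit sketched on page 7, including the skewness just means adding one more moment term, e.g. (the weight w_skew is a made-up placeholder):

```python
import numpy as np

def skewness(sample):
    """Standardized third moment per grid point, over replicas (axis 0)."""
    z = (sample - sample.mean(0)) / sample.std(0)
    return np.mean(z ** 3, axis=0)

# Extra contribution to the compression figure of merit:
# e_skew = w_skew * np.sum((skewness(sub) - skewness(prior)) ** 2)
```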

Page 12: Compressing native MC sets

The compression algorithm can of course also be used on native MC sets, like NNPDF. We have shown that, starting from NNPDF3.0 with Nrep = 1000 replicas, we can compress down to 40-50 replicas while maintaining all the relevant statistical properties (a sketch of sampling such a set follows below).

Central values and variances are well reproduced, but also, non-trivially, higher moments and correlations.

Sets with Nrep = 1000 replicas remain useful for other applications, like Bayesian reweighting.

All plots made with the APFEL Web plotter.
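As an illustration of what "starting from a 1000-replica sample" looks like in practice, a sketch using the LHAPDF6 Python bindings (the set name and the sampling grid are our choices; the set must be installed locally):

```python
import numpy as np
import lhapdf

# Load all members of the 1000-replica NNPDF3.0 set; member 0 is the central value
pdfs = lhapdf.mkPDFs("NNPDF30_nnlo_as_0118_1000")

Q = 2.0                                   # compression scale, GeV
xs = np.logspace(-5, np.log10(0.9), 50)   # illustrative x grid
pids = [-3, -2, -1, 21, 1, 2, 3]          # sbar, ubar, dbar, gluon, d, u, s

# One row per replica: x*f(x, Q) for every (flavour, x) grid point
prior = np.array([
    [pdf.xfxQ(pid, x, Q) for pid in pids for x in xs]
    for pdf in pdfs[1:]                   # skip the central member
])
print(prior.shape)  # (1000, len(pids) * len(xs))
```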


Page 13: Phenomenology

The ultimate validation is of course to check that the compressed set reproduces the original PDF combination for a wide variety of LHC observables.

We have tested a very large number of processes, both at the inclusive and at the differential level, and always found that Nrep = 20-30 replicas are enough for phenomenology.

NNLO cross sections:
gg -> H with ggHiggs
ttbar with top++
W, Z with VRAP


Page 14: Phenomenology

The ultimate validation is of course to check that the compressed set reproduces the original combination for a wide variety of observables.

We have tested a very large number of processes, both at the inclusive and at the differential level, and always found that Nrep = 20-30 replicas are enough for phenomenology.

NLO inclusive cross sections with MCFM:
Higgs in VBF
WW
WH


Page 15: Phenomenology

The ultimate validation is of course to check that the compressed set reproduces the original combination for a wide variety of observables.

We have tested a very large number of processes, both at the inclusive and at the differential level, and always found that Nrep = 20-30 replicas are enough for phenomenology.

The compression also works for fully differential distributions.

Tested on a large number of processes: jets, Drell-Yan, WW, W+charm, Z+jets, ...

The calculations use fast NLO interfaces:

1. aMCfast/APPLgrid for MadGraph5_aMC@NLO
2. APPLgrid for MCFM/NLOjet++

Very flexible setup, making it easy to redo the validation for any other compressed set.


Page 16: Correlations

The compression algorithm also manages to reproduce the correlations between physical observables, even for a small number of replicas.

[Figure: correlation coefficients of the ttbar cross section with ggH, ttbar, W+, W- and Z, for the reference combination and the compressed set with Nrep = 40]

This is a direct consequence of the fact that the correlations between PDFs are reproduced.
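The estimator behind these plots is the standard replica-based correlation coefficient between two observables; a minimal sketch with toy cross-section arrays:

```python
import numpy as np

def observable_correlation(sig_a, sig_b):
    """Correlation between two observables computed over PDF replicas:
    cov(A, B) / (std(A) * std(B))."""
    return np.corrcoef(sig_a, sig_b)[0, 1]

# Toy per-replica cross sections with a common underlying PDF fluctuation
rng = np.random.default_rng(7)
shared = rng.normal(size=300)                                # common gluon-like fluctuation
sig_ttbar = 800.0 + 20.0 * shared + rng.normal(0, 5.0, 300)
sig_ggh = 42.0 + 1.0 * shared + rng.normal(0, 0.3, 300)
print(observable_correlation(sig_ttbar, sig_ggh))            # close to +1
```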


Page 17: Correlations

Not an accident: selecting replicas at random fails to reproduce the correlations accurately enough.

[Figure: the same correlation coefficients for the ttbar cross section, now including the 68% CL band from random selections of Nrep = 40 replicas]


Page 18: Comparison with Meta-PDFs

To compare with the Meta-PDFs available in LHAPDF6, we have produced compressed sets based on MSTW08, CT10 and NNPDF2.3.

Reasonable agreement is found for central values and variances, except perhaps at small and large x.

The comparison needs to be redone once the two approaches use NNPDF3.0, MMHT14 and CT14.


Page 19: Summary and discussion

Our investigations suggest that the Compressed MC PDFs (CMC-PDFs) provide an efficient and easy-to-use implementation of the PDF4LHC combination.

A single set with only 25 replicas seems to be enough to reproduce the central values and variances of all the LHC processes, inclusive and differential, that we have explored. Correlations, both between PDFs and between cross sections, are also well reproduced.

Our plan is to publicly release the combination and compression codes, so that users can create their own compressed sets in LHAPDF6 format, with the names of the input sets as the only input.

Some of the advantages of our approach with respect to the Meta-PDFs are:

1. Improvement in CPU time (the baseline Meta-PDF has 100 eigenvector members).

2. Streamlined construction of the combined sets, with no need for refitting (and thus for testing the accuracy of the refit, which depends strongly on the PDF sets used), nor to redo the PDF evolution and reconstruct the PDF interpolation grids: for CMC-PDFs, only the LHAPDF6 files are needed as input.

3. No need for Gaussian/linear approximations: the compression algorithm reproduces the full probability distribution of the combined set (which in general is non-Gaussian).

4. No strong need for process-dependent combined PDF sets (like a Higgs-specific Meta-PDF).

However, we believe that the two approaches can complement each other nicely: the agreement between the compressed PDFs and the Meta-PDFs, for the same inputs, provides a non-trivial validation of the combination. It is conceivable that both could be used in LHC analyses.

To make further progress, we need to:

1. Have PDF4LHC decide which PDF sets should be used in the combination, and how.

2. Use this input to construct both Meta-PDFs and CMC-PDFs.

3. Agree on benchmark tests for LHC processes and test the performance of both methods.