the flavor structure of the nucleon sea institute for nuclear … · 2017. 10. 3. · the flavor...

New approaches to global PDF analysis

Wally Melnitchouk

with Nobuo Sato

Connecticut / JLab

The Flavor Structure of the Nucleon SeaInstitute for Nuclear Theory

October 2, 2017

Outline

Bayesian approach to fitting

Motivation — why the need for a new paradigm?

single-fit (Hessian) vs. Monte Carlo approaches

Monte Carlo methods

2

shortcomings of Hessian (Gaussian) approach

Generalization to non-Gaussian likelihoods

“tolerance” factors (uncertainties should not depend on # of parameters!)

Incompatible data sets

iterative MC, nested sampling, …

disjoint probabilities, empirical Bayes, …

Outlook

Motivation

3

With limited number of observables and finite statistics,need a robust analysis framework to extract meaningfulparton information from experiment

Over the first ~ 2-3 decades of global PDF analysis efforts, minimization (single-fit) analysis (with Hessian error propagation) has generally been sufficient to map out global characteristicsof partonic structure

�2

e.g. shapes of quark PDFs from DIS, where data are plentiful

A major challenge has been to characterize PDF uncertainties— in a statistically meaningful way — in the presence oftensions among data sets

Motivation

4

utilize modern techniques based on Bayesian statistics!

Previous attempts sought to address tensions in data setsby introducing

“neural net” parametrization (instead of polynomial parametrization), together with MC techniques

“tolerance” factors (artificially inflating PDF errors)

However, to address the problem in a more statistically rigorous way, one requires going beyond the standard minimization paradigm�2

MotivationIn the near future, standard minimization techniqueswill be unsuitable — even in the absence of tensions —e.g. for

simultaneous analysis of collinear distributions(unpolarized & polarized PDFs, fragmentation functions)

�2

“JAM17”: Jake Ethier (Tuesday)

new types of observables — TMDs or GPDs —that will involve data points, with parameters

> O(105) O(103)

Motivation

Need more reliable algorithms — “PDFs beyond the LHC”!

6

Typically PDF parametrizations are nonlinear functions ofthe PDF parameters, e.g.

Robust parameter estimation that thoroughly scans over arealistic parameter space, including multiple local minima,is only possible using MC methods!

have multiple local minima present in the function�2

xf(x, µ) = Nx

↵(1� x)� P (x)

where P is a polynomial e.g. ,or Chebyshev, neural net, …

P (x) = 1 + �px+ ⇥ x

Bayesian approachto fitting

7

Analysis of data requires estimating expectation values Eand variances V of “observables” (= PDFs, FFs) which arefunctions of parameters

O

E[O] =

ZdnaP(~a|data)O(~a)

V [O] =

ZdnaP(~a|data) ⇥O(~a)� E[O]

⇤2

8


“Bayesian master formulas"

Using Bayes’ theorem, probability distribution given byP

P(~a|data) = 1

ZL(data|~a)⇡(~a)

in terms of the likelihood function L

~a

�2(~a) =

X

i

✓data i � theoryi(~a)

�(data)

◆2

9


Likelihood function

L(data|~a) = exp

✓�1

2

�2(~a)

◆

is a Gaussian form in the data, with function�2

with priors and “evidence”⇡(~a) Z

Z =

ZdnaL(data|~a)⇡(~a)

Z tests if e.g. an n-parameter fit is statistically differentfrom (n+1)-parameter fit

10


Two methods generally used for computing Bayesianmaster formulas:

Maximum Likelihood( minimization)

Monte Carlo�2

11



Maximum Likelihood( minimization)�2

maximize probability distribution by minimizingfor a set of best-fit parameters

P �2

~a0

E [~a ] = ~a0

12




maximize probability distribution by minimizingfor a set of best-fit parameters

P �2

~a0

if is linear in the parameters, and if probability issymmetric in all parametersO ⇡

E [~a ] = ~a0

E [O(~a) ] ⇡ O(~a0)

13




variance computed by expanding about e.g. in 1 dimension have “master formula”

~a0O(~a)

V [O] ⇡ 1

4

hO(a+ �a)�O(a� �a)

i2

where

�a2 = V [a]

14




Hij =1

2

@2�2(~a)

@ai@aj

��~a=~a0

find set of (orthogonal) contours in parameter space around such that along each contour is parametrized by statistically independent parameters — directions of contours given by eigenvectors of Hessian matrix H, with elements

~a0L

ek

generalization to multiple dimensions via Hessian approach:

and contours parametrized as , with eigenvectors of H

�a(k) = a(k) � a0 = tkekpvkvk

15




basic assumption: factorizes along each eigendirection

note: in quadratic approximation for , this becomes a normal distribution

P

P(�a) ⇡Y

k

Pk(tk)

where

Pk(tk) = Nk exp

h� 1

2

�2⇣a0 + tk

ekpvk

⌘i

�2

16




uncertainties on along each eigendirection(assuming linear approximation)

O

(�Ok)2 ⇡ 1

4

O⇣a0 + Tk

ekpvk

⌘�O

⇣a0 � Tk

ekpvk

⌘�2

V [O] =X

k

(�Ok)2

where is finite step size in , with total varianceTk tk

17



Monte Carlo

in practice, generally one hasso the maximal likelihood method will sometimes fail

E[O(~a)] 6= O(E[~a])

Monte Carlo approach samples parameter space andassigns weights to each set of parameterswk ak

expectation value and variance are then weighted averages

,E[O(~a)] =X

k

wk O(~ak) V [O(~a)] =X

k

wk

�O(~ak)� E[O]�2

18



Maximum Likelihood( minimization)

Monte Carlo�2

fast

accurate

does not rely onGaussian assumptions

includes all possible solutions

assumes Gaussianity

no guarantee that globalminimum has been found

errors only characterizelocal geometry of function�2

slow

Incompatibledata sets

19

20


Incompatible data sets can arise because of errorsin determining central values, or underestimationof systematic experimental uncertainties

requires some sort of modification to standard statistics

Often one modifies the master formula by introducinga “tolerance” factor T

V [O] =T 2

4

hO(a+ �a)�O(a� �a)

i2e.g. for one dimension

effectively modifies the likelihood function

V [O] ! T 2 V [O]

21


Simple example: consider observable , and two measurements(m1, �m1), (m2, �m2)

m

�2 =

✓m�m1

�m1

◆2

+

✓m�m2

�m2

◆2

E[m] =m1�m2

2 +m2�m21

�m21 + �m2

2

V [m] = H�1 =�m2

1 �m22

�m21 + �m2

2

compute exactly the function�2

and, from Bayesian master formula, the mean value

and variance does notdepend onm1�m2 !

22

�4 �2 0 2 4m

0.0

0.5

1.0

1.5

L(m

|m1,

m2)

�4 �2 0 2 4m

0.0

0.5

1.0

1.5

total uncertainty remains independent of degree of (in)compatibility of data


Simple example: consider observable , and two measurements(m1, �m1), (m2, �m2)

m

Gaussian likelihood gives unrealistic representation of true uncertainty

same different m2

�m2

23

Realistic example: recent CJ (CTEQ-JLab) global PDF analysis

24 parameters,33 data sets

�100 �50 0 50 1000.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

Lik

elih

ood

1

23

4

5

67

8

9

10

11

12

13

141516

1718

1920 212223

2425 26

27

28

29

3031

3233

34

�100 �50 0 50 1000

2000

4000

6000

8000

10000

��

2

1

23

4

5

67

8

91011

12

13141516

1718

1920212223242526

27

28

29

3031323334

�10 �5 0 5 100.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

Lik

elih

ood

1

23

5 812

17 18 27

28

29

3031

34

�10 �5 0 5 100

20

40

60

80

100

��

2

(0) TOTAL

(1) HerF2pCut

(2) slac p

(3) d0Lasy13

(4) e866pd06xf

(5) BNS F2nd

(6) NmcRatCor

(7) slac d

(8) D0 Z

(9) H2 NC ep 3

(10) H2 NC ep 2

(11) H2 NC ep 1

(12) H2 NC ep 4

(13) CDF Wasy

(14) H2 CC ep

(15) cdfLasy05

(16) NmcF2pCor

(17) e866pp06xf

(18) H2 CC em

(19) d0run2cone

(20) d0 gamjet1

(21) CDFrun2jet

(22) d0 gamjet3

(23) d0 gamjet2

(24) d0 gamjet4

(25) jl00106F2d

(26) HerF2dCut

(27) BcdF2dCor

(28) CDF Z

(29) D0 Wasy

(30) H2 NC em

(31) jl00106F2p

(32) d0Lasy e15

(33) BcdF2pCor�1.0

�0.8

�0.6

�0.4

�0.2

0.0

0.2

0.4

0.6

Pro

ject

ion

0

1

23

4

5 6

7

8

9

10 11 12 13 14

15

16 17

18

19

20

21 22 23

(0) a1uv(1) a2uv(2) a4uv(3) a1dv(4) a2dv(5) a3dv(6) a4dv(7) a0ud(8) a1ud(9) a2ud(10) a4ud(11) a1du

(12) a2du(13) a4du(14) a1g(15) a2g(16) a3g(17) a4g(18) a6dv(19) off1(20) off2(21) ht1(22) ht2(23) ht3

data setscompatible along thise-direction

eigen-direction # 3


24


data sets notcompatible along thise-direction


�100 �50 0 50 1000.0

0.1

0.2

0.3

0.4

0.5

0.6

Lik

elih

ood

1

2

3

4

5

6

7

8

910

11

12

13

14

1516

1718

19

20

21

22

2324

25

26

2728

29

3031

32

33

34

�100 �50 0 50 1000

2000

4000

6000

8000

10000

��

2

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

1920

2122

232425

26

27

28

29

30

31

32

33

34

�10 �5 0 5 100.0

0.1

0.2

0.3

0.4

0.5

0.6

Lik

elih

ood

1

2

3

4

5

6

7

8

910

11

12

13

14

16

1718

19

2022

23

26

2728

29

3031

32

33

34

�10 �5 0 5 100

20

40

60

80

100

��

2

(0) TOTAL

(1) HerF2pCut

(2) slac p

(3) d0Lasy13

(4) e866pd06xf

(5) BNS F2nd

(6) NmcRatCor

(7) slac d

(8) D0 Z

(9) H2 NC ep 3

(10) H2 NC ep 2

(11) H2 NC ep 1

(12) H2 NC ep 4

(13) CDF Wasy

(14) H2 CC ep

(15) cdfLasy05

(16) NmcF2pCor

(17) e866pp06xf

(18) H2 CC em

(19) d0run2cone

(20) d0 gamjet1

(21) CDFrun2jet

(22) d0 gamjet3

(23) d0 gamjet2

(24) d0 gamjet4

(25) jl00106F2d

(26) HerF2dCut

(27) BcdF2dCor

(28) CDF Z

(29) D0 Wasy

(30) H2 NC em

(31) jl00106F2p

(32) d0Lasy e15


�0.6

�0.4

�0.2

0.0

0.2

0.4

0.6

0.8

Pro

ject

ion

0

1

2

3

4

5

6

7 89

10

1112

13 14

15

16

17

18

19

20

21

22

23



eigen-direction #18


25



�100 �50 0 50 1000.0

0.1

0.2

0.3

0.4

0.5

0.6

Lik

elih

ood

1

2

3

4

5

6

7

8

910

11

12

13

14

1516

1718

19

20

21

22

2324

25

26

2728

29

3031

32

33

34

�100 �50 0 50 1000

2000

4000

6000

8000

10000

��

2

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

1920

2122

232425

26

27

28

29

30

31

32

33

34

�10 �5 0 5 100.0

0.1

0.2

0.3

0.4

0.5

0.6

Lik

elih

ood

1

2

3

4

5

6

7

8

910

11

12

13

14

16

1718

19

2022

23

26

2728

29

3031

32

33

34

�10 �5 0 5 100

20

40

60

80

100

��

2

(0) TOTAL

(1) HerF2pCut

(2) slac p

(3) d0Lasy13

(4) e866pd06xf

(5) BNS F2nd

(6) NmcRatCor

(7) slac d

(8) D0 Z

(9) H2 NC ep 3

(10) H2 NC ep 2

(11) H2 NC ep 1

(12) H2 NC ep 4

(13) CDF Wasy

(14) H2 CC ep

(15) cdfLasy05

(16) NmcF2pCor

(17) e866pp06xf

(18) H2 CC em

(19) d0run2cone

(20) d0 gamjet1

(21) CDFrun2jet

(22) d0 gamjet3

(23) d0 gamjet2

(24) d0 gamjet4

(25) jl00106F2d

(26) HerF2dCut

(27) BcdF2dCor

(28) CDF Z

(29) D0 Wasy

(30) H2 NC em

(31) jl00106F2p

(32) d0Lasy e15


�0.6

�0.4

�0.2

0.0

0.2

0.4

0.6

0.8

Pro

ject

ion

0

1

2

3

4

5

6

7 89

10

1112

13 14

15

16

17

18

19

20

21

22

23



eigen-direction #18


standard Gaussian likelihood incapable of accounting for underestimated individual errors (leading to incompatible data sets)— not designed for such scenarios!

data sets notcompatible along thise-direction

26

SLAC-PUB-16238

Comment on “New Limits on Intrinsic Charm in the Nucleon from Global Analysis ofParton Distributions”

Stanley J. Brodsky1 and Susan Gardner2

1SLAC National Accelerator Laboratory, Stanford University, Stanford, CA 943092Department of Physics and Astronomy, University of Kentucky, Lexington, KY 40506-0055

A Comment on the Letter by P. Jimenez-Delgado, T. J. Hobbs, J. T. Londergan, and W. Mel-nitchouk, Phys. Rev. Lett. 114, 082002 (2015).

Intrinsic heavy quarks in hadrons emerge from the non-perturbative structure of a hadron bound state [1] and are arigorous prediction of QCD [2, 3]. Lattice QCD calculations also indicate a significant intrinsic charm probability [4, 5].Since the light-front momentum distribution of the Fock states is maximal at equal rapidity, intrinsic heavy quarkscarry significant fractions of the momentum. The presence of Fock states with intrinsic strange, charm, or bottomquarks in hadrons lead to an array of novel physics phenomena [6]. Accurate determinations of the heavy-quarkdistribution functions in the proton are needed to interpret LHC measurements as probes of physics beyond theStandard Model [7, 8]. Determinations [7, 9, 10] of the momentum fraction carried by intrinsic charm quarks in theproton typically limit ⟨x⟩IC ∼ O(1%) at 90% CL, consistent with the analysis of the EMC measurements of the charmstructure function [11] and the large rate for high-pT pp → cγX reactions at the Tevatron [12]; however, a precisedetermination of ⟨x⟩IC has proved elusive. The letter by P. Jimenez-Delgado, T. J. Hobbs, J. T. Londergan, andW. Melnitchouk (JDHLM) [13] is the most recent of such analyses, and it finds a much more severe limit on intrinsiccharm ⟨x⟩IC ∼ O(0.1%) than the previous such study [7]. JDHLM input different shapes for the intrinsic charmcontributions but allow the overall normalization to vary. They include low-energy data from the 1991 single-armed (p) → e′X SLAC experiment [14] in their global fit. Ref. [7] did not use the SLAC data and came to much weakerconclusions. Nevertheless, we believe the very stringent conclusions of JDHLM are in error.JDHLM assess their PDF errors using a tolerance criteria of ∆χ2 = 1 at 1σ; however, the actual value of ∆χ2 to

be employed depends on the number of parameters to be simultaneously determined in the fit. This is illustrated inTable 38.2 of Ref. [15] and is used broadly, noting, e.g., Refs. [16–19]. Ref. [7] employs the CT10 PDF analysis [20],so that it contains 25 parameters, plus one for intrinsic charm. Figure 38.2 of Ref. [15] then shows that ∆χ2 ≈ 29 at1σ (68% CL), whereas ∆χ2 ≈ 36 at 90% CL. Ref. [7] uses the criterion ∆χ2 > 100, determined on empirical grounds,to indicate a poor fit. JDHLM employs the framework of Ref. [21] which contains 25 parameters for the PDFs and12 for the higher-twist contributions, so that a much larger tolerance than ∆χ2 = 1 is warranted.JDHLM find that the SLAC data (on d and p targets) give the strongest constraints on intrinsic charm, although,

by their count, only 157 of 1021 data points have W 2 in excess of the charm hadronic threshold: W 2th

≈ 16GeV2.[JDHLM mention the partonic threshold constraint W 2 > 4m2

c, but this is not relevant for the detection of intrinsiccharm — if x < 1, leptons can only scatter off charm quarks when the kinematics permit the formation of charmedhadrons in the final state.] It is possible that JDHLM’s strong rejection of the intrinsic charm hypothesis is drivenby sharpened constraints on the non-charm PDFs. However, for the SLAC data set, the theoretical model which isconstrained is that of the intrinsic charm PDF combined with the treatment of uncertain higher-twist and thresholdcorrections. Thus a global analysis cannot reject intrinsic charm per se, but rather only the particular model in whichit is embedded.We also note that JDHLM exclude the EMC data — which indicate significant intrinsic charm — citing a “goodness

of fit” criterion. Statistical criteria alone cannot allow the exclusion of data sets, as here with the EMC data; additionalcorrections, however, may exist through their use of an iron target [22, 23].Finally, we note that the SLAC measurements of ed (p) → e′X , which only detects the scattered electron, has an

overall normalization (systematic) error of ± 1.7 (2.1)%, and a relative normalization error of typically ±1.1% [14].The SLAC data points in the W 2 > 16 GeV2 and x > 0.1 regime where intrinsic charm could be directly relevanthave even larger statistical uncertainties. Thus it seems implausible that the SLAC data can yield the severe contraintclaimed.JDHLM claim that the momentum fraction carried by intrinsic charm is ⟨x⟩IC < 0.1% at the 5σ level, and they

note in their final summary that ⟨x⟩IC ≤ 0.5% at 4σ. We find neither conclusion is warranted.We thank B. Plaster for a cross-check of Fig. 38.2 in Ref. [15] and B. Plaster, A. Deur, P. Hoyer, C. Lorce,

J. Pumplin, and R. Vogt for helpful remarks. We acknowledge support from the U.S. Department of Energy undercontracts DE–AC02–76SF00515 and DE–FG02–96ER40989.

CTEQ “tolerance criteria” (variations adopted by other groups, e.g., MMHT, CJ)

scaling of with number of parameters (or number of degrees of freedom)

Two ways in which tolerance factors usually implemented


Pumplin, Stump, Huston, Lai, Nadolsky, TungJHEP 07 (2002) 012

��2

e.g. Brodsky, GardnerPRL (Comment) 116, 019101 (2016)

27


CTEQ tolerance criteria

for each experiment, find minimum along given e-direction�2

from distribution determine 90% CL for each experiment �2

along each side of e-direction, determine maximum rangeallowed by the most constraining experiment

d±k

!30

!20

!10

0

10

20

30

40

distance

Eigenvector 4

BCDMSp

BCDMSd

H1a

H1b

ZEUS

NMCp

NMCr

CCFR2

CCFR3

E605

CDFw

E866

D0jet

CDFjet

CTEQ6eigen-direction #4

p��2

computed by averaging over all (typically )d±kT T ⇠ 5� 10

T (⇡ 10)

28


CTEQ tolerance criteria

!30

!20

!10

0

10

20

30

40

distance

Eigenvector 4

BCDMSp

BCDMSd

H1a

H1b

ZEUS

NMCp

NMCr

CCFR2

CCFR3

E605

CDFw

E866

D0jet

CDFjet

CTEQ6eigen-direction #4

p��2

T (⇡ 10)

This approach is not consistent with Gaussian likelihood

no clear Bayesian interpretation of uncertainties(ultimately, a prescription…)

29

Scaling of with # of parameters: “ paradox”��2 ��2


Simple example: two parameters with mean values and standard deviation

✓i (i = 1, 2)µi �i

joint probability distribution

P(✓1, ✓2) =Y

i=1,2

1p2⇡�2

i

exp

"�1

2

✓✓i � µi

�i

◆2#

change variables and usepolar coordinates r2 = t21 + t22, � = tan�1(t2/t1)

✓i ! ti = (✓i � µi)/�i

d✓1d✓2 P(✓1, ✓2) =d�

2⇡rdr exp

�1

2

r2�

30



confidence volume

= 68% for R = 2.279

CV ⌘Z

d✓1d✓2 P(✓1, ✓2) =

Z R

0dr r exp

�1

2

r2�

note that , so that confidenceregion for parameters

R2 = t21 + t22 ⌘ �2

max [ti] = R

implies that , which contradicts✓i = µi ± �i R

original premise that !✓i = µi ± �i

t2

t1

R

31



to resolve paradox, use Bayesian master formulas

E[✓i] =

Z 2⇡

0

d�

2⇡

Z 1

0drP(r,�) ✓i

=

Z 2⇡

0

d�

2⇡

Z 1

0dr r e�r2/2 (µi + ti �i) = µi X

32



to resolve paradox, use Bayesian master formulas

E[✓i] =

Z 2⇡

0

d�

2⇡

Z 1

0drP(r,�) ✓i

=

Z 2⇡

0

d�

2⇡

Z 1

0dr r e�r2/2 (µi + ti �i) = µi

V [✓i] =

Z 2⇡

0

d�

2⇡

Z 1

0drP(r,�) (✓i � µi)

2

=

Z 2⇡

0

d�

2⇡

Z 1

0dr r e�r2/2 (ti �i)

2 = �2i

X

X

33



no paradox if use for any number of parameters to characterize the CL

��2 = 1

1�

only consistent tolerance for Gaussian likelihood is T = 1

34

To summarize standard maximum likelihood method…

for ~30 parameters trying different starting points isimpractical, if do not have some information about shape

Gradient search (in parameter space) depends how “good” thestarting point is

Cannot guarantee solution is unique

Introduction of tolerance modifies Gaussian statistics

Common to free parameters initially, then freeze thosenot sensitive to data ( flat locally)�2

introduces bias, does not guarantee that flat globally�2

Error propagation characterized by quadratic near minimum�2

no guarantee this is quadratic globally (e.g. Student t-distribution?)

Monte Carlo methods

35

36

Monte Carlo

Designed to faithfully compute Bayesian master formulas

Do not assume a single minimum, include all possible solutions(with appropriate weightings)

Do not assume likelihood is Gaussian in parameters

Allows likelihood analysis to be extended to address tensionsamong data sets via Bayesian inference

More computationally demanding compared with Hessian method

37

Monte Carlo

First group to use MC for global PDF analysis was NNPDF,using neural network to parametrize in P (x)

f(x) = N x

↵(1� x)� P (x)

Iterative Monte Carlo (IMC), developed by JAM Collaboration,variant of NNPDF, tailored to non-neutral net parametrizations

J. Ethier

Markov Chain MC (MCMC) / Hybid MC (HMC)— recent “proof of principle” analysis, ideas from lattice QCD

Gbedo, Mangin-Brinet,PRD 96, 014015 (2017)

Nested sampling (NS) — computes integrals in Bayesian masterformulas (for E, V, Z) explicitly Skilling (2004)

— are fitted “preprocessing coefficients”↵,�

no assumptions for exponents

Use traditional functional form for input distribution shape,but sample significantly larger parameter space than possiblein single-fit analyses

cross-validation to avoidoverfitting

iterate until convergencecriteria satisfied

38

sampler

priors

fit

fit

fit

posteriors

original data

pseudo

data

training

data

fit

parameters from

minimization steps

validation

data

validation

posterior

as initial

guess

prior

Iterative Monte Carlo (IMC)


0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

init

ial

u+

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

d+

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

s+

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

c+

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

b+

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

g

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

inte

rmed

iate ⇡+

K+

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

⇡+

K+

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

inte

rmed

iate

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

final

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

0.2 0.4 0.6 0.8 z

0.2

0.6

1.0

1.4

e.g. of convergence (for fragmentation functions) in IMC

39


~ 20iterations

Sato et al.PRD 94, 114004(2016)

40

Nested Sampling

Basic idea: transform n-dimensional integral to 1-D integral

Z =

ZdnaL(data|~a)⇡(~a) =

Z 1

0dX L(X)

where prior volume dX = ⇡(~a) dna

such that 0 < · · · < X2 < X1 < X0 = 1Feroz et al.arXiv:1306.2144 [astro-ph]

41

Nested Sampling

Approximate evidence by a weighted sum

Z ⇡X

i

Li wi wi =1

2(Xi�1 �Xi+1)with weights

Algorithm:

repeat until entire prior volume has been traversed

can be parallelized

randomly select samples from full prior s.t. initial volume X0 = 1

for each iteration, remove point with lowest , replacing itwith point from prior with constraint that its

Li

L > Li

increasingly used in fields outside of (nuclear) analysis

performs better than VEGAS for large dimensions

42

Nested Sampling

0.0 0.5 1.0 1.5 2.0 2.5

hu(1)1

0

2

4

6

norm

alize

dyie

ld

SIDIS

SIDIS + Lattice

�4 �3 �2 �1 0

hd(1)1

0

2

4

norm

alize

dyie

ld

SIDIS

SIDIS + Lattice

0 1 2 3 4gT

0.0

2.5

5.0

7.5

norm

alize

dyie

ld

SIDIS

SIDIS + Lattice

Recent applicationin global analysis of transversity TMD PDF

H.-W. Lin�u

�d

gT = �u� �dLin, WM, Prokudin,Sato, Shows (2017)

43

Assuming a single minimum, a Hessian or MC analysis must givesame results, if using same likelihood function

MC Error Analysis

analysis of pseudodata, generated using Gaussian distribution

0.0 0.2 0.4 0.6 0.8 1.0

x

0.0

0.5

1.0

1.5

2.0

2.5

Ng(

x;b

)

HESS

IMC

NS

HMC

0.0 0.2 0.4 0.6 0.8 1.0

x

0.96

0.98

1.00

1.02

1.04

Rat

ioto

hess

0.0 0.2 0.4 0.6 0.8 1.0

x

0.96

0.98

1.00

1.02

1.04

Rat

ioto

hess

0.0 0.2 0.4 0.6 0.8 1.0

x

0.96

0.98

1.00

1.02

1.04R

atio

tohe

ss

IMC

Nested HMC

44


MC Error Analysis

also for discrepant data

almost identical uncertainty bands for Hessian and for MC!

0.0 0.2 0.4 0.6 0.8 1.0

x

0.0

0.5

1.0

1.5

2.0

2.5

Ng(

x;b

)

HESS

IMC

HMC

NS

0.0 0.2 0.4 0.6 0.8 1.0

x

0.96

0.98

1.00

1.02

1.04

Rat

ioto

hess

0.0 0.2 0.4 0.6 0.8 1.0

x

0.96

0.98

1.00

1.02

1.04

Rat

ioto

hess

0.0 0.2 0.4 0.6 0.8 1.0

x

0.96

0.98

1.00

1.02

1.04

Rat

ioto

hess

0.0 0.2 0.4 0.6 0.8 1.0

x

0.0

0.5

1.0

1.5

2.0

2.5

Ng(

x;b

)

HESS

IMC

HMC

NS

0.0 0.2 0.4 0.6 0.8 1.0

x

0.96

0.98

1.00

1.02

1.04

Rat

ioto

hess

0.0 0.2 0.4 0.6 0.8 1.0

x

0.96

0.98

1.00

1.02

1.04

Rat

ioto

hess

0.0 0.2 0.4 0.6 0.8 1.0

x

0.96

0.98

1.00

1.02

1.04R

atio

tohe

ss

Nested

45


MC Error Analysis

Approaches that use Hessian + tolerance factor not consistentwith Gaussian likelihood function

Assuming sufficient observables to determine PDFs, thenPDF uncertainties cannot depend on parametrization!

NNPDF group claim that within their neural net MC methodology, no need for a tolerance factor, since uncertainties similar toother groups who use Hessian + tolerance

how can this be?

Non-Gaussian likelihood

46

Rigorous (Bayesian) way to address incompatible data setsis to use generalization of Gaussian likelihood

47


joint vs. disjoint distributions

empirical Bayes

hierarchical Bayes

others, used in different fields

48

Disjoint distributions

Instead of using total likelihood that is a product (“and”)of individual likelihoods, e.g. for simple example of two measurements

L(m1m2|m; �m1�m2) = L(m1|m; �m1)⇥ L(m2|m; �m2)

use instead sum (“or”) of individual likelihoods

L(m1m2|m; �m1�m2) =1

2

hL(m1|m; �m1) + L(m2|m; �m2)

i

gives rather different expectation value and variance

E[m] =1

2(m1 +m2)

V [m] =1

2(�m2

1 + �m22) +

✓m1 �m2

2

◆2

depends onseparation!

49


�4 �2 0 2 4m

0.0

0.5

1.0

1.5

L(m

|m1,

m2)

�4 �2 0 2 4m

0.0

0.5

1.0

1.5

L(m

1m

2|m

) joint

disjoint

joint:

disjoint: V [m] =1

2(�m2

1 + �m22) +

✓m1 �m2

2

◆2

V [m] =�m2

1 �m22

�m21 + �m2

2

Symmetric uncertainties �m1 = �m2

50


�4 �2 0 2 4m

0

1

2

3

L(m

|m1,

m2)

�4 �2 0 2 4m

0

1

2

3

L(m

1m

2|m

)Asymmetric uncertainties �m1 6= �m2

disjoint likelihood gives broader overall uncertainty,overlapping individual (discrepant) data

51

Shortcoming of conventional Bayesian — still assumeprior distribution follows specific form (e.g. Gaussian)

Empirical Bayes

Extend approach to more fully represent prior uncertainties,with final uncertainties that do not depend on initial choices

In generalized approach, data uncertainties modified bydistortion parameters, whose probability distributions givenin terms of “hyperparameters” (or “nuisance parameters”)

Hyperparameters determined from datagive posteriors for both PDF and hyperparameters

52

Empirical Bayes

Standard mean and variance that characterize data✓ = µ+ � f(µ) + g(�)

where are unknown functions that account forfaulty measurements

f(µ), g(�)

Simple choice is

(µ,�) ! (⇣1 µ+ ⇣2, ⇣3 �)

where are distortion parameters, with prob. dists.described by hyperparameters

⇣1,2,3�1,2,3

Likelihood function is then

L(data|~a, ⇣1,2,3) ⇠ exp

"�1

2

X

i

✓d1 � f(µi(~a, ⇣1,2))

g(�, ⇣3)

◆2#⇡1(⇣1|�1)⇡2(⇣1|�2)⇡3(⇣1|�3)

53

Empirical Bayes

�1.0 �0.5 0.0 0.5 1.00

2

4

6

8

10

P(t

|D) 1–�

�0.5 �0.4 �0.3 �0.2 �0.1 0.0 0.10

5

10

15

20

25

experiment 1

experiment 2

fit

EB fit

�1.0 �0.5 0.0 0.5 1.00

2

4

6

8

10

P(t

|D) 6–�

�0.5 �0.4 �0.3 �0.2 �0.1 0.0 0.10

5

10

15

20

25

�1.0 �0.5 0.0 0.5 1.0

t

0

2

4

6

8

10

P(t

|D) 12–�

�0.5 �0.4 �0.3 �0.2 �0.1 0.0 0.1

t

0

5

10

15

20

25

HessianEmpirical Bayes

Simple example of EB for symmetric & asymmetric errors

OutlookNew paradigm needed in global QCD analysis— simultaneous determination of collinear distributions (also TMDs) using Monte Carlo sampling of parameter space

Near-term future: “universal” QCD analysis of all observables sensitive to collinear (unpolarized & polarized) PDFs and FFs

Longer-term: apply MC technology to global QCD analysisof transverse momentum dependent (TMD) PDFs and FFs

54

Treatment of discrepant data sets needs serious attention— Bayesian perspective has clear merits

Necessary to benchmark MC extractions (not just NNPDF)

the flavor structure of the nucleon sea institute for nuclear … · 2017. 10. 3. · the flavor...

Documents