the flavor structure of the nucleon sea institute for nuclear … · 2017. 10. 3. · the flavor...
TRANSCRIPT
New approaches to global PDF analysis
Wally Melnitchouk
with Nobuo Sato
Connecticut / JLab
The Flavor Structure of the Nucleon SeaInstitute for Nuclear Theory
October 2, 2017
Outline
Bayesian approach to fitting
Motivation — why the need for a new paradigm?
single-fit (Hessian) vs. Monte Carlo approaches
Monte Carlo methods
2
shortcomings of Hessian (Gaussian) approach
Generalization to non-Gaussian likelihoods
“tolerance” factors (uncertainties should not depend on # of parameters!)
Incompatible data sets
iterative MC, nested sampling, …
disjoint probabilities, empirical Bayes, …
Outlook
Motivation
3
With limited number of observables and finite statistics,need a robust analysis framework to extract meaningfulparton information from experiment
Over the first ~ 2-3 decades of global PDF analysis efforts, minimization (single-fit) analysis (with Hessian error propagation) has generally been sufficient to map out global characteristicsof partonic structure
�2
e.g. shapes of quark PDFs from DIS, where data are plentiful
A major challenge has been to characterize PDF uncertainties— in a statistically meaningful way — in the presence oftensions among data sets
Motivation
4
utilize modern techniques based on Bayesian statistics!
Previous attempts sought to address tensions in data setsby introducing
“neural net” parametrization (instead of polynomial parametrization), together with MC techniques
“tolerance” factors (artificially inflating PDF errors)
However, to address the problem in a more statistically rigorous way, one requires going beyond the standard minimization paradigm�2
MotivationIn the near future, standard minimization techniqueswill be unsuitable — even in the absence of tensions —e.g. for
simultaneous analysis of collinear distributions(unpolarized & polarized PDFs, fragmentation functions)
�2
“JAM17”: Jake Ethier (Tuesday)
new types of observables — TMDs or GPDs —that will involve data points, with parameters
> O(105) O(103)
Motivation
Need more reliable algorithms — “PDFs beyond the LHC”!
6
Typically PDF parametrizations are nonlinear functions ofthe PDF parameters, e.g.
Robust parameter estimation that thoroughly scans over arealistic parameter space, including multiple local minima,is only possible using MC methods!
have multiple local minima present in the function�2
xf(x, µ) = Nx
↵(1� x)� P (x)
where P is a polynomial e.g. ,or Chebyshev, neural net, …
P (x) = 1 + �px+ ⇥ x
Bayesian approachto fitting
7
Analysis of data requires estimating expectation values Eand variances V of “observables” (= PDFs, FFs) which arefunctions of parameters
O
E[O] =
ZdnaP(~a|data)O(~a)
V [O] =
ZdnaP(~a|data) ⇥O(~a)� E[O]
⇤2
8
Bayesian approach to fitting
“Bayesian master formulas"
Using Bayes’ theorem, probability distribution given byP
P(~a|data) = 1
ZL(data|~a)⇡(~a)
in terms of the likelihood function L
~a
�2(~a) =
X
i
✓data i � theoryi(~a)
�(data)
◆2
9
Bayesian approach to fitting
Likelihood function
L(data|~a) = exp
✓�1
2
�2(~a)
◆
is a Gaussian form in the data, with function�2
with priors and “evidence”⇡(~a) Z
Z =
ZdnaL(data|~a)⇡(~a)
Z tests if e.g. an n-parameter fit is statistically differentfrom (n+1)-parameter fit
10
Bayesian approach to fitting
Two methods generally used for computing Bayesianmaster formulas:
Maximum Likelihood( minimization)
Monte Carlo�2
11
Bayesian approach to fitting
Two methods generally used for computing Bayesianmaster formulas:
Maximum Likelihood( minimization)�2
maximize probability distribution by minimizingfor a set of best-fit parameters
P �2
~a0
E [~a ] = ~a0
12
Bayesian approach to fitting
Two methods generally used for computing Bayesianmaster formulas:
Maximum Likelihood( minimization)�2
maximize probability distribution by minimizingfor a set of best-fit parameters
P �2
~a0
if is linear in the parameters, and if probability issymmetric in all parametersO ⇡
E [~a ] = ~a0
E [O(~a) ] ⇡ O(~a0)
13
Bayesian approach to fitting
Two methods generally used for computing Bayesianmaster formulas:
Maximum Likelihood( minimization)�2
variance computed by expanding about e.g. in 1 dimension have “master formula”
~a0O(~a)
V [O] ⇡ 1
4
hO(a+ �a)�O(a� �a)
i2
where
�a2 = V [a]
14
Bayesian approach to fitting
Two methods generally used for computing Bayesianmaster formulas:
Maximum Likelihood( minimization)�2
Hij =1
2
@2�2(~a)
@ai@aj
����~a=~a0
find set of (orthogonal) contours in parameter space around such that along each contour is parametrized by statistically independent parameters — directions of contours given by eigenvectors of Hessian matrix H, with elements
~a0L
ek
generalization to multiple dimensions via Hessian approach:
and contours parametrized as , with eigenvectors of H
�a(k) = a(k) � a0 = tkekpvkvk
15
Bayesian approach to fitting
Two methods generally used for computing Bayesianmaster formulas:
Maximum Likelihood( minimization)�2
basic assumption: factorizes along each eigendirection
note: in quadratic approximation for , this becomes a normal distribution
P
P(�a) ⇡Y
k
Pk(tk)
where
Pk(tk) = Nk exp
h� 1
2
�2⇣a0 + tk
ekpvk
⌘i
�2
16
Bayesian approach to fitting
Two methods generally used for computing Bayesianmaster formulas:
Maximum Likelihood( minimization)�2
uncertainties on along each eigendirection(assuming linear approximation)
O
(�Ok)2 ⇡ 1
4
O⇣a0 + Tk
ekpvk
⌘�O
⇣a0 � Tk
ekpvk
⌘�2
V [O] =X
k
(�Ok)2
where is finite step size in , with total varianceTk tk
17
Bayesian approach to fitting
Two methods generally used for computing Bayesianmaster formulas:
Monte Carlo
in practice, generally one hasso the maximal likelihood method will sometimes fail
E[O(~a)] 6= O(E[~a])
Monte Carlo approach samples parameter space andassigns weights to each set of parameterswk ak
expectation value and variance are then weighted averages
,E[O(~a)] =X
k
wk O(~ak) V [O(~a)] =X
k
wk
�O(~ak)� E[O]�2
18
Bayesian approach to fitting
Two methods generally used for computing Bayesianmaster formulas:
Maximum Likelihood( minimization)
Monte Carlo�2
fast
accurate
does not rely onGaussian assumptions
includes all possible solutions
assumes Gaussianity
no guarantee that globalminimum has been found
errors only characterizelocal geometry of function�2
slow
Incompatibledata sets
19
20
Incompatible data sets
Incompatible data sets can arise because of errorsin determining central values, or underestimationof systematic experimental uncertainties
requires some sort of modification to standard statistics
Often one modifies the master formula by introducinga “tolerance” factor T
V [O] =T 2
4
hO(a+ �a)�O(a� �a)
i2e.g. for one dimension
effectively modifies the likelihood function
V [O] ! T 2 V [O]
21
Incompatible data sets
Simple example: consider observable , and two measurements(m1, �m1), (m2, �m2)
m
�2 =
✓m�m1
�m1
◆2
+
✓m�m2
�m2
◆2
E[m] =m1�m2
2 +m2�m21
�m21 + �m2
2
V [m] = H�1 =�m2
1 �m22
�m21 + �m2
2
compute exactly the function�2
and, from Bayesian master formula, the mean value
and variance does notdepend onm1�m2 !
22
�4 �2 0 2 4m
0.0
0.5
1.0
1.5
L(m
|m1,
m2)
�4 �2 0 2 4m
0.0
0.5
1.0
1.5
total uncertainty remains independent of degree of (in)compatibility of data
Incompatible data sets
Simple example: consider observable , and two measurements(m1, �m1), (m2, �m2)
m
Gaussian likelihood gives unrealistic representation of true uncertainty
same different m2
�m2
23
Realistic example: recent CJ (CTEQ-JLab) global PDF analysis
24 parameters,33 data sets
�100 �50 0 50 1000.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
0.40
Lik
elih
ood
1
23
4
5
67
8
9
10
11
12
13
141516
1718
1920 212223
2425 26
27
28
29
3031
3233
34
�100 �50 0 50 1000
2000
4000
6000
8000
10000
��
2
1
23
4
5
67
8
91011
12
13141516
1718
1920212223242526
27
28
29
3031323334
�10 �5 0 5 100.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
0.40
Lik
elih
ood
1
23
5 812
17 18 27
28
29
3031
34
�10 �5 0 5 100
20
40
60
80
100
��
2
(0) TOTAL
(1) HerF2pCut
(2) slac p
(3) d0Lasy13
(4) e866pd06xf
(5) BNS F2nd
(6) NmcRatCor
(7) slac d
(8) D0 Z
(9) H2 NC ep 3
(10) H2 NC ep 2
(11) H2 NC ep 1
(12) H2 NC ep 4
(13) CDF Wasy
(14) H2 CC ep
(15) cdfLasy05
(16) NmcF2pCor
(17) e866pp06xf
(18) H2 CC em
(19) d0run2cone
(20) d0 gamjet1
(21) CDFrun2jet
(22) d0 gamjet3
(23) d0 gamjet2
(24) d0 gamjet4
(25) jl00106F2d
(26) HerF2dCut
(27) BcdF2dCor
(28) CDF Z
(29) D0 Wasy
(30) H2 NC em
(31) jl00106F2p
(32) d0Lasy e15
(33) BcdF2pCor�1.0
�0.8
�0.6
�0.4
�0.2
0.0
0.2
0.4
0.6
Pro
ject
ion
0
1
23
4
5 6
7
8
9
10 11 12 13 14
15
16 17
18
19
20
21 22 23
(0) a1uv(1) a2uv(2) a4uv(3) a1dv(4) a2dv(5) a3dv(6) a4dv(7) a0ud(8) a1ud(9) a2ud(10) a4ud(11) a1du
(12) a2du(13) a4du(14) a1g(15) a2g(16) a3g(17) a4g(18) a6dv(19) off1(20) off2(21) ht1(22) ht2(23) ht3
data setscompatible along thise-direction
eigen-direction # 3
Incompatible data sets
24
Realistic example: recent CJ (CTEQ-JLab) global PDF analysis
data sets notcompatible along thise-direction
24 parameters,33 data sets
�100 �50 0 50 1000.0
0.1
0.2
0.3
0.4
0.5
0.6
Lik
elih
ood
1
2
3
4
5
6
7
8
910
11
12
13
14
1516
1718
19
20
21
22
2324
25
26
2728
29
3031
32
33
34
�100 �50 0 50 1000
2000
4000
6000
8000
10000
��
2
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
1920
2122
232425
26
27
28
29
30
31
32
33
34
�10 �5 0 5 100.0
0.1
0.2
0.3
0.4
0.5
0.6
Lik
elih
ood
1
2
3
4
5
6
7
8
910
11
12
13
14
16
1718
19
2022
23
26
2728
29
3031
32
33
34
�10 �5 0 5 100
20
40
60
80
100
��
2
(0) TOTAL
(1) HerF2pCut
(2) slac p
(3) d0Lasy13
(4) e866pd06xf
(5) BNS F2nd
(6) NmcRatCor
(7) slac d
(8) D0 Z
(9) H2 NC ep 3
(10) H2 NC ep 2
(11) H2 NC ep 1
(12) H2 NC ep 4
(13) CDF Wasy
(14) H2 CC ep
(15) cdfLasy05
(16) NmcF2pCor
(17) e866pp06xf
(18) H2 CC em
(19) d0run2cone
(20) d0 gamjet1
(21) CDFrun2jet
(22) d0 gamjet3
(23) d0 gamjet2
(24) d0 gamjet4
(25) jl00106F2d
(26) HerF2dCut
(27) BcdF2dCor
(28) CDF Z
(29) D0 Wasy
(30) H2 NC em
(31) jl00106F2p
(32) d0Lasy e15
(33) BcdF2pCor�0.8
�0.6
�0.4
�0.2
0.0
0.2
0.4
0.6
0.8
Pro
ject
ion
0
1
2
3
4
5
6
7 89
10
1112
13 14
15
16
17
18
19
20
21
22
23
(0) a1uv(1) a2uv(2) a4uv(3) a1dv(4) a2dv(5) a3dv(6) a4dv(7) a0ud(8) a1ud(9) a2ud(10) a4ud(11) a1du
(12) a2du(13) a4du(14) a1g(15) a2g(16) a3g(17) a4g(18) a6dv(19) off1(20) off2(21) ht1(22) ht2(23) ht3
eigen-direction #18
Incompatible data sets
25
Realistic example: recent CJ (CTEQ-JLab) global PDF analysis
24 parameters,33 data sets
�100 �50 0 50 1000.0
0.1
0.2
0.3
0.4
0.5
0.6
Lik
elih
ood
1
2
3
4
5
6
7
8
910
11
12
13
14
1516
1718
19
20
21
22
2324
25
26
2728
29
3031
32
33
34
�100 �50 0 50 1000
2000
4000
6000
8000
10000
��
2
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
1920
2122
232425
26
27
28
29
30
31
32
33
34
�10 �5 0 5 100.0
0.1
0.2
0.3
0.4
0.5
0.6
Lik
elih
ood
1
2
3
4
5
6
7
8
910
11
12
13
14
16
1718
19
2022
23
26
2728
29
3031
32
33
34
�10 �5 0 5 100
20
40
60
80
100
��
2
(0) TOTAL
(1) HerF2pCut
(2) slac p
(3) d0Lasy13
(4) e866pd06xf
(5) BNS F2nd
(6) NmcRatCor
(7) slac d
(8) D0 Z
(9) H2 NC ep 3
(10) H2 NC ep 2
(11) H2 NC ep 1
(12) H2 NC ep 4
(13) CDF Wasy
(14) H2 CC ep
(15) cdfLasy05
(16) NmcF2pCor
(17) e866pp06xf
(18) H2 CC em
(19) d0run2cone
(20) d0 gamjet1
(21) CDFrun2jet
(22) d0 gamjet3
(23) d0 gamjet2
(24) d0 gamjet4
(25) jl00106F2d
(26) HerF2dCut
(27) BcdF2dCor
(28) CDF Z
(29) D0 Wasy
(30) H2 NC em
(31) jl00106F2p
(32) d0Lasy e15
(33) BcdF2pCor�0.8
�0.6
�0.4
�0.2
0.0
0.2
0.4
0.6
0.8
Pro
ject
ion
0
1
2
3
4
5
6
7 89
10
1112
13 14
15
16
17
18
19
20
21
22
23
(0) a1uv(1) a2uv(2) a4uv(3) a1dv(4) a2dv(5) a3dv(6) a4dv(7) a0ud(8) a1ud(9) a2ud(10) a4ud(11) a1du
(12) a2du(13) a4du(14) a1g(15) a2g(16) a3g(17) a4g(18) a6dv(19) off1(20) off2(21) ht1(22) ht2(23) ht3
eigen-direction #18
Incompatible data sets
standard Gaussian likelihood incapable of accounting for underestimated individual errors (leading to incompatible data sets)— not designed for such scenarios!
data sets notcompatible along thise-direction
26
SLAC-PUB-16238
Comment on “New Limits on Intrinsic Charm in the Nucleon from Global Analysis ofParton Distributions”
Stanley J. Brodsky1 and Susan Gardner2
1SLAC National Accelerator Laboratory, Stanford University, Stanford, CA 943092Department of Physics and Astronomy, University of Kentucky, Lexington, KY 40506-0055
A Comment on the Letter by P. Jimenez-Delgado, T. J. Hobbs, J. T. Londergan, and W. Mel-nitchouk, Phys. Rev. Lett. 114, 082002 (2015).
Intrinsic heavy quarks in hadrons emerge from the non-perturbative structure of a hadron bound state [1] and are arigorous prediction of QCD [2, 3]. Lattice QCD calculations also indicate a significant intrinsic charm probability [4, 5].Since the light-front momentum distribution of the Fock states is maximal at equal rapidity, intrinsic heavy quarkscarry significant fractions of the momentum. The presence of Fock states with intrinsic strange, charm, or bottomquarks in hadrons lead to an array of novel physics phenomena [6]. Accurate determinations of the heavy-quarkdistribution functions in the proton are needed to interpret LHC measurements as probes of physics beyond theStandard Model [7, 8]. Determinations [7, 9, 10] of the momentum fraction carried by intrinsic charm quarks in theproton typically limit ⟨x⟩IC ∼ O(1%) at 90% CL, consistent with the analysis of the EMC measurements of the charmstructure function [11] and the large rate for high-pT pp → cγX reactions at the Tevatron [12]; however, a precisedetermination of ⟨x⟩IC has proved elusive. The letter by P. Jimenez-Delgado, T. J. Hobbs, J. T. Londergan, andW. Melnitchouk (JDHLM) [13] is the most recent of such analyses, and it finds a much more severe limit on intrinsiccharm ⟨x⟩IC ∼ O(0.1%) than the previous such study [7]. JDHLM input different shapes for the intrinsic charmcontributions but allow the overall normalization to vary. They include low-energy data from the 1991 single-armed (p) → e′X SLAC experiment [14] in their global fit. Ref. [7] did not use the SLAC data and came to much weakerconclusions. Nevertheless, we believe the very stringent conclusions of JDHLM are in error.JDHLM assess their PDF errors using a tolerance criteria of ∆χ2 = 1 at 1σ; however, the actual value of ∆χ2 to
be employed depends on the number of parameters to be simultaneously determined in the fit. This is illustrated inTable 38.2 of Ref. [15] and is used broadly, noting, e.g., Refs. [16–19]. Ref. [7] employs the CT10 PDF analysis [20],so that it contains 25 parameters, plus one for intrinsic charm. Figure 38.2 of Ref. [15] then shows that ∆χ2 ≈ 29 at1σ (68% CL), whereas ∆χ2 ≈ 36 at 90% CL. Ref. [7] uses the criterion ∆χ2 > 100, determined on empirical grounds,to indicate a poor fit. JDHLM employs the framework of Ref. [21] which contains 25 parameters for the PDFs and12 for the higher-twist contributions, so that a much larger tolerance than ∆χ2 = 1 is warranted.JDHLM find that the SLAC data (on d and p targets) give the strongest constraints on intrinsic charm, although,
by their count, only 157 of 1021 data points have W 2 in excess of the charm hadronic threshold: W 2th
≈ 16GeV2.[JDHLM mention the partonic threshold constraint W 2 > 4m2
c, but this is not relevant for the detection of intrinsiccharm — if x < 1, leptons can only scatter off charm quarks when the kinematics permit the formation of charmedhadrons in the final state.] It is possible that JDHLM’s strong rejection of the intrinsic charm hypothesis is drivenby sharpened constraints on the non-charm PDFs. However, for the SLAC data set, the theoretical model which isconstrained is that of the intrinsic charm PDF combined with the treatment of uncertain higher-twist and thresholdcorrections. Thus a global analysis cannot reject intrinsic charm per se, but rather only the particular model in whichit is embedded.We also note that JDHLM exclude the EMC data — which indicate significant intrinsic charm — citing a “goodness
of fit” criterion. Statistical criteria alone cannot allow the exclusion of data sets, as here with the EMC data; additionalcorrections, however, may exist through their use of an iron target [22, 23].Finally, we note that the SLAC measurements of ed (p) → e′X , which only detects the scattered electron, has an
overall normalization (systematic) error of ± 1.7 (2.1)%, and a relative normalization error of typically ±1.1% [14].The SLAC data points in the W 2 > 16 GeV2 and x > 0.1 regime where intrinsic charm could be directly relevanthave even larger statistical uncertainties. Thus it seems implausible that the SLAC data can yield the severe contraintclaimed.JDHLM claim that the momentum fraction carried by intrinsic charm is ⟨x⟩IC < 0.1% at the 5σ level, and they
note in their final summary that ⟨x⟩IC ≤ 0.5% at 4σ. We find neither conclusion is warranted.We thank B. Plaster for a cross-check of Fig. 38.2 in Ref. [15] and B. Plaster, A. Deur, P. Hoyer, C. Lorce,
J. Pumplin, and R. Vogt for helpful remarks. We acknowledge support from the U.S. Department of Energy undercontracts DE–AC02–76SF00515 and DE–FG02–96ER40989.
CTEQ “tolerance criteria” (variations adopted by other groups, e.g., MMHT, CJ)
scaling of with number of parameters (or number of degrees of freedom)
Two ways in which tolerance factors usually implemented
Incompatible data sets
Pumplin, Stump, Huston, Lai, Nadolsky, TungJHEP 07 (2002) 012
��2
e.g. Brodsky, GardnerPRL (Comment) 116, 019101 (2016)
27
Incompatible data sets
CTEQ tolerance criteria
for each experiment, find minimum along given e-direction�2
from distribution determine 90% CL for each experiment �2
along each side of e-direction, determine maximum rangeallowed by the most constraining experiment
d±k
!30
!20
!10
0
10
20
30
40
distance
Eigenvector 4
BCDMSp
BCDMSd
H1a
H1b
ZEUS
NMCp
NMCr
CCFR2
CCFR3
E605
CDFw
E866
D0jet
CDFjet
CTEQ6eigen-direction #4
p��2
computed by averaging over all (typically )d±kT T ⇠ 5� 10
T (⇡ 10)
28
Incompatible data sets
CTEQ tolerance criteria
!30
!20
!10
0
10
20
30
40
distance
Eigenvector 4
BCDMSp
BCDMSd
H1a
H1b
ZEUS
NMCp
NMCr
CCFR2
CCFR3
E605
CDFw
E866
D0jet
CDFjet
CTEQ6eigen-direction #4
p��2
T (⇡ 10)
This approach is not consistent with Gaussian likelihood
no clear Bayesian interpretation of uncertainties(ultimately, a prescription…)
29
Scaling of with # of parameters: “ paradox”��2 ��2
Incompatible data sets
Simple example: two parameters with mean values and standard deviation
✓i (i = 1, 2)µi �i
joint probability distribution
P(✓1, ✓2) =Y
i=1,2
1p2⇡�2
i
exp
"�1
2
✓✓i � µi
�i
◆2#
change variables and usepolar coordinates r2 = t21 + t22, � = tan�1(t2/t1)
✓i ! ti = (✓i � µi)/�i
d✓1d✓2 P(✓1, ✓2) =d�
2⇡rdr exp
�1
2
r2�
30
Scaling of with # of parameters: “ paradox”��2 ��2
Incompatible data sets
confidence volume
= 68% for R = 2.279
CV ⌘Z
d✓1d✓2 P(✓1, ✓2) =
Z R
0dr r exp
�1
2
r2�
note that , so that confidenceregion for parameters
R2 = t21 + t22 ⌘ �2
max [ti] = R
implies that , which contradicts✓i = µi ± �i R
original premise that !✓i = µi ± �i
t2
t1
R
31
Scaling of with # of parameters: “ paradox”��2 ��2
Incompatible data sets
to resolve paradox, use Bayesian master formulas
E[✓i] =
Z 2⇡
0
d�
2⇡
Z 1
0drP(r,�) ✓i
=
Z 2⇡
0
d�
2⇡
Z 1
0dr r e�r2/2 (µi + ti �i) = µi X
32
Scaling of with # of parameters: “ paradox”��2 ��2
Incompatible data sets
to resolve paradox, use Bayesian master formulas
E[✓i] =
Z 2⇡
0
d�
2⇡
Z 1
0drP(r,�) ✓i
=
Z 2⇡
0
d�
2⇡
Z 1
0dr r e�r2/2 (µi + ti �i) = µi
V [✓i] =
Z 2⇡
0
d�
2⇡
Z 1
0drP(r,�) (✓i � µi)
2
=
Z 2⇡
0
d�
2⇡
Z 1
0dr r e�r2/2 (ti �i)
2 = �2i
X
X
33
Scaling of with # of parameters: “ paradox”��2 ��2
Incompatible data sets
no paradox if use for any number of parameters to characterize the CL
��2 = 1
1�
only consistent tolerance for Gaussian likelihood is T = 1
34
To summarize standard maximum likelihood method…
for ~30 parameters trying different starting points isimpractical, if do not have some information about shape
Gradient search (in parameter space) depends how “good” thestarting point is
Cannot guarantee solution is unique
Introduction of tolerance modifies Gaussian statistics
Common to free parameters initially, then freeze thosenot sensitive to data ( flat locally)�2
introduces bias, does not guarantee that flat globally�2
Error propagation characterized by quadratic near minimum�2
no guarantee this is quadratic globally (e.g. Student t-distribution?)
Monte Carlo methods
35
36
Monte Carlo
Designed to faithfully compute Bayesian master formulas
Do not assume a single minimum, include all possible solutions(with appropriate weightings)
Do not assume likelihood is Gaussian in parameters
Allows likelihood analysis to be extended to address tensionsamong data sets via Bayesian inference
More computationally demanding compared with Hessian method
37
Monte Carlo
First group to use MC for global PDF analysis was NNPDF,using neural network to parametrize in P (x)
f(x) = N x
↵(1� x)� P (x)
Iterative Monte Carlo (IMC), developed by JAM Collaboration,variant of NNPDF, tailored to non-neutral net parametrizations
J. Ethier
Markov Chain MC (MCMC) / Hybid MC (HMC)— recent “proof of principle” analysis, ideas from lattice QCD
Gbedo, Mangin-Brinet,PRD 96, 014015 (2017)
Nested sampling (NS) — computes integrals in Bayesian masterformulas (for E, V, Z) explicitly Skilling (2004)
— are fitted “preprocessing coefficients”↵,�
no assumptions for exponents
Use traditional functional form for input distribution shape,but sample significantly larger parameter space than possiblein single-fit analyses
cross-validation to avoidoverfitting
iterate until convergencecriteria satisfied
38
sampler
priors
fit
fit
fit
posteriors
original data
pseudo
data
training
data
fit
parameters from
minimization steps
validation
data
validation
posterior
as initial
guess
prior
Iterative Monte Carlo (IMC)
Iterative Monte Carlo (IMC)
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
init
ial
u+
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
d+
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
s+
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
c+
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
b+
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
g
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
inte
rmed
iate ⇡+
K+
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
⇡+
K+
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
inte
rmed
iate
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
final
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
0.2 0.4 0.6 0.8 z
0.2
0.6
1.0
1.4
e.g. of convergence (for fragmentation functions) in IMC
39
Iterative Monte Carlo (IMC)
~ 20iterations
Sato et al.PRD 94, 114004(2016)
40
Nested Sampling
Basic idea: transform n-dimensional integral to 1-D integral
Z =
ZdnaL(data|~a)⇡(~a) =
Z 1
0dX L(X)
where prior volume dX = ⇡(~a) dna
such that 0 < · · · < X2 < X1 < X0 = 1Feroz et al.arXiv:1306.2144 [astro-ph]
41
Nested Sampling
Approximate evidence by a weighted sum
Z ⇡X
i
Li wi wi =1
2(Xi�1 �Xi+1)with weights
Algorithm:
repeat until entire prior volume has been traversed
can be parallelized
randomly select samples from full prior s.t. initial volume X0 = 1
for each iteration, remove point with lowest , replacing itwith point from prior with constraint that its
Li
L > Li
increasingly used in fields outside of (nuclear) analysis
performs better than VEGAS for large dimensions
42
Nested Sampling
0.0 0.5 1.0 1.5 2.0 2.5
hu(1)1
0
2
4
6
norm
alize
dyie
ld
SIDIS
SIDIS + Lattice
�4 �3 �2 �1 0
hd(1)1
0
2
4
norm
alize
dyie
ld
SIDIS
SIDIS + Lattice
0 1 2 3 4gT
0.0
2.5
5.0
7.5
norm
alize
dyie
ld
SIDIS
SIDIS + Lattice
Recent applicationin global analysis of transversity TMD PDF
H.-W. Lin�u
�d
gT = �u� �dLin, WM, Prokudin,Sato, Shows (2017)
43
Assuming a single minimum, a Hessian or MC analysis must givesame results, if using same likelihood function
MC Error Analysis
analysis of pseudodata, generated using Gaussian distribution
0.0 0.2 0.4 0.6 0.8 1.0
x
0.0
0.5
1.0
1.5
2.0
2.5
Ng(
x;b
)
HESS
IMC
NS
HMC
0.0 0.2 0.4 0.6 0.8 1.0
x
0.96
0.98
1.00
1.02
1.04
Rat
ioto
hess
0.0 0.2 0.4 0.6 0.8 1.0
x
0.96
0.98
1.00
1.02
1.04
Rat
ioto
hess
0.0 0.2 0.4 0.6 0.8 1.0
x
0.96
0.98
1.00
1.02
1.04R
atio
tohe
ss
IMC
Nested HMC
44
Assuming a single minimum, a Hessian or MC analysis must givesame results, if using same likelihood function
MC Error Analysis
also for discrepant data
almost identical uncertainty bands for Hessian and for MC!
0.0 0.2 0.4 0.6 0.8 1.0
x
0.0
0.5
1.0
1.5
2.0
2.5
Ng(
x;b
)
HESS
IMC
HMC
NS
0.0 0.2 0.4 0.6 0.8 1.0
x
0.96
0.98
1.00
1.02
1.04
Rat
ioto
hess
0.0 0.2 0.4 0.6 0.8 1.0
x
0.96
0.98
1.00
1.02
1.04
Rat
ioto
hess
0.0 0.2 0.4 0.6 0.8 1.0
x
0.96
0.98
1.00
1.02
1.04
Rat
ioto
hess
0.0 0.2 0.4 0.6 0.8 1.0
x
0.0
0.5
1.0
1.5
2.0
2.5
Ng(
x;b
)
HESS
IMC
HMC
NS
0.0 0.2 0.4 0.6 0.8 1.0
x
0.96
0.98
1.00
1.02
1.04
Rat
ioto
hess
0.0 0.2 0.4 0.6 0.8 1.0
x
0.96
0.98
1.00
1.02
1.04
Rat
ioto
hess
0.0 0.2 0.4 0.6 0.8 1.0
x
0.96
0.98
1.00
1.02
1.04R
atio
tohe
ss
Nested
45
Assuming a single minimum, a Hessian or MC analysis must givesame results, if using same likelihood function
MC Error Analysis
Approaches that use Hessian + tolerance factor not consistentwith Gaussian likelihood function
Assuming sufficient observables to determine PDFs, thenPDF uncertainties cannot depend on parametrization!
NNPDF group claim that within their neural net MC methodology, no need for a tolerance factor, since uncertainties similar toother groups who use Hessian + tolerance
how can this be?
Non-Gaussian likelihood
46
Rigorous (Bayesian) way to address incompatible data setsis to use generalization of Gaussian likelihood
47
Incompatible data sets
joint vs. disjoint distributions
empirical Bayes
hierarchical Bayes
others, used in different fields
48
Disjoint distributions
Instead of using total likelihood that is a product (“and”)of individual likelihoods, e.g. for simple example of two measurements
L(m1m2|m; �m1�m2) = L(m1|m; �m1)⇥ L(m2|m; �m2)
use instead sum (“or”) of individual likelihoods
L(m1m2|m; �m1�m2) =1
2
hL(m1|m; �m1) + L(m2|m; �m2)
i
gives rather different expectation value and variance
E[m] =1
2(m1 +m2)
V [m] =1
2(�m2
1 + �m22) +
✓m1 �m2
2
◆2
depends onseparation!
49
Disjoint distributions
�4 �2 0 2 4m
0.0
0.5
1.0
1.5
L(m
|m1,
m2)
�4 �2 0 2 4m
0.0
0.5
1.0
1.5
L(m
1m
2|m
) joint
disjoint
joint:
disjoint: V [m] =1
2(�m2
1 + �m22) +
✓m1 �m2
2
◆2
V [m] =�m2
1 �m22
�m21 + �m2
2
Symmetric uncertainties �m1 = �m2
50
Disjoint distributions
�4 �2 0 2 4m
0
1
2
3
L(m
|m1,
m2)
�4 �2 0 2 4m
0
1
2
3
L(m
1m
2|m
)Asymmetric uncertainties �m1 6= �m2
disjoint likelihood gives broader overall uncertainty,overlapping individual (discrepant) data
51
Shortcoming of conventional Bayesian — still assumeprior distribution follows specific form (e.g. Gaussian)
Empirical Bayes
Extend approach to more fully represent prior uncertainties,with final uncertainties that do not depend on initial choices
In generalized approach, data uncertainties modified bydistortion parameters, whose probability distributions givenin terms of “hyperparameters” (or “nuisance parameters”)
Hyperparameters determined from datagive posteriors for both PDF and hyperparameters
52
Empirical Bayes
Standard mean and variance that characterize data✓ = µ+ � f(µ) + g(�)
where are unknown functions that account forfaulty measurements
f(µ), g(�)
Simple choice is
(µ,�) ! (⇣1 µ+ ⇣2, ⇣3 �)
where are distortion parameters, with prob. dists.described by hyperparameters
⇣1,2,3�1,2,3
Likelihood function is then
L(data|~a, ⇣1,2,3) ⇠ exp
"�1
2
X
i
✓d1 � f(µi(~a, ⇣1,2))
g(�, ⇣3)
◆2#⇡1(⇣1|�1)⇡2(⇣1|�2)⇡3(⇣1|�3)
53
Empirical Bayes
�1.0 �0.5 0.0 0.5 1.00
2
4
6
8
10
P(t
|D) 1–�
�0.5 �0.4 �0.3 �0.2 �0.1 0.0 0.10
5
10
15
20
25
experiment 1
experiment 2
fit
EB fit
�1.0 �0.5 0.0 0.5 1.00
2
4
6
8
10
P(t
|D) 6–�
�0.5 �0.4 �0.3 �0.2 �0.1 0.0 0.10
5
10
15
20
25
�1.0 �0.5 0.0 0.5 1.0
t
0
2
4
6
8
10
P(t
|D) 12–�
�0.5 �0.4 �0.3 �0.2 �0.1 0.0 0.1
t
0
5
10
15
20
25
HessianEmpirical Bayes
Simple example of EB for symmetric & asymmetric errors
OutlookNew paradigm needed in global QCD analysis— simultaneous determination of collinear distributions (also TMDs) using Monte Carlo sampling of parameter space
Near-term future: “universal” QCD analysis of all observables sensitive to collinear (unpolarized & polarized) PDFs and FFs
Longer-term: apply MC technology to global QCD analysisof transverse momentum dependent (TMD) PDFs and FFs
54
Treatment of discrepant data sets needs serious attention— Bayesian perspective has clear merits
Necessary to benchmark MC extractions (not just NNPDF)