diagnostic accuracy of neonatal assessment for gestational age ... · with ultrasound, the dubowitz...

26
a Department of Pediatric Newborn Medicine, and Departments of g Channing Division of Network Medicine, Medicine, Brigham and Womens Hospital, Boston, Massachusetts; b Harvard Medical School, Harvard University, Boston, Massachusetts; c Department of Global Health and Population, Harvard T.H. Chan School of Public Health, Boston, Massachusetts; d Department of Clinical Research, OpenBiome, Somerville, Massachusetts; e Department of Pediatrics, University of Rochester Medical Center, Rochester, New York; f Department of Research, Community To cite: Lee AC, Panchal P, Folger L, et al. Diagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review. Pediatrics. 2017;140(4):e20171423 CONTEXT: An estimated 15 million neonates are born preterm annually. However, in low- and middle-income countries, the dating of pregnancy is frequently unreliable or unknown. OBJECTIVE: To conduct a systematic literature review and meta-analysis to determine the diagnostic accuracy of neonatal assessments to estimate gestational age (GA). DATA SOURCES: PubMed, Embase, Cochrane, Web of Science, POPLINE, and World Health Organization library databases. STUDY SELECTION: Studies of live-born infants in which researchers compared neonatal signs or assessments for GA estimation with a reference standard. DATA EXTRACTION: Two independent reviewers extracted data on study population, design, bias, reference standard, test methods, accuracy, agreement, validity, correlation, and interrater reliability. RESULTS: Four thousand nine hundred and fifty-six studies were screened and 78 included. We identified 18 newborn assessments for GA estimation (ranging 4 to 23 signs). Compared with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks ( n = 7 studies), while the Ballard score overestimated GA (0.4 weeks) and dated pregnancies within ±3.8 weeks ( n = 9). Compared with last menstrual period, the Dubowitz score dated 95% of pregnancies within ± 2.9 weeks ( n = 6 studies) and the Ballard score, ±4.2 weeks ( n = 5). Assessments with fewer signs tended to be less accurate. A few studies showed a tendency for newborn assessments to overestimate GA in preterm infants and underestimate GA in growth-restricted infants. LIMITATIONS: Poor study quality and few studies with early ultrasound-based reference. CONCLUSIONS: Efforts in low- and middle-income countries should focus on improving dating in pregnancy through ultrasound and improving validity in growth-restricted populations. Where ultrasound is not possible, increased efforts are needed to develop simpler yet specific approaches for newborn assessment through new combinations of existing parameters, new signs, or technology. Diagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review Anne CC Lee, MD, MPH, a,b Pratik Panchal, MD, MPH, c,d Lian Folger, BA, a Hilary Whelan, MD, e Rachel Whelan, MPH, BA, f Bernard Rosner, PhD, b,g Hannah Blencowe, MRCPCH, MBChB, Msc, h,i Joy E. Lawn, MBBS, PhD h,i abstract PEDIATRICS Volume 140, number 6, December 2017:e20171423 REVIEW ARTICLE by guest on May 26, 2019 www.aappublications.org/news Downloaded from

Upload: truongdang

Post on 26-May-2019

220 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

aDepartment of Pediatric Newborn Medicine, and Departments of gChanning Division of Network Medicine, Medicine, Brigham and Women’s Hospital, Boston, Massachusetts; bHarvard Medical School, Harvard University, Boston, Massachusetts; cDepartment of Global Health and Population, Harvard T.H. Chan School of Public Health, Boston, Massachusetts; dDepartment of Clinical Research, OpenBiome, Somerville, Massachusetts; eDepartment of Pediatrics, University of Rochester Medical Center, Rochester, New York; fDepartment of Research, Community

To cite: Lee AC, Panchal P, Folger L, et al. Diagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review. Pediatrics. 2017;140(4):e20171423

CONTEXT: An estimated 15 million neonates are born preterm annually. However, in low- and middle-income countries, the dating of pregnancy is frequently unreliable or unknown.OBJECTIVE: To conduct a systematic literature review and meta-analysis to determine the diagnostic accuracy of neonatal assessments to estimate gestational age (GA).DATA SOURCES: PubMed, Embase, Cochrane, Web of Science, POPLINE, and World Health Organization library databases.STUDY SELECTION: Studies of live-born infants in which researchers compared neonatal signs or assessments for GA estimation with a reference standard.DATA EXTRACTION: Two independent reviewers extracted data on study population, design, bias, reference standard, test methods, accuracy, agreement, validity, correlation, and interrater reliability.RESULTS: Four thousand nine hundred and fifty-six studies were screened and 78 included. We identified 18 newborn assessments for GA estimation (ranging 4 to 23 signs). Compared with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard score overestimated GA (0.4 weeks) and dated pregnancies within ±3.8 weeks (n = 9). Compared with last menstrual period, the Dubowitz score dated 95% of pregnancies within ± 2.9 weeks (n = 6 studies) and the Ballard score, ±4.2 weeks (n = 5). Assessments with fewer signs tended to be less accurate. A few studies showed a tendency for newborn assessments to overestimate GA in preterm infants and underestimate GA in growth-restricted infants.LIMITATIONS: Poor study quality and few studies with early ultrasound-based reference.CONCLUSIONS: Efforts in low- and middle-income countries should focus on improving dating in pregnancy through ultrasound and improving validity in growth-restricted populations. Where ultrasound is not possible, increased efforts are needed to develop simpler yet specific approaches for newborn assessment through new combinations of existing parameters, new signs, or technology.

Diagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic ReviewAnne CC Lee, MD, MPH, a, b Pratik Panchal, MD, MPH, c, d Lian Folger, BA, a Hilary Whelan, MD, e Rachel Whelan, MPH, BA, f Bernard Rosner, PhD, b, g Hannah Blencowe, MRCPCH, MBChB, Msc, h, i Joy E. Lawn, MBBS, PhDh, i

abstract

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

PEDIATRICS Volume 140, number 6, December 2017:e20171423 Review ARticle by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 2: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

Of the estimated 14.9 million annual preterm births, 13.6 million (91%) occur in low- and middle-income countries (LMIC) .1, 2 Preterm birth is the leading cause of mortality in children less than 5 years of age globally, accounting for 1 million neonatal deaths annually, almost all of which are in LMIC.3 In these settings, early recognition of the preterm infant may facilitate the timely delivery of life-saving interventions, such as continuous positive airway pressure or kangaroo mother care.

Ultrasound dating in early pregnancy is the most accurate method currently available to assess gestational age (GA) and is a standard of care in high-income countries. In LMIC, pregnancy dating is challenging, and GA of the infant is frequently unknown or inaccurate. Maternal recall of last menstrual period (LMP) is often unavailable or unreliable, particularly in populations with high rates of maternal illiteracy.4, 5 The shortage of health care providers in LMIC, currently estimated at 7.9 million, 6 contributes to poor coverage of antenatal care. In sub-Saharan Africa and Southeast Asia, fewer than one-third of mothers in households in the poorest quintile receive at least 1 antenatal care visit.7 Furthermore, the timing of the first visit for antenatal care is late, occurring typically late in the second trimester.8, 9 Moreover, access to ultrasonography is low, with <7% of pregnant women having access to ultrasound in rural sub-Saharan Africa.4 Traditional sonography in late pregnancy is notably inaccurate for determining GA (±4 weeks).10, 11

Clinical assessment of newborn maturity has long been used as a proxy to estimate GA after birth (Table 1). In 1966, Farr et al12 defined a classification for the development of external physical characteristics in the newborn. In 1968, Amiel-Tison13 described the assessment of neonatal neurologic maturation. Dubowitz et al14 developed a score for GA based

on a combination of neurologic and physical signs, which dated pregnancies within 5 days of LMP in their original study. Since then, several simplified clinical assessments have been described in the literature.15 – 18 The Ballard score19 is one of the most commonly usedand was revised to the New Ballard score in 1991 to improve accuracy for early preterm infants.20

Newborn assessment for GA dating has become less relevant in high-income settings, where ultrasound coverage is high and uncertainty of antenatal pregnancy dating is less common than in LMIC. In LMIC settings without widespread access to early ultrasound dating and where accuracy of LMP recall is highly variable, clinical assessment of the newborn remains the commonest available tool to evaluate GA. Accurate GA is necessary to identify preterm and small-for-gestational-age (SGA) babies and provide them with effective interventions.

The Every Newborn Action Plan was launched in 2014 with the aim to end preventable neonatal deaths and stillbirths by 2030.34 GA measurement was identified as a priority area35 for improving (1) the epidemiology of preterm birth and SGA and (2) the comparability of neonatal mortality estimates through stratification by GA and birth weight.

In this systematic review, we aim to (1) identify individual neonatal signs and combined clinical scores or assessments that have been used to ascertain GA of newborns; and (2) assess the diagnostic accuracy and reliability of these methods for estimating GA, compared with dating by a reference standard (ie, ultrasound or LMP).

MeThods

search strategy

We conducted a systematic review of the published and gray literature,

initially done in March 2015 and updated in June 2016 (Fig 1). Databases we searched included PubMed, Embase, Cochrane, Web of Science, POPLINE, and the World Health Organization Global Health Libraries and regional databases (Latin American and Carribbean Health Sciences, Index Medicus for the Eastern Mediterranean Region, African Index Medicus). The review was registered with the International Prospective Register of Systematic Reviews (PROSPERO registration number: CRD42015020499). The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, review protocol, and detailed search terms are available in the Supplemental Information.

Inclusion Criteria

There were no language restrictions. Abstracts of non-English articles were translated via Google Translate, and if eligible, the full text was translated to English by fluent speakers. Articles were considered for inclusion if the study met the following criteria: (1) included live-born neonates; (2) compared at least 2 methods of GA estimation, 1 of which was a neonatal clinical assessment, score or individual clinical sign(s); and (3) reported at least 1 statistic assessing correlation, agreement, or validity of GA estimation. Prenatal assessments (eg, symphysis fundal height, ultrasound) and neonatal anthropometrics (eg, foot length) were reviewed separately and will be reported elsewhere.

exclusion Criteria

We excluded studies in which researchers did not provide data describing the correlation, agreement, or validity of neonatal clinical assessment compared with a reference method of pregnancy dating (ie, ultrasound or LMP). We excluded studies from specialized subpopulations (eg, infants of

LEE et al2

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 3: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

PEDIATRICS Volume 140, number 6, December 2017 3

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

TABL

e 1

Neon

atal

Ass

essm

ents

/ Sc

orin

g Sy

stem

s by

Lev

el o

f Com

plex

ity

Clin

ical

Sc

orin

g Sy

stem

or

Nam

e

No. o

f cr

iteri

aPh

ysic

al C

rite

ria

Neur

omus

cula

r Cr

iteri

aOt

her

Crite

ria

Refe

renc

e St

anda

rdOr

igin

al R

epor

ted

Accu

racy

or

Corr

elat

ion

with

GA

Stud

y Se

ttin

g an

d Lo

catio

nSa

mpl

e Si

zeYe

ar

Amie

l-Tis

on

et a

l2123

Skin

col

or, s

kin

opac

ity, s

kin

text

ure,

ede

ma,

lanu

go,

skul

l har

dnes

s, e

ar fo

rm,

ear

firm

ness

, gen

itals

, br

east

siz

e, n

ippl

e fo

rmat

ion,

pla

ntar

cre

ases

Retu

rn to

flex

ion

of fo

rear

ms,

sc

arf s

ign,

pop

litea

l ang

le, f

oot

dors

iflex

ion,

rig

htin

g re

actio

n,

rais

e to

sit,

bac

k to

lyin

g, fi

nger

gr

asp

and

resp

onse

to tr

actio

n,

nonn

utri

tive

suck

ing,

cro

ssed

ex

tens

ion,

vis

ion

fix a

nd tr

ack

—BO

ECo

rrel

atio

n of

indi

vidu

al

sign

s in

man

uscr

ipt

Port

-Roy

al-B

aude

locq

ue

Hosp

ital;

Pari

s, F

ranc

e39

719

99

Fere

su e

t al22

22Ed

ema,

ski

n te

xtur

e, s

kin

colo

r, sk

in o

paci

ty, l

anug

o,

plan

tar

crea

ses,

nip

ple

form

atio

n, b

reas

t siz

e,

ear

form

, ear

firm

ness

, ge

nita

ls

Post

ure,

squ

are

win

dow

, do

rsifl

exio

n of

foot

, arm

rec

oil,

leg

reco

il, p

oplit

eal a

ngle

, he

el-to

-ear

, sca

rf s

ign,

hea

d la

g,

vent

ral s

uspe

nsio

n

Birt

h w

eigh

t (B

W)

LMP

Corr

elat

ion

of in

divi

dual

si

gns

in m

anus

crip

tM

ater

nity

uni

t, Ha

rare

Ce

ntra

l Hos

pita

l; Ha

rare

, Zi

mba

bwe

364

2002

Dubo

witz

et

al14

21Ed

ema,

ski

n te

xtur

e, s

kin

colo

r, sk

in o

paci

ty, l

anug

o,

plan

tar

crea

ses,

nip

ple

form

atio

n, b

reas

t siz

e,

ear

form

, ear

firm

ness

, ge

nita

ls

Post

ure,

squ

are

win

dow

, ank

le

dors

iflex

ion,

arm

rec

oil,

leg

reco

il, p

oplit

eal a

ngle

, hee

l to

ear,

scar

f sig

n, h

ead

lag,

ven

tral

su

spen

sion

—LM

P95

% C

I: ±

2.0

wk

NICU

, Jes

sop

Hosp

ital

for

Wom

en; S

heffi

eld,

En

glan

d

167

1970

Dubo

witz

and

Fa

rr (

from

Ni

colo

poul

os

et a

l23)

17Sk

in te

xtur

e, s

kin

colo

r, sk

in

opac

ity, l

anug

o, p

lant

ar

crea

ses,

nip

ple

form

atio

n,

brea

st s

ize,

ear

form

, ear

fir

mne

ss

Post

ure,

squ

are

win

dow

, do

rsifl

exio

n of

foot

, pop

litea

l an

gle,

hee

l-to-

ear,

scar

f sig

n,

head

lag,

ven

tral

sus

pens

ion

—LM

Pr

= 0.

878

Alex

andr

a M

ater

nity

Ho

spita

l; At

hens

, Gre

ece

710

1976

Finn

strö

m24

12Br

east

siz

e, n

ippl

e fo

rmat

ion,

sk

in o

paci

ty, s

calp

hai

r, ha

ir-fo

rehe

ad b

orde

r, ey

ebro

ws,

ear

car

tilag

e,

finge

rnai

ls, x

ipho

id

proc

ess,

ext

erna

l gen

italia

, pl

anta

r sk

in c

reas

es,

pupi

llary

mem

bran

e

——

LMP

r =

0.84

for

5 ex

tern

al

char

acte

rist

ics

Univ

ersi

ty H

ospi

tal;

Umea

, Sw

eden

174

1972

Balla

rd e

t al19

12Sk

in c

olor

, lan

ugo,

pla

ntar

cr

ease

s, b

reas

t siz

e, e

ar

firm

ness

, gen

itals

Post

ure,

squ

are

win

dow

(w

rist

), ar

m r

ecoi

l, po

plite

al a

ngle

, sca

rf

sign

, hee

l-to-

ear

—LM

P an

d cl

inic

al

data

r =

0.85

2NI

CU, C

inci

nnat

i Gen

eral

Ho

spita

l; Ci

ncin

nati,

Ohi

o25

219

79

Balla

rd e

t al

(Ne

w

Balla

rd

scor

e)20

12Sk

in, l

anug

o, p

lant

ar c

reas

e,

brea

st m

atur

ity, e

ye a

nd/

or e

ar, g

enita

ls

Post

ure,

squ

are

win

dow

(w

rist

), ar

m r

ecoi

l, po

plite

al a

ngle

, sca

rf

sign

, hee

l-to-

ear

—BO

Er

= 0.

97NI

CUs

and

nurs

erie

s;

Cinc

inna

ti, O

hio

530

1991

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 4: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

LEE et al4

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

Clin

ical

Sc

orin

g Sy

stem

or

Nam

e

No. o

f cr

iteri

aPh

ysic

al C

rite

ria

Neur

omus

cula

r Cr

iteri

aOt

her

Crite

ria

Refe

renc

e St

anda

rdOr

igin

al R

epor

ted

Accu

racy

or

Corr

elat

ion

with

GA

Stud

y Se

ttin

g an

d Lo

catio

nSa

mpl

e Si

zeYe

ar

Farr

2510

—Sp

onta

neou

s m

otor

act

ivity

, re

actio

n of

pup

ils to

ligh

t, ra

te

of s

ucki

ng, c

losu

re o

f mou

th

whe

n su

ckin

g, s

trip

ping

act

ion

of th

e to

ngue

, res

ista

nce

agai

nst

pass

ive

mov

emen

t, re

coil

of

fore

arm

s, p

lant

ar g

rasp

, pitc

h of

cr

y, in

tens

ity o

f cry

—LM

PAc

cura

te ±

1 w

k: 6

1%Ab

erde

en, S

cotla

nd82

1968

Tunç

er e

t al26

8Sk

in te

xtur

e, e

ar fo

rm,

firm

ness

, bre

ast s

ize

and

nipp

le fo

rmat

ion,

pla

ntar

cr

ease

s, fa

cial

app

eara

nce

Post

ure,

arm

rec

oil,

scar

f sig

n—

LMP

r =

0.94

5Ha

cett

epe

Univ

ersi

ty, N

ICU;

An

kara

, Tur

key

100

1981

Ereg

ie17

8Sk

in te

xtur

e, e

ar fo

rm, b

reas

t si

ze, g

enita

liaPo

stur

e, s

carf

sig

nHe

ad

circ

umfe

renc

e,

mid

-arm

ci

rcum

fere

nce

Dubo

witz

Accu

rate

±2

wk:

92%

Univ

ersi

ty te

achi

ng

hosp

itals

; Ben

in, N

iger

ia26

219

91

Capu

rro

et a

l157

Skin

text

ure,

nip

ple

form

atio

n, e

ar fo

rm,

brea

st s

ize,

pla

ntar

cr

ease

s

Scar

f sig

n, h

ead

lag

—LM

Pr

= 0.

9M

onte

vide

o, U

rugu

ay11

519

78

Kollé

e et

al27

7Sk

in c

olor

, ski

n te

xtur

e,

plan

tar

crea

ses,

bre

ast

size

, ear

firm

ness

, nai

l le

ngth

—AV

CLNS

95%

CI:

±19

.9 d

Ca

thol

ic U

nive

rsity

; Ni

jmeg

en, N

ethe

rlan

ds22

919

85

Klim

ek e

t al28

6La

nugo

, pla

ntar

cre

ases

, br

east

siz

ePo

stur

e, a

ngle

fore

arm

to a

rm,

pulli

ng a

n el

bow

to th

e bo

dy—

Balla

rdr

= 0.

72Te

rtia

ry c

are

hosp

itals

; Po

land

800

2000

Sim

plifi

ed

Dubo

witz

(f

rom

Alla

n et

al29

)

6Br

east

siz

e, s

kin

text

ure,

ear

be

ndin

g (s

ubst

itute

d fr

om

ear

firm

ness

bec

ause

so

me

Abor

igin

al b

abie

s ha

ve le

ss e

ar c

artil

age)

Squa

re w

indo

w, p

oplit

eal a

ngle

, sc

arf s

ign

—Ul

tras

ound

Mea

n di

ffere

nce:

0.4

wk

(95%

LOA

: −2.

8 to

1.9

)Pr

ivat

e Ho

spita

ls; N

orth

ern

Terr

itory

, Aus

tral

ia98

2009

Nara

yana

n

et a

l305

Skin

col

or, e

ar fo

rm, p

lant

ar

skin

cre

ase,

bre

ast

form

atio

n, s

kin

text

ure

—AV

CLLM

P95

% C

I: ±

11 d

Kala

wat

i Sar

an C

hild

ren’

s Ho

spita

l; Ne

w D

elhi

, In

dia

356

1982

Robi

nson

31 (

from

Se

rfon

tein

and

Ja

rosz

ewic

z32)

5—

Pupi

l rea

ctio

n, tr

actio

n, g

labe

llar

tap,

nec

k ri

ghtin

g, h

ead

turn

ing

—Du

bow

itz95

CI:

±1

wk;

r =

0.8

5So

uth

Afri

ca73

1966

Park

in e

t al16

4Sk

in te

xtur

e, b

reas

t siz

e,

edem

a, p

lant

ar s

kin

crea

ses,

nai

l len

gth,

nai

l te

xtur

e, e

ar fi

rmne

ss, s

kull

hard

ness

, lan

ugo

hair,

ge

nita

lia

——

LMP

95 C

I: ±

18.1

dUn

iver

sity

hos

pita

l; Ne

wca

stle

, Eng

land

392

1976

TABL

e 1

Cont

inue

d

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 5: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

diabetic mothers), editorials or reviews without original data, individual case reports, and duplicate studies.

data extraction

All articles were reviewed independently by 2 researchers and extracted into a standard Excel file. Differences were resolved by a third independent reviewer. The study characteristics extracted are listed in Supplemental Information 2 .

study Quality Assessment

Two independent reviewers graded the methodological quality of the studies of diagnostic accuracy using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS–2)36 tool, modified for the context of this review (Supplemental Information, “Study Quality Assessment” section). Individual studies were evaluated for limitations and biases in the following domains: patient selection, test method, reference standard, and patient flow and timing. Studies with a reference standard GA of ultrasonography or best obstetric estimate (BOE) (including ultrasound confirmation of dating) were graded as highest quality. Though LMP may be considered gold standard in high-resource settings (where rates of literacy and early antenatal care are high), in LMIC, LMP recall is considered less reliable because of low literacy rates and late presentation to antenatal care.11, 37 Additionally, we assessed the generalizability of study results to LMIC.

statistical Analysis

Stata 13 (StataCorp, College Station, TX) and R (R Foundation for Statistical Computing, Vienna, Austria) were used for analyses. The definition of preterm birth was a live birth <37 weeks’ gestation. Studies were grouped by method of newborn assessment and reference standard. Simple descriptive statistics were

used to report ranges and medians. The mean individual-level differences between 2 methods of GA assessment were pooled using the Stata metan command, which provided the pooled mean-difference estimate and 95% confidence interval (CI). The variance and SD around the pooled estimate were calculated using the following formula38:

Variance  pooled  =   

∑  i=1  k  ( n  i  − 1 ) S  i  2  _______________  ∑  i=1  k  ( n  i  − 1) 

For studies in which researchers reported the percent of test measures within ±1 to 2 weeks of a reference, percentages were logit transformed and SEs were calculated. Meta-analysis was conducted with a random effects model. The Higgins I2 statistic was calculated to assess heterogeneity. For reports of diagnostic accuracy, forest plots were generated in R to summarize diagnostic accuracy across studies. Because pooling of sensitivity and specificity separately fails to account for the interrelatedness of the measures, hierarchical bivariate models are recommended for meta-analysis.39 These were analyzed by using MetaDisc 1.4 and RStudio (Mada package). Hierarchal summary receiver operating characteristic curves were generated.

Subgroup analyses were conducted by assessment method, reference standard type, and country income level. Correlation coefficients were not pooled, given that in many studies type of coefficient (ie, Spearman or Pearson) was not indicated, and furthermore, methods for pooling Spearman correlation coefficients have not been well described.38

ResuLTs

Neonatal Clinical Assessments

We identified 3862 titles, and 66 articles were included, some

PEDIATRICS Volume 140, number 6, December 2017 5

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

Clin

ical

Sc

orin

g Sy

stem

or

Nam

e

No. o

f cr

iteri

aPh

ysic

al C

rite

ria

Neur

omus

cula

r Cr

iteri

aOt

her

Crite

ria

Refe

renc

e St

anda

rdOr

igin

al R

epor

ted

Accu

racy

or

Corr

elat

ion

with

GA

Stud

y Se

ttin

g an

d Lo

catio

nSa

mpl

e Si

zeYe

ar

Bhag

wat

et

al18

(fr

om

Bind

usha

et

al33

)

4Sk

in te

xtur

e, b

reas

t siz

e, e

ar

firm

ness

, gen

italia

)—

—LM

PM

ean

diffe

renc

e: −

0.58

wk;

r

= 0.

91M

edic

al C

olle

ge;

Thir

uvan

anth

apur

am,

Kera

la, I

ndia

1000

; GA

28–3

7 w

k

2014

LOA,

lim

its o

f agr

eem

ent;

NS, n

ot s

tate

d; —

, not

app

licab

le.

TABL

e 1

Cont

inue

d

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 6: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

reporting on more than one scoring system (22 articles reported on the Dubowitz score, 31 on the Original and/or New Ballard score, and 25 on other clinical scores)(Fig 1). Basic study characteristics of all included studies are in Supplemental Table 10. The studies were published between 1968 and 2016, with fewer than half from LMIC. Most studies (n = 62) were conducted in health facilities, with 19 conducted in NICUs

on preterm and/or low birth weight (LBW) populations. For the reference standard, there were 31 studies in which researchers had ultrasound-based dating, 42 in which they used LMP, and 3 in which researchers used dating based on another neonatal assessment.

The overall QUADAS–2 summary is in Supplemental Fig 6. In general, the quality of the studies was relatively

low. In over half of the studies, there was a high risk of bias related to patient selection, test method, or reference standard.

Neonatal Clinical Assessments or Scores

We identified 18 different neonatal assessments or scoring systems (combining >1 individual clinical sign) for GA determination (Table 1). Twelve were developed in high-income countries (HICs) and 7 in LMIC (4 in Africa, 2 in Asia, 1 in Turkey). The reference standard from which the scores were derived was ultrasound/BOE in only 2 studies. The most complex score, Amiel-Tison, 21 has 23 criteria, including a large number of neurologic signs. The simplest score, the Parkin, 16 includes only 4 external physical criteria. One simplified score was developed in Nigeria (Eregie17) and includes physical anthropometrics (head circumference and midarm circumference).

Individual External Physical Criteria and Signs

Table 2 shows 12 studies in which researchers reported the correlation of individual external physical criteria with GA. Correlation coefficients were generally higher for comparisons with an LMP reference, for which median correlation coefficients ranged from 0.60 to 0.75 for most signs. Three studies used an ultrasound or BOE GA reference, and lower correlations were reported in 2 of these studies, neither of which included early preterm infants.21, 40 The physical characteristics with the highest median correlation were breast size, plantar skin creases, ear firmness, and skin texture.

Individual Neuromuscular Signs

In 10 studies, researchers reported the correlation of individual neuromuscular criteria with GA (Table 2). The median correlation

LEE et al6

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

FIGuRe 1Neonatal clinical assessment: flow diagram. Diagram of the screening process to identify studies for inclusion in neonatal assessment review; adapted from the PRISMA (Moher D, Liberati A, Tetzlaff J, Altman DG; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009 Jul 21;6(7):e1000097). *Note: Several papers reported on>1 score.

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 7: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

PEDIATRICS Volume 140, number 6, December 2017 7

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

TABL

e 2

Corr

elat

ion

of In

divi

dual

Phy

sica

l or

Neur

omus

cula

r Cr

iteri

a W

ith G

A

Amie

l-Ti

son

et

al21

Lee

et a

l40Ba

llard

et

al (

New

Ba

llard

)20

Park

in e

t al

16Du

bow

itz

and

Farr

(N

icol

opou

los

et a

l23)

Ragh

u et

al

41Fe

resu

et

al22

Dubo

witz

an

d Fa

rr

(Sun

joh

et

al42

)

Finn

strö

m24

Balla

rd e

t al

19Tu

nçer

et

al26

Nara

yana

n et

al30

Sum

mar

y Ac

ross

Al

l Stu

dies

, M

edia

n (M

inim

um,

Max

imum

)

N (s

ampl

e si

ze)

397

710

530

392

710

160

364

358

174

252

220

356

Stud

y Se

ttin

g an

d Lo

catio

nTe

rtia

ry

Hosp

ital;

Pari

s,

Fran

ce

Com

mun

ity,

Sylh

et,

Bang

lade

sh

NICU

s an

d nu

rser

ies;

Ci

ncin

nati,

Oh

io

Univ

ersi

ty

Hosp

ital;

New

cast

le,

Engl

and

Mat

erni

ty

Hosp

ital;

Athe

ns,

Gree

ce

Univ

ersi

ty

hosp

ital;

Lusa

ka,

Zam

bia

Mat

erni

ty

unit;

Ha

rare

, Zi

mba

bwe

Tert

iary

Ho

spita

l; Ya

ound

e,

Cam

eroo

n

Univ

ersi

ty

hosp

ital;

Umea

, Sw

eden

NICU

an

dnur

sery

; Ci

ncin

nati,

Oh

io

NICU

; Ank

ara,

Tu

rkey

Child

ren’

s Ho

spita

l; Ne

w D

elhi

, In

dia

GA r

ange

incl

uded

37–4

1 w

k34

–42

wk

20–4

4 w

k25

.2–4

5.2

wk

28–4

4 w

kNS

24–4

5 w

k25

–44

wk

32.1

–34

wk

26–4

4 w

k,

760–

5460

g

27–4

1 w

k26

–44.

4 w

k—

Refe

renc

e st

anda

rdBO

EUl

tras

ound

BOE

LMP

LMP

LMP

LMP

LMP

LMP

LMP

LMP

LMP

—Ph

ysic

al c

rite

ria

Sk

in c

olor

0.19

0.05

—0.

780.

760.

520.

450.

80.

480.

84—

0.74

0.63

(0.

05, 0

.84)

Ea

r fo

rm0.

110.

020.

73—

0.76

0.64

0.57

0.72

0.41

0.84

0.62

—0.

63 (

0.02

, 0.8

4)

Ear

firm

ness

0.18

0.03

—0.

780.

760.

650.

530.

72—

——

0.85

0.69

(0.

03, 0

.85)

Pl

anta

r sk

in

crea

ses

0.34

0.02

0.72

0.76

0.77

0.56

0.64

0.76

0.65

0.79

0.64

0.87

0.69

, (0.

02, 0

.87)

Br

east

siz

e0.

25—

0.8

0.75

0.76

0.66

0.57

0.76

0.62

0.89

0.66

0.81

0.75

(0.

25, 0

.89)

Ni

pple

form

atio

n0.

190.

14—

—0.

720.

620.

550.

750.

68—

——

0.62

(0.

14, 0

.75)

Sk

in te

xtur

e0.

280.

140.

750.

720.

770.

590.

570.

8—

—0.

650.

770.

69 (

0.14

, 0.8

0)

Geni

talia

0.17

0.02

0.82

0.66

0.65

0.36

0.62

0.63

0.43

0.67

——

0.63

(0.

02, 0

.82)

La

nugo

hai

r0.

2−

0.01

0.81

0.62

0.73

0.55

0.49

0.71

—0.

77—

—0.

62 (

−0.

01, 0

.81)

Ed

ema

0.16

——

0.59

0.64

0.67

0.22

0.41

——

——

0.50

(0.

16, 0

.67)

Sk

in o

paci

ty0.

090.

02—

—0.

720.

220.

350.

70.

48—

——

0.35

(0.

02, 0

.72)

Na

il te

xtur

e—

——

0.57

——

——

——

——

0.57

(0.

57, 0

.57)

Na

il le

ngth

——

—0.

51—

——

——

——

—0.

51 (

0.51

, 0.5

1)

Faci

al

appe

aran

ce—

——

——

——

——

—0.

77—

0.77

(0.

77, 0

.77)

Sk

ull h

ardn

ess

0.15

——

——

——

——

——

—0.

15 (

0.15

, 0.1

5)Ne

urom

uscu

lar

crite

ria

——

——

——

——

——

——

Po

stur

e—

0.12

0.82

0.75

0.72

0.31

0.65

0.76

—0.

690.

48—

0.69

(0.

12, 0

.82)

Sq

uare

win

dow

——

0.79

0.21

0.73

0.58

0.64

0.69

—0.

7—

0.69

(0.

21, 0

.79)

Sc

arf s

ign

0.23

0.08

0.82

0.67

0.72

0.51

0.63

0.72

—0.

710.

41—

0.65

(0.

08, 0

.82)

Po

plite

al a

ngle

0.23

0.05

0.74

0.48

0.76

0.39

0.63

0.7

—0.

77—

—0.

63 (

0.05

, 0.7

7)

Arm

rec

oil

0.19

0.07

0.71

0.62

0.65

0.29

0.55

0.56

—0.

610.

36—

0.56

(0.

07, 0

.71)

He

el to

ear

—0.

040.

810.

510.

760.

50.

590.

66—

0.72

——

0.63

(0.

04, 0

.81)

Le

g re

coil

——

—0.

590.

550.

30.

470.

52—

——

—0.

52 (

0.30

, 0.5

9)

Vent

ral

susp

ensi

on—

——

0.59

0.72

0.42

0.7

0.71

——

——

0.70

(0.

42, 0

.72)

He

ad la

g—

——

0.47

0.71

0.36

0.59

0.65

——

——

0.59

(0.

36, 0

.71)

An

kle

dors

iflex

ion

0.21

——

0.37

0.74

0.47

0.59

0.66

——

——

0.53

(0.

21, 0

.74)

No

nnut

ritiv

e su

ckin

g re

flex

0.24

——

——

——

——

——

—0.

24 (

0.24

, 0.2

4)

Cr

osse

d ex

tens

ion

0.16

——

——

——

——

——

—0.

16 (

0.16

, 0.1

6)

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 8: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

coefficients ranged from 0.52 to 0.70 in the studies using an LMP reference standard GA. Of the 3 studies that used an ultrasound-based reference standard GA, correlation coefficients were again lower in the same 2 studies as they were for physical criteria.21, 40 The signs with the highest median correlation coefficients were ventral suspension, square window, and posture.

Validity of Neonatal Clinical Scores of GA

Studies in which researchers reported on the validity or agreement of neonatal assessments with a reference standard are shown in Table 3 (Dubowitz), Table 4 (Ballard), and Supplemental Table 12 (other assessments).

Dubowitz Score

There were 26 studies in which researchers validated the Dubowitz score (11 ultrasound/BOE; 19 LMP reference). Ten studies were from LMIC. In most studies, the neonatal assessment was performed by physicians or nurses.

Ultrasound or BOE Reference Standard

In 2 studies, researchers reported the correlation of GA dating by Dubowitz score and BOE (r = 0.73 and 0.90, respectively). In 7 studies, researchers reported a mean difference in GA between Dubowitz and ultrasound-based dating, ranging from −2.2 weeks (underestimation) to +0.7 weeks (overestimation). The pooled mean difference was not statistically different from the null hypothesis (ie, difference = 0), indicating no evidence of overall systematic bias (Table 5, Supplemental Fig 7). The precision of the estimate is reflected in the SD of the mean difference, which, at the individual study level, ranged from 0.52 to 1.94 weeks. The pooled SD across the studies was 1.3 weeks, indicating that 95% of the differences in GA (Dubowitz score–ultrasound

dating) fell within ±2.6 weeks (n = 7 studies). In the studies in which researchers reported on the percent agreement within weeks (n = 3), the Dubowitz GA fell within 1 week of ultrasound dates in 53% of infants (pooled estimate, 95% CI: 47% to 71%), and within 2 weeks in 59% of newborns (pooled estimate, 95% CI: 41% to 74%). Researchers in 1 study reported on the diagnostic accuracy of the Dubowitz score to identify preterm infants compared to ultrasound-based dating (sensitivity 61%, specificity 99%).50 Among studies done in LMIC, there was no significant bias compared with ultrasound dating, and the precision of GA dating by the Dubowitz score was similar to HICs (Supplemental Table 11).

In 4 studies, there was evidence of greater bias of Dubowitz scoring among preterm infants (Supplemental Table 12). In 4 studies, researchers reported that the Dubowitz score systematically overestimated GA in preterm infants by up to 2.6 weeks48 – 50 and more so among early preterm infants.46, 48 – 50

LMP Reference Standard

The correlation of GA determined by Dubowitz scoring and LMP GA was reported in 14 studies and was generally high, ranging from 0.41 to 0.94 (median = 0.89). The pooled mean difference was 0.65 weeks (n = 6, 95% CI: 0.01 to 1.30), in dicating a systematic overestimation com-pared with LMP-based GA (Table 5, Supplemental Fig 7). 95% of the differences fell within ±2.9 weeks of the mean. The GA determined by Dubowitz assessment fell within 1 week of LMP dates in 59% of newborns (n = 4, 95% CI: 41% to 74%) and within 2 weeks in 87% (n = 6, 95% CI: 71% to 95%). Researchers in 1 study reported on the diagnostic accuracy of the Dubowitz score to identify preterm infants (sensitivity 81.5%, specificity 98.6%).41 Among LMIC studies (n = 2), there was a

LEE et al8

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

Amie

l-Ti

son

et

al21

Lee

et a

l40Ba

llard

et

al (

New

Ba

llard

)20

Park

in e

t al

16Du

bow

itz

and

Farr

(N

icol

opou

los

et a

l23)

Ragh

u et

al

41Fe

resu

et

al22

Dubo

witz

an

d Fa

rr

(Sun

joh

et

al42

)

Finn

strö

m24

Balla

rd e

t al

19Tu

nçer

et

al26

Nara

yana

n et

al30

Sum

mar

y Ac

ross

Al

l Stu

dies

, M

edia

n (M

inim

um,

Max

imum

)

Visi

on: fi

x an

d tr

ack

0.1

——

——

——

——

——

—0.

10 (

0.10

, 0.1

0)

Ri

ghtin

g re

actio

n0.

07—

——

——

——

——

——

0.07

(0.

07, 0

.07)

Ra

ise

to s

it0.

15—

——

——

——

——

——

0.15

(0.

15, 0

.15)

Ba

ck to

lyin

g0.

03—

——

——

——

——

——

0.03

(0.

03, 0

.03)

Fi

nger

gra

sp a

nd

resp

onse

to

trac

tion

0.11

——

——

——

——

——

—0.

11 (

0.11

, 0.1

1)

NS, n

ot s

tate

d; —

, not

app

licab

le.

TABL

e 2

Cont

inue

d

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 9: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

PEDIATRICS Volume 140, number 6, December 2017 9

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

TABL

e 3

Agre

emen

t and

Val

idity

of t

he D

ubow

itz S

core

Auth

orYe

arSt

udy

Sett

ing

and

Loca

tion

GA o

f Coh

ort

Sam

ple

Size

Asse

ssm

ent

Vers

ion

(Tot

al,

Phys

ical

or

Exte

rnal

, Ne

urol

ogic

)

Agre

emen

tVa

lidity

Corr

elat

ion

Coef

ficie

nt

(R)

With

Re

fere

nce

GA

Mea

n Di

ffere

nce

(wk)

SD o

f the

M

ean

Diffe

renc

e (w

k)

Blan

d-Al

tman

95

% L

OA

±1.

96 S

D (L

L, U

L)

(wk)

Perc

ent

With

in 1

w

k

Perc

ent

With

in 2

w

k

Sens

itivi

ty

Pret

erm

<3

7 w

k (%

) (9

5%

CI)

Spec

ifici

ty

Pret

erm

<3

7 w

k (%

) (9

5%

CI)

<37

wk

PPV

<37

wk

NPV

Ultr

asou

nd

HICs

Alla

n et

al29

2009

Tert

iary

Hos

pita

ls;

Nort

hern

Ter

rito

ry,

Aust

ralia

29.6

–41.

7 w

k98

Tota

l—

0.10

1.10

(−2.

3, 2

.0)

——

——

——

Robe

rts

et

al43

1979

Univ

ersi

ty H

ospi

tal;

Card

iff,

Wal

esNS

118

Tota

l—

——

—68

.689

.8—

——

Vik

et a

l4419

97Tr

ondh

eim

and

Ber

gen,

No

rway

All G

As97

0To

tal

—−

0.20

1.12

(−2.

3, 2

.1)

——

——

——

Awou

st a

nd

Levi

4519

82Un

iver

sity

Hos

pita

l; Br

usse

ls, B

elgi

umNS

130

Tota

l—

0.50

1.04

——

——

——

Sand

ers

et

al46

1991

NICU

; Bal

timor

e, M

D<1

500

g, >

20

wk

110

Tota

l0.

733.

00—

—18

.239

.1—

——

War

iyar

et

al47

1997

New

cast

le, U

K32

–42

wk

347

Tota

l—

0.71

1.17

(−1.

57, 3

.0)

——

——

——

<30

wk

105

Tota

l—

2.86

2.48

(−2.

0, 7

.71)

Robi

llard

et

al48

1992

NICU

; Gua

dalu

pe, F

renc

h W

est I

ndie

s<2

500

g38

4To

tal

—0.

641.

94—

61.0

82.0

——

——

Shuk

la

et a

l4919

87Un

iver

sity

hos

pita

ls; N

ew

York

, NY

Pret

erm

<38

w

k, A

GA25

Tota

l0.

90—

——

—48

.0—

——

L

MIC

Moo

re

et a

l5020

15Re

fuge

e cl

inic

s; T

hai-

Mya

nmar

bor

der

All G

As25

0To

tal

—2.

57a

1.04

a(0

.49,

4.

65)a

——

6199

——

Rose

nber

g

et a

l5120

09Sp

ecia

l Car

e Nu

rser

y;

Dhak

a, B

angl

ades

h≤

33 w

k35

5To

tal

—0.

560.

52(−

1.57

, 0.

47)

——

——

——

Karu

nase

kera

et

al52

2002

Teac

hing

Hos

pita

l; Ra

gam

a,

Sri L

anka

35–4

2 w

k20

0To

tal

—−

2.18

1.43

——

——

——

—Ex

tern

al—

−0.

452.

39—

——

——

——

LMP

HI

Cs

Ba

llard

et

al19

1979

NICU

and

nur

sery

; Ci

ncin

nati,

OH

NS22

4To

tal

0.85

——

——

——

——

Capu

rro

et

al15

1978

Tert

iary

car

e ce

nter

; M

onte

vide

o, U

rugu

ayNS

115

Tota

l0.

91—

——

——

——

——

Mitc

hell53

1979

New

born

nur

sery

, ; Lo

ndon

, En

glan

dNS

20To

tal

0.41

——

——

——

——

Nico

lopo

ulos

et

al23

1976

Mat

erni

ty H

ospi

tal a

nd

clin

ics;

Ath

ens,

Gre

ece

28–4

4 w

k71

0To

tal

0.91

——

——

——

——

—Ex

tern

al0.

88—

——

——

——

——

Corr

ecte

d ne

urol

ogic

0.85

——

——

——

——

Robe

rts

et

al43

1979

Univ

ersi

ty H

ospi

tal;

Card

iff,

Wal

esNS

118

Tota

l—

——

—67

.879

.6—

——

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 10: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

LEE et al10

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

Auth

orYe

arSt

udy

Sett

ing

and

Loca

tion

GA o

f Coh

ort

Sam

ple

Size

Asse

ssm

ent

Vers

ion

(Tot

al,

Phys

ical

or

Exte

rnal

, Ne

urol

ogic

)

Agre

emen

tVa

lidity

Corr

elat

ion

Coef

ficie

nt

(R)

With

Re

fere

nce

GA

Mea

n Di

ffere

nce

(wk)

SD o

f the

M

ean

Diffe

renc

e (w

k)

Blan

d-Al

tman

95

% L

OA

±1.

96 S

D (L

L, U

L)

(wk)

Perc

ent

With

in 1

w

k

Perc

ent

With

in 2

w

k

Sens

itivi

ty

Pret

erm

<3

7 w

k (%

) (9

5%

CI)

Spec

ifici

ty

Pret

erm

<3

7 w

k (%

) (9

5%

CI)

<37

wk

PPV

<37

wk

NPV

Vogt

et a

l5419

81Te

rtia

ry c

are

cent

er;

Norw

ayAl

l GAs

242

Tota

l—

——

——

90b

——

——

Vik

et a

l4419

97Tr

ondh

eim

and

Ber

gen,

No

rway

All G

As97

0To

tal

—−

0.40

1.43

(−3.

2, 2

.4)

——

——

——

Latis

et a

l5519

81Ne

onat

al u

nit;

Mila

no, I

taly

27–4

2 w

k92

Tota

l—

0.44

1.62

——

80.7

——

——

Dubo

witz

et

al14

1970

New

born

and

Spe

cial

Car

e Nu

rser

ies;

She

ffiel

d,

Engl

and

All G

As16

7To

tal

0.93

——

——

95.0

——

——

Exte

rnal

0.91

——

——

——

——

—Ne

urol

ogic

0.89

——

——

——

——

Al

lan

et a

l2920

09Te

rtia

ry H

ospi

tals

; No

rthe

rn T

erri

tory

, Au

stra

lia

29.6

–41.

7 w

k56

Tota

l—

0.30

0.92

(−1.

5, 2

.1)

——

——

——

Hert

z

et a

l5619

78Ge

nera

l Hos

pita

l; Cl

evel

and,

OH

All G

As12

6To

tal

0.86

——

——

——

——

Sand

ers

et

al46

1991

NICU

; Bal

timor

e, M

D<1

500

g, >

20

wk

110

Tota

l0.

682.

802.

1—

23.6

46.3

——

——

LM

IC

Fe

resu

et

al22

2002

Mat

erni

ty u

nit;

Hara

re,

Zim

babw

eAl

l GAs

364

Tota

l0.

81—

——

——

——

——

Exte

rnal

0.77

——

——

——

——

—Ne

urol

ogic

0.79

——

——

——

——

Su

njoh

et

al42

2004

Tert

iary

Hos

pita

ls;

Yaou

nde,

Cam

eroo

n25

–44

wk

358

Tota

l0.

940.

501.

31—

—93

.0—

——

Tunç

er

et a

l2619

81Un

iver

sity

Hos

pita

l; An

kara

, Tu

rkey

27–4

1 w

k12

0To

tal

0.88

——

——

——

——

Cevi

t et a

l5719

98Te

rtia

ry c

are

cent

er; S

ivas

, Tu

rkey

28–3

8 w

k;

<250

0 g

91To

tal

0.85

0.30

——

60.4

98.9

——

——

Jaro

szew

icz

and

Boyd

5819

73Te

rtia

r Ho

spita

l; Ca

pe

Tow

n, S

outh

Afr

ica

NS10

0To

tal

0.9

——

——

——

——

Daw

odu

et

al59

1977

Mat

erni

ty u

nits

; Iba

dan,

Ni

geri

a29

–43

wk

100

Tota

l0.

900.

381.

41(−

2.39

, 3.

15)

74.0

94.0

81.5

98.6

95.7

93.5

Ragh

u

et a

l4119

81Pr

emat

ure

unit,

Uni

vers

ity

Hosp

ital;

Lusa

ka, Z

ambi

aNS

160

Tota

l0.

90—

——

——

——

——

Exte

rnal

0.82

Neur

olog

ic0.

80

LL, l

ower

lim

it; L

OA, l

imits

of a

gree

men

t; NP

V, n

egat

ive

pred

ictiv

e va

lue;

NS,

not

sta

ted;

PPV

, pos

itive

pre

dict

ive

valu

e; U

L, u

pper

lim

it; —

, ind

icat

es th

at th

e da

ta w

ere

not a

vaila

ble

for

that

pap

er.

a Fo

r a

34-w

k ne

wbo

rn w

ith a

wei

ght-f

or-a

ge z

sco

re o

f 0. T

here

was

evi

denc

e of

a s

igni

fican

t tre

nd a

cros

s GA

; mea

n bi

as d

ecre

ased

by

0.35

wk

per

wee

k in

crea

se in

new

born

GA.

b Pe

rcen

t with

in ±

3 w

k of

LM

P (r

efer

ence

) GA

.

TABL

e 3

Cont

inue

d

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 11: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

PEDIATRICS Volume 140, number 6, December 2017 11

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

TABL

e 4

Agre

emen

t and

Val

idity

of t

he B

alla

rd S

core

Stud

yYe

arSt

udy

Sett

ing

and

Loca

tion

GA o

f Co

hort

Sam

ple

Size

Asse

ssm

ent

Vers

ion

(Ori

gina

l or

New

Bal

lard

)

Agre

emen

tVa

lidity

Corr

elat

ion

Coef

ficie

nt

(R)

with

Re

fere

nce

GA

Mea

n Di

ffere

nce

(wk)

SD o

f Mea

n Di

ffere

nce

(wk)

Blan

d-Al

tman

95

% L

OA

±1.

96 S

D (L

L, U

L)

(wk)

Perc

ent

With

in 1

w

k

Perc

ent

With

in 2

w

k

Sens

itivi

ty

Pret

erm

<3

7 w

k (%

) (9

5% C

I)

Spec

ifici

ty

Pret

erm

<3

7 w

k (%

) (9

5% C

I)

<37

wk

PPV

<37

wk

NPV

Ultr

asou

nd

HICs

Sche

r an

d Ba

rmad

a6019

87NI

CU;

Pitt

sbur

gh,

PA

23–3

0 w

k by

LM

Pa24

Orig

inal

Ba

llard

—1.

352.

62(−

3.79

, 6.

49)

56.5

69.6

——

——

Alex

ande

r

et a

l6119

92Un

iver

sity

Ho

spita

l; Ch

arle

ston

, SC

28–4

4 w

k by

Ba

llard

4193

Balla

rd0.

79—

——

——

72.2

97.1

83.2

94.6

Sand

ers

et

al46

1991

NICU

; Ba

ltim

ore,

M

D

<150

0 g;

<3

7 w

k11

0Ba

llard

0.69

2.70

——

22.7

45.4

——

——

Smith

et a

l6219

99Un

iver

sity

Ho

spita

l; Ho

usto

n, T

X

<250

0 g;

85%

pr

eter

m82

Balla

rd0.

861.

401.

15—

—85

——

——

Dom

brow

ski

et a

l6319

92W

omen

's

Hosp

ital;

Detr

oit,

MI

24–4

3 w

k38

318

Balla

rd—

——

——

85.4

——

——

Gagl

iard

i et

al64

1992

NICU

s, te

rtia

ry

hosp

itals

; M

ilano

, Ita

ly

<37

wk;

<2

500

g22

7Ba

llard

—−

0.21

1.76

—20

.540

.4—

——

War

iyar

et a

l4719

97Ne

wca

stle

, UK

32–4

2 w

k34

7Ba

llard

—0.

571.

31(−

2.0,

3.1

4)—

——

——

—<3

0 w

k10

5Ba

llard

—3.

431.

97(−

0.43

, 7.

29)

<30

wk

105

New

Bal

lard

—1.

571.

75(−

1.86

, 5.0

)

Ba

llard

et a

l2019

91NI

CUs

and

nurs

erie

s;

Cinc

inna

ti,

OH

All G

As;

20−

44

wk

530

New

Bal

lard

0.97

0.15

1.46

——

——

——

Amat

o et

al65

1991

NICU

; Ber

n,

Switz

erla

ndAl

l pre

term

, LB

W38

Balla

rd

(phy

sica

l)—

——

——

——

——

L

MIC

Karl

et a

l6620

15He

alth

fa

cilit

ies;

M

adan

g,

Papu

a Ne

w

Guin

ea

25.5

–43.

7 w

k;

900–

4250

g

623

Balla

rd0.

350.

862.

41(−

3.86

, 5.

57)

——

39.0

92.0

21.0

97.0

668

(Ext

erna

l)0.

33—

——

——

58.0

81.0

14.0

97.0

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 12: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

LEE et al12

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

Stud

yYe

arSt

udy

Sett

ing

and

Loca

tion

GA o

f Co

hort

Sam

ple

Size

Asse

ssm

ent

Vers

ion

(Ori

gina

l or

New

Bal

lard

)

Agre

emen

tVa

lidity

Corr

elat

ion

Coef

ficie

nt

(R)

with

Re

fere

nce

GA

Mea

n Di

ffere

nce

(wk)

SD o

f Mea

n Di

ffere

nce

(wk)

Blan

d-Al

tman

95

% L

OA

±1.

96 S

D (L

L, U

L)

(wk)

Perc

ent

With

in 1

w

k

Perc

ent

With

in 2

w

k

Sens

itivi

ty

Pret

erm

<3

7 w

k (%

) (9

5% C

I)

Spec

ifici

ty

Pret

erm

<3

7 w

k (%

) (9

5% C

I)

<37

wk

PPV

<37

wk

NPV

668

(Neu

rolo

gic)

0.39

——

(−3.

57,

6.57

)—

—23

.093

.014

.096

.0

Rose

nber

g

et a

l5120

09Sp

ecia

l Car

e Nu

rser

y;

Dhak

a,

Bang

lade

sh

Pret

erm

, al

l <33

w

k

355

Balla

rd—

−0.

411.

08(−

0.7,

1.5

1)—

——

——

Lee

et a

l4020

16Co

mm

unity

; Sy

lhet

, Ba

ngla

desh

33–4

5 w

k71

0Ba

llard

0.12

−0.

402.

22(−

4.7,

4.0

)32

.064

15.0

87.0

9.0

92.0

Mor

aes

and

Reic

henh

eim

67

(tra

nsla

ted)

2000

Mat

erni

ty

unit;

Rio

de

Jan

eiro

, Br

azil

NS11

6Ne

w B

alla

rd—

——

——

—57

.0 (

41.0

to

73.

0)97

.0 (

90.0

to

99.

0)—

Sree

kum

ar

et a

l6820

13NI

CU a

nd

post

nata

l w

ards

; Be

ngal

uru,

In

dia

24–4

1.2

wk

284

New

Bal

lard

—−

0.04

——

——

——

——

Wyl

ie e

t al69

2013

Ndir

ande

Cl

inic

; Bl

anty

re,

Mal

awi

All G

As17

7Ne

w B

alla

rd—

0.80

2.19

(−3.

5, 5

.1)

——

——

——

Tayl

or e

t al70

2010

Com

mun

ity;

Kene

ba,

The

Gam

bia

All G

As80

Balla

rd

(ext

erna

l)—

−2.

231.

56(−

5.3,

0.8

2)—

——

——

Thi e

t al71

2015

Gene

ral

Hosp

ital;

Hoa

Binh

, Vi

etna

m

30–4

2 w

k by

ul

tras

ound

391

New

Bal

lard

0.90

——

——

——

——

LMP

HI

Cs

Ba

uman

n et

al72

(t

rans

late

d)19

93Un

iver

sity

Ho

spita

l; Be

rn,

Switz

erla

nd

27–3

5 w

k AG

A60

Balla

rd (

tota

l)0.

91—

——

——

——

——

28–3

6 w

k SG

A29

0.66

——

——

——

——

27–3

5 w

k AG

A60

Balla

rd

(ext

erna

l)0.

83—

——

——

——

——

TABL

e 4

Cont

inue

d

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 13: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

PEDIATRICS Volume 140, number 6, December 2017 13

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

Stud

yYe

arSt

udy

Sett

ing

and

Loca

tion

GA o

f Co

hort

Sam

ple

Size

Asse

ssm

ent

Vers

ion

(Ori

gina

l or

New

Bal

lard

)

Agre

emen

tVa

lidity

Corr

elat

ion

Coef

ficie

nt

(R)

with

Re

fere

nce

GA

Mea

n Di

ffere

nce

(wk)

SD o

f Mea

n Di

ffere

nce

(wk)

Blan

d-Al

tman

95

% L

OA

±1.

96 S

D (L

L, U

L)

(wk)

Perc

ent

With

in 1

w

k

Perc

ent

With

in 2

w

k

Sens

itivi

ty

Pret

erm

<3

7 w

k (%

) (9

5% C

I)

Spec

ifici

ty

Pret

erm

<3

7 w

k (%

) (9

5% C

I)

<37

wk

PPV

<37

wk

NPV

28–3

6 w

k SG

A29

0.66

——

——

——

——

27–3

5 w

k AG

A60

Balla

rd

(Neu

rolo

gic)

0.65

——

——

——

——

28–3

6 w

k SG

A29

0.66

——

——

——

——

Cons

tant

ine

et

al73

1987

AK, N

Y, M

A,

FL, P

A, T

X,

WA,

CN

All G

As12

46Ba

llard

0.81

0.60

2.18

——

—85

8189

75

(Phy

sica

l)0.

83−

0.1

2.14

——

—92

7487

87(N

euro

logi

c)0.

711.

42.

72—

——

7084

8960

Sche

r an

d Ba

rmad

a6019

87NI

CU; P

ittsb

urgh

, PA

23–3

0 w

k by

LM

P24

Balla

rd—

1.42

2.32

(−3.

13,

5.96

)45

.862

.5—

——

Mac

kanj

ee

et a

l7419

96NI

CU; L

ondo

n,

Onta

rio,

Ca

nada

23–3

3 w

k;

<150

0 g

47Ba

llard

0.87

1.50

1.50

——

——

——

Dom

brow

ski

et a

l6319

92W

omen

's

Hosp

ital;

Detr

oit,

MI

24–4

6 w

k38

818

Balla

rd—

——

——

69.9

——

——

Alex

ande

r

et a

l7519

90Un

iver

ity

Hosp

ital;

Char

lest

on,

SC

20–4

5 w

k10

794

Balla

rd0.

760.

48—

—52

.780

.3—

——

Balla

rd e

t al19

1979

NICU

and

nu

rser

y;

Cinc

inna

ti OH

26–4

4 w

k,

760–

5460

g

224

Balla

rd0.

85—

——

——

——

——

Alex

ande

r

et a

l7619

92Un

iver

sity

Ho

spita

l; Ch

arle

ston

, SC

28–4

4 w

k;

all A

fric

an

Amer

ican

po

pula

tion

3480

Balla

rd0.

820.

53—

——

68.2

——

——

28–4

4 w

k;

all w

hite

po

pula

tion

2091

Balla

rd0.

860.

17—

——

70.6

——

——

Balla

rd e

t al20

1991

NICU

s an

d nu

rser

ies;

Ci

ncin

nati,

OH

20–4

4 w

k57

8Ne

w B

alla

rd0.

96—

——

—88

.0—

——

TABL

e 4

Cont

inue

d

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 14: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

LEE et al14

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

Stud

yYe

arSt

udy

Sett

ing

and

Loca

tion

GA o

f Co

hort

Sam

ple

Size

Asse

ssm

ent

Vers

ion

(Ori

gina

l or

New

Bal

lard

)

Agre

emen

tVa

lidity

Corr

elat

ion

Coef

ficie

nt

(R)

with

Re

fere

nce

GA

Mea

n Di

ffere

nce

(wk)

SD o

f Mea

n Di

ffere

nce

(wk)

Blan

d-Al

tman

95

% L

OA

±1.

96 S

D (L

L, U

L)

(wk)

Perc

ent

With

in 1

w

k

Perc

ent

With

in 2

w

k

Sens

itivi

ty

Pret

erm

<3

7 w

k (%

) (9

5% C

I)

Spec

ifici

ty

Pret

erm

<3

7 w

k (%

) (9

5% C

I)

<37

wk

PPV

<37

wk

NPV

Ahn77

2008

Neon

atal

uni

ts,

univ

ersi

ty

hosp

ital;

Inch

eon,

So

uth

Kore

a

All G

A,

773–

4870

g

213

New

Bal

lard

b0.

850.

46c

——

——

——

——

Sand

ers

et

al46

1991

NICU

; Bal

timor

e,

MD

<150

0 g;

<3

7 w

k11

0Ba

llard

0.66

2.60

2.2

—28

.251

.0—

——

LM

IC

Ce

vit e

t al57

1998

Tert

iary

car

e ce

nter

; Siv

as,

Turk

ey

Pret

erm

28

–38

wk;

<2

500

g

91Ba

llard

—0.

10—

—59

.398

.9—

——

Fere

su e

t al22

2002

Mat

erni

ty u

nit;

Hara

re,

Zim

babw

e

24–4

5 w

k36

4Ba

llard

0.80

——

——

——

——

(Phy

sica

l)d

0.75

——

——

——

——

—(N

euro

logi

c)d

0.74

——

——

——

——

Su

njoh

et a

l4220

04Te

rtia

ry H

ospi

tals

; Ya

ound

e,

Cam

eroo

n

25–4

4 w

k35

8Ne

w B

alla

rd0.

930.

341.

52—

—86

.0—

——

Bind

usha

et

al33

2014

Tert

iary

hos

pita

l; Ke

rela

, Ind

ia28

–37

wk

1000

New

Bal

lard

0.92

0.31

——

——

<36

wk:

85

.6<3

6 w

k:

94.6

<36

wk:

98

.0<3

6 w

k:

53.6

Sasi

dhar

an

et a

l7820

09NI

CU; N

orth

ern

Indi

a29

–35

wk

129

New

Bal

lard

——

——

—10

0.0

——

——

Mor

aes

and

Reic

henh

eim

67

(tra

nsla

ted)

2000

Mat

erni

ty

unit;

Rio

de

Jane

iro,

Bra

zil

NS14

0Ne

w B

alla

rd—

——

——

—68

.0 (

49.0

to

82.

0)92

.0 (

85.0

to

96.

0)—

Thi e

t al71

2015

Gene

ral H

ospi

tal;

Hoa

Binh

, Vi

etna

m

30–4

3 w

k by

LM

P28

2Ne

w B

alla

rd0.

81—

——

——

——

——

Tayl

or e

t al70

2010

Com

mun

ity;

Kene

ba, T

he

Gam

bia

All G

As76

Balla

rd

(ext

erna

l)—

−2.

23.

3(−

8.67

, 4.

27)

——

——

——

Verh

oeff

et

al79

1997

Tert

iary

Hos

pita

ls;

Chik

waw

a Di

stri

ct, M

alaw

i

All G

As;

liter

ate

mot

hers

76Ba

llard

(e

xter

nal)

—0.

87—

——

——

——

LL, l

ower

lim

it; L

OA, l

imits

of a

gree

men

t; NP

V, n

egat

ive

pred

ictiv

e va

lue;

NS,

not

sta

ted;

PPV

, pos

itive

pre

dict

ive

valu

e; U

L, u

pper

lim

it; —

, ind

icat

es th

at th

e da

ta w

ere

not a

vaila

ble

for

that

pap

er.

a Al

l inf

ants

in th

is s

tudy

die

d; a

ll de

aths

occ

urre

d af

ter

the

asse

ssm

ents

.b

This

stu

dy u

sed

an “

Exte

nded

New

Bal

lard

” sc

orin

g sy

stem

to e

stim

ate

GA (

sim

ply

the

stan

dard

New

Bal

lard

sco

re e

xten

ded

to b

e us

ed to

est

imat

e a

grea

ter

GA r

ange

, whi

ch w

as c

alcu

late

d m

athe

mat

ical

ly).

c For

infa

nts

<39

wk

GA. M

ean

diffe

renc

e =

−0.

58 w

k fo

r in

fant

s >3

9 w

k GA

.d

This

stu

dy u

sed

a “r

evis

ed”

vers

ion

of th

e ph

ysic

al a

nd n

euro

logi

c po

rtio

ns o

f the

Bal

lard

ass

essm

ent.

TABL

e 4

Cont

inue

d

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 15: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

tendency of the Dubowitz score to overestimate GA (0.48 weeks), although the precision of the GA estimates was similar to HIC studies (Supplemental Table 11).

In 2 studies, researchers showed evidence that Dubowitz scoring tended to overestimate GA in early preterm infants (Supplemental Table 12).42, 54

Ballard and New Ballard Score

We identified 30 studies in which researchers assessed the validity of the Original Ballard score (n = 20), the New Ballard score (n = 9), or both (n = 1) (Table 4) (17 ultrasound/BOE, 20 LMP reference), with 14 from LMIC. The Original and New Ballard scores assess the same clinical signs, with the New Ballard score20 having additional scoring categories for early preterm infants. Studies in which researchers used the Ballard score (Original or New) were combined for this analysis. Ballard assessments were performed by medically trained health workers (physicians, nurses, or research assistants) in the majority of studies and by community health workers in 2 studies.

Ultrasound or BOE Reference Standard

The correlation coefficients comparing Ballard score GA versus ultrasound or BOE ranged from 0.12 to 0.97 (median = 0.85, n = 7 studies). The mean GA difference ranged from −0.41 weeks (underestimation) to +1.4 weeks (overestimation) in 9 studies. The pooled mean difference was 0.40 weeks (95% CI: 0.00 to 0.81) (Table 5, Supplemental Fig 8), indicating a trend towards overestimation of GA. The pooled SD across the studies was 1.9 weeks, indicating that 95% of the differences in GA by Ballard assessment versus ultrasound dates fell within ±3.8 weeks (n = 9 studies, Table 5) of the mean. For the studies in which researchers reported on agreement in weeks, Ballard score dates fell

PEDIATRICS Volume 140, number 6, December 2017 15

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

TABL

e 5

Pool

ed D

ata

for

Agre

emen

t and

Val

idity

of N

eona

tal C

linic

al A

sses

smen

ts

Asse

ssm

ent

Type

No. o

f St

udie

s Id

entifi

ed

Refe

renc

e St

anda

rdAg

reem

ent

Valid

ity

Mea

n Di

ffere

nce

Perc

ent W

ithin

1 w

kPe

rcen

t With

in

2 w

kSe

nsiti

vity

Spec

ifici

ty

N Po

oled

Diff

eren

ce (

95%

CIs

)Po

oled

SD

N Po

oled

% (

95%

CI

s)N

Pool

ed %

(9

5% C

Is)

N Po

oled

Sen

sitiv

ity

(%)

(95%

CIs

)Po

oled

Spe

cific

ity

(%)

(95%

CIs

)

Dubo

witz

9Ul

tras

ound

or

BOE

70.

02 (

−0.

51–

0.55

)1.

273

53.4

(46

.6–

71.3

)3

74.8

(44

.7–

91.6

)1

6199

20LM

P6

0.65

(0.

01–1

.30)

1.45

458

.5 (

40.9

–74

.2)

687

.0 (

71.2

–94

.8)

181

.598

.6

Balla

rd14

Ultr

asou

nd o

r BO

E9

0.40

(0.

00–0

.81)

1.90

334

.0 (

21.8

–44

.6)

572

.2 (

53.8

–85

.3)

464

.1 (

60.8

to 6

7.4)

95.1

(94

.5 to

95.

7)

18LM

P5

1.25

(0.

64–1

.87)

2.10

344

.6 (

24.9

–66

.2)

975

.8 (

70.6

–80

.5)

284

.1 (

81.6

to 8

6.3)

83.5

(79

.5 to

87.

0)

Park

in3

Ultr

asou

nd o

r BO

E3

−0.

17 (

−0.

26–

−0.

08)

1.97

0—

0—

——

—Er

egie

2LM

P1

——

0—

293

.4 (

91.3

–95

.1)

——

Capu

rro

4Ul

tras

ound

or

BOE

20.

11 (

−0.

02–

0.23

)1.

962

40.1

(34

.7–

45.8

)3

79.2

(65

.3–

88.6

)3

42.7

(35

.6 to

50.

0)96

.7 (

95.7

to 9

7.5)

—, n

ot a

pplic

able

.

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 16: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

within 1 week of ultrasound dates in 34% (n = 3; 95% CI: 22% to 44%) of infants and within 2 weeks in 72% (n = 5, 95% CI: 54% to 85%) of newborns. The Ballard score had a pooled sensitivity (n = 4) of 64% (95% CI: 61% to 67%) and specificity of 95% (95% CI: 95% to 96%) for identifying preterm newborns. Among LMIC studies, the trend of GA overestimation was similar to HIC studies. However, the imprecision of GA estimation was greater in LMIC compared with HIC studies (pooled SD of 2.12 vs 1.49 weeks) (Supplemental Table 11).

In several studies, researchers reported evidence of greater bias in Ballard scoring among smaller babies (Supplemental Table 12). In 3 studies, researchers reported that the Original Ballard systematically overestimated GA by up to 2 to 3 weeks, in particular among preterm infants, 46, 47, 61 and generally, the trend was toward increasing bias in lower GAs. However, in a study in Papua New Guinea, Karl et al66 found the opposite trend. Wariyar et al47 reported that the New Ballard overestimated GA to a lesser degree than the Original Ballard in infants <30 weeks (1.6 vs 3.4 weeks, respectively). Among SGA infants, researchers in 2 studies showed that GA was underestimated by the original Ballard.40, 61

LMP Reference Standard

The correlation coefficients of Ballard and LMP GA ranged from 0.66 to 0.96 (median = 0.85; n = 13). The mean difference in GA was reported in 6 studies, ranging from 0.34 to 2.6 weeks (overestimation). The pooled mean difference was 0.70 weeks (95% CI: 0.36 to 1.04), indicating systematic overestimation (Table 5, Supplemental Fig 8). Ninety five percent of mean differences fell within ±4.2 weeks (n = 5 studies) of the mean. Ballard GA fell within 1 week of LMP GA in 45% (n = 3, 95% CI: 25% to 66%) of newborns and

within 2 weeks of LMP in 76% (n = 9, 95% CI: 71% to 81%) of newborns. The Ballard score had a pooled sensitivity (n = 2) of 84.1% (95% CI: 81.6% to 86.3%) and specificity of 83.5% (95% CI: 79.5% to 87.0%) for identifying preterm newborns (Fig 2). There were an inadequate number of studies to stratify analysis by LMIC versus HICs.

In 2 studies, researchers demonstrated overestimation of GA among preterm infants by the Original Ballard exam, 73, 79 but researchers in 1 study used the External Ballard only (Supplemental Table 12).79 In addition, researchers in 2 studies found that the Original Ballard performed differently among SGA infants: Baumann et al72 reported that the correlation of Ballard with GA was lower among SGA infants compared with those appropriate for gestational age. Constantine et al73 showed that for SGA babies, the bias for GA dating was 1 to 1.5 weeks lower than for non-SGA infants.

Other Clinical Assessments

Eighteen studies were identified in which researchers reported on the validity of other clinical methods of GA assessment (ie, Eregie et al, 40, 42, 80 Capurro et al, 15, 40, 81 – 84 Parkin et al, 16, 40, 47, 52, 54, 68 Bhagwat et al, 18, 33, 40 Tunçer et al, 26, 57 Finnström, 24 Narayanan et al, 30 and Robinson32, 47). These findings are reported in Supplemental Information 3 and Supplemental Table 13. In general, the majority of these exams were simplified assessments with fewer signs and were found to be less accurate than the Dubowitz or Ballard scores for GA dating (Supplemental Information 3; Supplemental Table 13; Table 5).

Interrater Agreement

In 10 studies, researchers reported upon the interrater agreement of GA estimates (Supplemental Table 14).

The κ for the classification of preterm births ranged from 0.73 to 0.93 (good to excellent; n = 3).20, 67, 85 The GA estimates were also highly correlated (r = 0.71–0.95)20, 86 and without significant differences between raters.49, 62, 64, 78

Anterior Vascularity of Lens

The literature searches for examination of the anterior vascular capsule of the lens (AVCL) yielded a total of 344 unique manuscripts (Fig 3), of which 10 met inclusion criteria (Table 6). Three were from LMIC (2 from South Asia, 1 from Africa). The studies were generally of smaller sample size (N = 30–356), and the latest was published in 1993. In general, study quality was poor, with a high risk of bias related to patient selection and reference standard. The overall QUADAS–2 assessment is in Supplemental Fig 9.

Assessments were typically performed at <72 hours of life by physicians in tertiary health facilities, with most studies performed in NICU settings and including only preterm and/or LBW infants. An ultrasound/BOE-based date was available in only 2 studies. Pupil dilation was performed before the assessment in 3 studies.

Correlation of AVCL Grading With GA

Hittner et al87, 89 reported that as the infant matures in gestation, the AVCL disappears in stages. In Grade 4 (27–28 weeks), the entire anterior surface of the lens is vascularized, reducing to no vasculature in Grade 0 (>34 weeks). Of note, the reference standard in the original Hittner study87 was the Dubowitz score.

In 2 studies, researchers presented data on the average GA determined by Hittner’s AVCL grading system (Table 6).46, 91 The correlation of AVCL grade with GA ranged from −0.84 to −0.96 (median: −0.88, n = 7) for preterm and/or LBW populations For the 2 studies in

LEE et al16

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 17: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

which researchers analyzed all GA populations, correlation was lower (−0.64 to −0.45).24, 30 Among SGA preterm newborns, the median correlation coefficient was −0.77 (range: −0.68 to −0.91, n = 3).72, 87, 89

other signs

The results of searches for intermammillary distance, skin impedance, and palmar creases are in Supplemental Information 4.

dIsCussIoN

Accurate GA determination is a public health priority to target and reduce preterm birth–related morbidity and mortality in LMIC. The Every Newborn Action Plan has prioritized GA measurement as a high-priority area to improve the epidemiology of preterm birth and SGA.34 In our systematic literature review, we identified 18 different newborn assessments that have been used for GA dating. The most commonly reported and validated scores in the literature were the Dubowitz and Ballard scores. The Dubowitz score dated 95% of newborns within ±2.6 weeks of ultrasound dating. The Ballard score tended to overestimate GA by 0.4 weeks compared with

ultrasound and dated 95% of infants within ±3.8 weeks of this mean. Newborn clinical assessments tend to overestimate GA among preterm infants and therefore may misclassify preterm infants as term. They also tended to underestimate GA in growth-restricted babies. Simplified assessments were less accurate. Although researchers in several studies showed promise of the anterior vascularity of the lens to classify GA <34 weeks, few compared AVCL with an ultrasound-based reference standard.

Study quality was a major limitation of the studies identified in the review, with half of studies having high risk of bias. Many of the original validation studies were from the 1970s, when LMP was the gold standard for pregnancy dating and ultrasound was not widely available. Many hospital-based studies were performed in NICUs among LBW babies and thus were prone to selection and measurement biases (eg, lack of blinding). Fewer than half of the studies were in LMIC, and studies in HICs may not be generalizable to LMIC settings because of health worker availability and training, and differences in the prevalence of SGA and preterm birth.

The majority of individual physical and neurologic signs that have been used in different scoring systems had fair to moderate correlation with GA. Skin opacity was the most weakly correlated and is perhaps the most affected by the timing of the assessment after birth. Although neurologic signs may be more affected by neonatal morbidity (birth asphyxia, neonatal infection, maternal medications, etc), the correlation coefficients of most signs were in a similar range to the physical criteria. In 2 studies21, 40 in which researchers excluded early to moderate preterm infants, the correlation of clinical signs with GA was lower, suggesting that the criteria may be more discriminating at lower GAs.

A critical consideration in LMIC is the validity of neonatal assessments in populations with high rates of SGA. Distinguishing whether a small baby is preterm, SGA, or both is a challenge in these settings. Most neonatal assessments were designed to measure infant maturity as opposed to gestational length. SGA infants may act less mature during a neonatal clinical assessment. Three studies have revealed that among SGA infants, neonatal clinical exams

PEDIATRICS Volume 140, number 6, December 2017 17

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

FIGuRe 2Forest plots of the Ballard score sensitivity and specificity for identifying preterm births compared with ultrasound (A, B) and LMP (C, D).

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 18: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

tend to systematically underestimate GA.40, 61, 73 Improving the validity of the neonatal assessment in growth-restricted populations is a critical research need in LMIC.30, 87, 92

The disappearance of the AVCL, or pupillary membrane, was found to correlate well with GA, although overall study quality was poor, with few studies with ultrasound-based references. AVCL may show promise in LMIC with high rates of fetal growth restriction because the grading correlated relatively well with GA, even among

growth-restricted or SGA infants.87 An important consideration is that the AVCL completely disappears after ∼34 weeks’ GA; thus, it may not help with GA dating >34 weeks. Furthermore, the AVCL exam requires specialized skills with an opthalmoscope, which may limit the feasibility and scalability in LMIC.

Several factors should be considered in interpreting and generalizing the validity of neonatal GA assessments in different settings. Imprecision of the Ballard score was greater in LMIC studies compared with HIC studies

(HICs: ±3.0 weeks; LMIC: ±4.2 weeks). The validity of a clinical assessment may vary with the level of medical training of the assessor.40, 70 Most of the LMIC studies used physicians, nurses, or midwives, and there were few studies with frontline health workers. The validity of the newborn assessment has primarily been studied in the facility and/or hospital-based setting, and the few studies in home-based settings had poorer performance.40, 70 Certain factors may improve the validity in the hospital setting, including the timing of assessment sooner after birth, being in a more controlled environment, and lighting. The development of some characteristics may vary by ethnicity. For example, plantar creases progress differently in African American populations93 and skin color may vary. Morbidities, such as gestational diabetes, are more common in specific populations94 and may affect the maturity assessment. Finally, the performance may also be affected by the GA ranges in which it is tested. The performance and validity of the assessments may vary in a general population with a larger representation of late preterm and near-term infants compared with a NICU.

Feasibility and scalability are critical factors to consider in LMIC. As shown in this review, there is a positive correlation between the number of parameters and accuracy of a GA assessment. Yet there is likely to be a negative correlation between the number of parameters (especially neurologic) and the feasibility of use. While the Dubowitz score had the best accuracy, the assessment is complex, may take 15 to 20 minutes to complete, and includes more difficult-to-train neurologic criteria. In South Asia and sub-Saharan Africa, approximately half of births occur outside of hospital facilities, and community-based health workers or traditional birth attendants may be the first point of contact for

LEE et al18

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

FIGuRe 3AVCL: flow diagram. Diagram of the screening process to identify studies for inclusion in AVCL review; adapted from the PRISMA (Moher D, Liberati A, Tetzlaff J, Altman DG; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009 Jul 21;6(7):e1000097).

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 19: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

PEDIATRICS Volume 140, number 6, December 2017 19

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

TABL

e 6

Corr

elat

ion

of A

VCL

With

GA

Auth

orYe

arSt

udy

Sett

ing

and

Loca

tion

Popu

latio

nSa

mpl

e Si

ze (

N)Re

fere

nce

Stan

dard

Tim

e of

As

sess

men

t Aft

er

Birt

h

Corr

elat

ion

Coef

ficie

nt (

R) it

h Re

fere

nce

GA

GA

A) R

ange

or

B) M

ean

(SD)

(n)

Grad

e 0a

Grad

e 1

Grad

e 2

Grad

e 3

Grad

e 4

Finn

strö

m24

1972

Univ

ersi

ty H

ospi

tal;

Umea

, Sw

eden

All G

As17

4LM

PFr

om b

irth

up

to

60 h

0.45

b—

——

——

Hitt

ner

et a

l8719

77Te

rtia

ry H

ospi

tal;

Hous

ton,

TX

Pret

erm

(27

–34

wk)

100

LMP

and

Dubo

witz

With

in 3

0 h

−0.

88—

——

——

Subp

opul

atio

n:

pret

erm

SGA

12LM

P an

d Du

bow

itzW

ithin

30

h−

0.91

——

——

Guill

ory

et a

l8819

80Te

rtia

ry H

ospi

tal;

Hous

ton,

TX

Pret

erm

43LM

P an

d Du

bow

itz“S

oon

afte

r bi

rth”

−0.

88—

——

——

24 h

aft

er b

irth

−0.

86—

——

——

Hitt

ner

et a

l8919

81Te

rtia

ry H

ospi

tal;

Hous

ton,

TX

“Pre

term

SGA

”33

Dubo

witz

With

in 2

4 h

−0.

77—

A) >

33 w

k (n

=

24c )

A) 3

1–34

w

k (n

=

7)

A) 3

3 w

k (n

=

1)A)

28

wk;

(n

= 1)

Kris

hnam

ohan

et

al90

1982

NICU

s; F

arm

ingt

on

and

Hart

ford

, CT

Pret

erm

(28

–32

wk)

30Ba

llard

, with

in

2 w

k of

LM

PW

ithin

24

h−

0.94

——

——

Nara

yana

n et

al30

1982

Child

ren’

s Ho

spita

l; Ne

w D

elhi

, Ind

iaAl

l new

born

s; a

ll GA

s35

6LM

P, o

r OB

es

timat

e if

avai

labl

e

With

in 4

8 h

−0.

64—

——

——

Subp

opul

atio

n:

<35

wk

GA18

4Sa

me

as a

bove

With

in 4

8 h

−0.

96—

——

——

Sasi

vim

olku

l et

al91

1986

Univ

ersi

ty H

ospi

tal;

Bang

kok,

Th

aila

nd

LBW

80Ba

llard

and

LM

P24

–48

h−

0.83

9B)

36.

3 (1

.86)

(n

= 43

)

B) 3

4.0

(2.1

) (n

=

13)

B) 3

2.4

(1.4

) (n

=

12)

B) 2

9.9

(0.4

); (n

=

7)

B) 2

7.8

(0.8

) (n

= 5

)

Subp

opul

atio

n:

LBW

≥34

wk

40Ba

llard

and

LM

P24

–48

h−

0.88

——

——

Skap

inke

r an

d Ro

thbe

rg92

1987

Univ

ersi

ty H

ospi

tal;

Joha

nnes

burg

, So

uth

Afri

ca

Pret

erm

(<3

5 w

k)58

Balla

rdW

ithin

36

h−

0.84

——

——

Sand

ers

et a

l4619

91NI

CU; B

altim

ore,

MD

Pret

erm

and

bir

th

wei

ght <

1500

g89

BOE (u

ltras

ound

w

as

avai

labl

e fo

r 92

% o

f w

omen

)

With

in 7

2 h

—B)

32.

4B)

30.

4B)

29.

8B)

28.

7B)

26.

7

Baum

ann

et a

l72

(tra

nsla

ted)

1993

Univ

ersi

ty

Hosp

ital;

Bern

, Sw

itzer

land

<34

wk

AGA

60Ul

tras

ound

NS−

0.92

± 0

.04

(CI:

0.81

– 0.

97)

——

——

<34

wk

SGA

29Ul

tras

ound

NS−

0.68

± 0

.09

(CI:

0.49

–0.8

2)—

——

——

OB, o

bste

tric

; NS,

not

sta

ted;

—, i

ndic

ates

that

the

data

wer

e no

t ava

ilabl

e fo

r th

at p

aper

.a

The

AVCL

gra

ding

sys

tem

is a

s de

scri

bed

in H

ittne

r et

al.87

b Fin

nstr

öm24

use

d th

e Ha

rnac

k an

d Os

ter

(195

8) g

radi

ng s

yste

m, a

cla

ssifi

catio

n sy

stem

with

gra

des

1–3,

in w

hich

1 e

qual

s m

ost v

ascu

lari

ty a

nd 3

equ

als

no v

ascu

lari

ty. T

here

fore

, the

cor

rela

tion

betw

een

disa

ppea

ranc

e of

the

AVCL

and

incr

easi

ng

GA is

not

ed a

s po

sitiv

e un

der

this

cla

ssifi

catio

n sy

stem

but

wou

ld b

e ne

gativ

e by

the

Hitt

ner

grad

ing

syst

em.

c N =

24

for

both

Gra

des

1 an

d 0

com

bine

d; th

e GA

ran

ge s

tate

d (≥

33

wk)

com

pris

es in

fant

s th

at s

core

d ei

ther

a 1

or

0.

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 20: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

newborns. These health workers may not have the medical training or the time required to perform the assessment. The duration of the assessment as well as the feasibility of training, standardization, and quality control are critical considerations for scalability in LMIC.

Finally, when evaluating methods of GA assessment, the clinical, research, and programmatic objectives should be weighed. For the clinician, the primary objective is to identify preterm infants requiring special care, and individual-level misclassification may result in missed intervention opportunities. A measurement tool with high sensitivity is desired to identify all preterm infants, perhaps at the expense of specificity. A very simple tool based on a single parameter (such as foot size or another anthropometric parameter) may be suitable to meet these needs. On the other hand, for research, a more precise and continuous measurement of GA is desirable and early pregnancy ultrasound should be used. At the population level, inaccuracy and imprecision in GA dating may result in biased estimates of preterm birth rates and epidemiologic associations with preterm birth.95 Determining the optimal precision (ie, a 95% CI of ±1 or 2 vs 3 weeks) and diagnostic accuracy is also critical to choosing an appropriate method of GA measurement for LMIC. Future research priorities for improving GA determination in LMIC are shown in Fig 4.

CoNCLusIoNs

As part of the Metrics Group of the Every Newborn Action Plan, we have conducted the first systematic review and meta-analysis assessing the diagnostic accuracy of neonatal GA assessments and scores. The most commonly used

assessment, the Ballard score, tended to overestimate GA and had wide margins of error. The Dubowitz score had improved accuracy, although feasibility is a critical consideration in LMIC, and the complexity, training, and time to conduct the assessment are challenges to scale up. Additional high-quality studies are needed in LMIC to determine the accuracy of neonatal assessment compared with an early ultrasound reference, particularly in settings with SGA, as well as to explore the feasibility of implementation of complex GA assessments. This work also underlines the importance of future focus on increasing the maternal demand for knowledge of the GA of their pregnancy, improving coverage of early pregnancy ultrasound scans, and innovations to improve GA assessment in late pregnancy, such as novel ultrasound approaches. In settings where early ultrasound is not possible, increased efforts and innovation are urgently needed to develop simpler yet specific approaches for clinical GA assessment of the newborn, either through new combinations of existing parameters, new signs, or technology.

ACkNowLedGMeNTs

We acknowledge the students who were also part of the GA working group in the Brigham and Women’s Hospital global newborn health laboratory (Chelsea Clark). We also thank the Brigham and Women’s Hospital Department of Newborn Medicine and Dr Terrie Inder for their support of this work. Finally, we thank the following individuals for their assistance in translating foreign articles: Madeline Gilbert, Alison Leschen, Maria Dąbrowska, Susan Throckmorton, Felix Bergmann, and Lina Driouk.

ABBReVIATIoNs

AVCL:  anterior vascular capsule of the lens

BOE:  best obstetric estimateCI:  confidence intervalGA:  gestational ageHIC:  high-income countryLBW:  low birth weightLMIC:  low- and middle-income

countriesLMP:  last menstrual periodQUADAS–2:  Quality Assessment

of Diagnostic Accuracy Studies–2

SGA:  small for gestational age

LEE et al20

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

FIGuRe 4Research priorities to improve GA dating in LMIC.

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 21: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

ReFeReNCes

1. Blencowe H, Cousens S, Oestergaard MZ, et al. National, regional, and worldwide estimates of preterm birth rates in the year 2010 with time trends since 1990 for selected countries: a systematic analysis and implications. Lancet. 2012;379(9832):2162–2172

2. World Bank . World bank country and lending groups. 2017 . Available at: https:// datahelpdesk. worldbank. org/ knowledgebase/ articles/ 906519- world- bank- country- and- lending- groups . Accessed May 21, 2017

3. Liu L, Johnson HL, Cousens S, et al; Child Health Epidemiology Reference Group of WHO and UNICEF. Global, regional, and national causes of child mortality: an updated systematic analysis for 2010 with time trends since 2000. Lancet. 2012;379(9832):2151–2161

4. Aliyu LD, Kurjak A, Wataganara T, et al. Ultrasound in Africa: what can really be done? J Perinat Med. 2016;44(2):119–123

5. Savitz DA, Terry JW Jr, Dole N, Thorp JM Jr, Siega-Riz AM, Herring AH. Comparison of pregnancy dating by last menstrual period, ultrasound scanning, and their combination. Am J Obstet Gynecol. 2002;187(6):1660–1666

6. World Health Organization . Global health workforce shortage to reach 12.9 million in coming decades. 2013 . Available at: www. who. int/ mediacentre/ news/ releases/ 2013/ health- workforce- shortage/ en/ . Accessed May 28, 2017

7. UNICEF . Antenatal care: current status and progress. 2017 . Available at: https:// data. unicef. org/ topic/ maternal- health/ antenatal- care/ . Accessed April 17, 2017

8. Bucher S, Marete I, Tenge C, et al. A prospective observational description of frequency and timing of antenatal care attendance and coverage of selected interventions from sites in Argentina, Guatemala, India, Kenya, Pakistan and Zambia. Reprod Health. 2015;12(suppl 2):S12

9. Wang W, Alva S, Wang S, Fort A. Levels and Trends in the Use of Maternal Health Services in Developing Countries. Calverton, MD: USAID; 2011

10. Committee opinion no 611: method for estimating due date. Obstet Gynecol. 2014;124(4):863–866

11. Blencowe H, Cousens S, Chou D, et al; Born Too Soon Preterm Birth Action Group. Born too soon: the global epidemiology of 15 million preterm births. Reprod Health. 2013; 10(suppl 1):S2

12. Farr V, Mitchell RG, Neligan GA, Parkin JM. The definition of some external characteristics used in the assessment of gestational age in the newborn infant. Dev Med Child Neurol. 1966;8(5):507–511

13. Amiel-Tison C. Neurological evaluation of the maturity of newborn infants. Arch Dis Child. 1968;43(227):89–93

14. Dubowitz LM, Dubowitz V, Goldberg C. Clinical assessment of gestational age in the newborn infant. J Pediatr. 1970;77(1):1–10

15. Capurro H, Konichezky S, Fonseca D, Caldeyro-Barcia R. A simplified method for diagnosis of gestational age in the newborn infant. J Pediatr. 1978;93(1):120–122

16. Parkin JM, Hey EN, Clowes JS. Rapid assessment of gestational age at birth. Arch Dis Child. 1976;51(4):259–263

17. Eregie CO. Assessment of gestational age: modification of a simplified method. Dev Med Child Neurol. 1991;33(7):596–600

18. Bhagwat VA, Dahat HB, Bapat NG. Determination of gestational age of newborns–a comparative study. Indian Pediatr. 1990;27(3):272–275

19. Ballard JL, Novak KK, Driver M. A simplified score for assessment of

PEDIATRICS Volume 140, number 6, December 2017 21

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

Partners International, Yangon, Myanmar; and hFaculty of Epidemiology and Population Health and iThe Centre for Maternal, Adolescent, Reproductive, and Child Health (MARCH), London School of Hygiene and Tropical Medicine, London, United Kingdom

Dr Lee conceptualized and designed the study, coordinated and supervised data collection, completed secondary data extraction, and drafted, reviewed, revised, and finalized the manuscript; Dr Panchal designed the database searches, carried out initial screening and data extraction for postnatal clinical exams, conducted meta-analyses, and reviewed and revised the manuscript; Ms Folger screened and extracted data for anterior vascularity of the lens, helped write sections of the manuscript, and formatted, reviewed, and revised the manuscript; Dr Whelan undertook initial screening and data extraction for postnatal clinical exams and reviewed the manuscript; Ms Whelan coordinated and supervised data collection and data extraction, reviewed the extracted data, and reviewed and revised the manuscript; Dr Rosner advised the statistical analysis of the data extracted, provided feedback on analyses, and reviewed and revised the manuscript; Drs Blencowe and Lawn helped synthesize the data and data analysis and critically reviewed and revised the manuscript; and all authors approved the final manuscript as submitted.

This systematic review was registered with the International Prospective Register of Systematic Reviews. PROSPERO registration number: CRD42015020499.

doI: https:// doi. org/ 10. 1542/ peds. 2017- 1423

Accepted for publication Jul 25, 2017

Address correspondence to Anne CC Lee, MD, MPH, Department of Pediatric Newborn Medicine, Brigham and Women’s Hospital, BB502A, 75 Francis St, Boston, MA 02115. E-mail: [email protected]

PEDIATRICS (ISSN Numbers: Print, 0031-4005; Online, 1098-4275).

Copyright © 2017 by the American Academy of Pediatrics

FINANCIAL dIsCLosuRe: The authors have indicated they have no financial relationships relevant to this article to disclose.

FuNdING: This work was supported by the Bill & Melinda Gates Foundation through grant OPP1130198.

PoTeNTIAL CoNFLICT oF INTeResT: The authors have indicated they have no potential conflicts of interest to disclose.

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 22: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

fetal maturation of newly born infants. J Pediatr. 1979;95(5, pt 1):769–774

20. Ballard JL, Khoury JC, Wedig K, Wang L, Eilers-Walsman BL, Lipp R. New Ballard score, expanded to include extremely premature infants. J Pediatr. 1991;119(3):417–423

21. Amiel-Tison C; Maillard, F; Lebrun, F; Breart, G; Papiernik E. Neurological and physical maturation in normal growth singletons from 37 to 41 weeks’ gestation. Early Human Development. 1999;54:145–156

22. Feresu SA, Gillespie BW, Sowers MF, Johnson TR, Welch K, Harlow SD. Improving the assessment of gestational age in a Zimbabwean population. Int J Gynaecol Obstet. 2002;78(1):7–18

23. Nicolopoulos D, Perakis A, Papadakis M, Alexiou D, Aravantinos D. Estimation of gestational age in the neonate: a comparison of clinical methods. Am J Dis Child. 1976;130(5):477–480

24. Finnström O. Studies on maturity in newborn infants. II. External characteristics. Acta Paediatr Scand. 1972;61(1):24–32

25. Farr V. Estimation of gestational age by neurological assessment in first week of life. Arch Dis Child. 1968;43(229):353–357

26. Tunçer M, Yilgör E, Erdem G. A new, simple three-step method for determining gestational age. Turk J Pediatr. 1982;23(2):85–97

27. Kollée LA, Leusink J, Peer PG. Assessment of gestational age: a simplified scoring system. J Perinat Med. 1985;13(3):135–138

28. Klimek R, Klimek M, Rzepecka-Weglarz B. A new score for postnatal clinical assessment of fetal maturity in newborn infants. Int J Gynaecol Obstet. 2000;71(2):101–105

29. Allan RC, Sayers S, Powers J, Singh G. The development and evaluation of a simple method of gestational age estimation. J Paediatr Child Health. 2009;45(1–2):15–19

30. Narayanan I, Dua K, Gujral VV, Mehta DK, Mathew M, Prabhakar AK. A simple method of assessment of gestational age in newborn infants. Pediatrics. 1982;69(1):27–32

31. Robinson RJ. Assessment of gestational age by neurological

examination. Arch Dis Child. 1966;41(218):437–447

32. Serfontein GL, Jaroszewicz AM. Estimation of gestational age at birth. Comparison of two methods. Arch Dis Child. 1978;53(6):509–511

33. Bindusha S, Rasalam CS, Sreedevi N. Gestational age assessment of newborn- clinical trial of a simplified method. Transworld Med J. 2014;2(1):24–28

34. World Health Organization (WHO); United Nations International Children’s Emergency Fund (UNICEF). Every Newborn: An Action Plan to End Preventable Deaths (ENAP). Geneva, Switzerland: World Health Organization; 2014

35. World Health Organization (WHO) . Every Newborn Action Plan Metrics. In: WHO technical consultation on newborn health indicators ; December 3–5, 2014 ; Ferney Voltaire, France

36. Whiting PF, Rutjes AW, Westwood ME, et al; QUADAS-2 Group. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–536

37. Kramer MS, McLean FH, Boyd ME, Usher RH. The validity of gestational age estimation by menstrual dating in term, preterm, and postterm gestations. JAMA. 1988;260(22):3306–3308

38. Rosner B. Fundamentals of Biostatistics. 8th ed. Boston, MA: Cengage Learning; 2016

39. Macaskill P, Gatsonis C, Deeks JJ, Harbord RM, Takwoingi Y. Analysing and Presenting Results. In: Deeks JJ, Bossuyt PM, Gatsonis C, eds. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy Version 1.0, 10. The Cochrane Collaboration; 2010. Available at: http:// srdta. cochrane. org/

40. Lee AC, Mullany LC, Ladhani K, et al; Projahnmo Study Group. Validity of newborn clinical assessment to determine gestational age in Bangladesh. Pediatrics. 2016;138(1):e20153303

41. Raghu MB, Patel YS, Gupta K. Estimation of gestational age in Zambian newborn infants. Ann Trop Paediatr. 1981;1(4):245–247

42. Sunjoh F, Njamnshi AK, Tietche F, Kago I. Assessment of gestational age in the Cameroonian newborn infant: a comparison of four scoring methods. J Trop Pediatr. 2004;50(5):285–291

43. Roberts CJ, Hibbard BM, Evans DR, et al. Precision in estimating gestational age and its influence on sensitivity of alphafetoprotein screening. BMJ. 1979;1(6169):981–983

44. Vik T, Vatten L, Markestad T, Jacobsen G, Bakketeig LS. Dubowitz assessment of gestational age and agreement with prenatal methods. Am J Perinatol. 1997;14(6):369–373

45. Awoust J, Keuwez JJ, Levi S. Comparison between three methods for assessment of fetal age. J Foetal Med. 1982;2(1):11–15

46. Sanders M, Allen M, Alexander GR, et al. Gestational age assessment in preterm neonates weighing less than 1500 grams. Pediatrics. 1991;88(3):542–546

47. Wariyar U, Tin W, Hey E. Gestational assessment assessed. Arch Dis Child Fetal Neonatal Ed. 1997;77(3):F216–F220

48. Robillard PY, De Caunes F, Alexander GR, Sergent MP. Validity of postnatal assessments of gestational age in low birthweight infants from a Caribbean community. J Perinatol. 1992;12(2):115–119

49. Shukla H, Atakent YS, Ferrara A, Topsis J, Antoine C. Postnatal overestimation of gestational age in preterm infants. Am J Dis Child. 1987;141(10):1106–1107

50. Moore KA, Simpson JA, Thomas KH, et al. Estimating gestational age in late presenters to antenatal care in a resource-limited setting on the Thai-Myanmar border. PLoS One. 2015;10(6):e0131025

51. Rosenberg RE, Ahmed AS, Ahmed S, et al. Determining gestational age in a low-resource setting: validity of last menstrual period. J Health Popul Nutr. 2009;27(3):332–338

52. Karunasekera KA, Sirisena J, Jayasinghe JA, Perera GU. How accurate is the postnatal estimation of gestational age? J Trop Pediatr. 2002;48(5):270–272

53. Mitchell D. Accuracy of pre- and postnatal assessment of

LEE et al22

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 23: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

gestational age. Arch Dis Child. 1979;54(11):896–897

54. Vogt H, Haneberg B, Finne PH, Stensberg A. Clinical assessment of gestational age in the newborn infant. An evaluation of two methods. Acta Paediatr Scand. 1981;70(5):669–672

55. Latis GO, Simionato L, Ferraris G. Clinical assessment of gestational age in the newborn infant. Comparison of two methods. Early Hum Dev. 1981;5(1):29–37

56. Hertz RH, Sokol RJ, Knoke JD, Rosen MG, Chik L, Hirsch VJ. Clinical estimation of gestational age: rules for avoiding preterm delivery. Am J Obstet Gynecol. 1978;131(4):395–402

57. Cevit O, Bayram B, Toksoy HB, Gültekin A, Gökalp A. Gestational age assessment in preterm neonates weighing less than 2500 grams. J Trop Pediatr. 1998;44(1):57–58

58. Jaroszewicz AM, Boyd IH. Clinical assessment of gestational age in the newborn. S Afr Med J. 1973;47(44):2123–2124

59. Dawodu A, Qureshi MM, Moustafa IA, Bayoumi RA. Epidemiology of clinical hyperbilirubinaemia in Al Ain, United Arab Emirates. Ann Trop Paediatr. 1998;18(2):93–99

60. Scher MS, Barmada MA. Estimation of gestational age by electrographic, clinical, and anatomic criteria. Pediatr Neurol. 1987;3(5):256–262

61. Alexander GR, de Caunes F, Hulsey TC, Tompkins ME, Allen M. Validity of postnatal assessments of gestational age: a comparison of the method of Ballard et al. and early ultrasonography. Am J Obstet Gynecol. 1992;166(3):891–895

62. Smith LN, Dayal VH, Monga M. Prior knowledge of obstetric gestational age and possible bias of Ballard score. Obstet Gynecol. 1999;93(5, pt 1):712–714

63. Dombrowski MP, Wolfe HM, Brans YW, Saleh AA, Sokol RJ. Neonatal morphometry. Relation to obstetric, pediatric, and menstrual estimates of gestational age. Am J Dis Child. 1992;146(7):852–856

64. Gagliardi L, Scimone F, DelPrete A, et al. Precision of gestational age assessment in the neonate. Acta Paediatr. 1992;81(2):95–99

65. Amato M, Hüppi P, Claus R. Rapid biometric assessment of gestational age in very low birth weight infants. J Perinat Med. 1991;19(5):367–371

66. Karl S, Li Wai Suen CS, Unger HW, et al. Preterm or not–an evaluation of estimates of gestational age in a cohort of women from Rural Papua New Guinea. PLoS One. 2015;10(5):e0124286

67. Moraes CL, Reichenheim ME. [Validity of neonatal clinical assessment for estimation of gestational age: comparison of new ++Ballard+ score with date of last menstrual period and ultrasonography]. Cad Saude Publica. 2000;16(1):83–94

68. Sreekumar K, d’Lima A, Nesargi S, Rao S, Bhat S. Comparison of New Ballards score and Parkins score for gestational age estimation. Indian Pediatr. 2013;50(8):771–773

69. Wylie BJ, Kalilani-Phiri L, Madanitsa M, et al. Gestational age assessment in malaria pregnancy cohorts: a prospective ultrasound demonstration project in Malawi. Malar J. 2013;12:183

70. Taylor RA, Denison FC, Beyai S, Owens S. The external Ballard examination does not accurately assess the gestational age of infants born at home in a rural community of The Gambia. Ann Trop Paediatr. 2010;30(3):197–204

71. Thi HN, Khanh DK, Thu HT, Thomas EG, Lee KJ, Russell FM. Foot length, chest circumference, and mid upper arm circumference are good predictors of low birth weight and prematurity in ethnic minority newborns in Vietnam: a hospital-based observational study. PLoS One. 2015;10(11):e0142420

72. Baumann C, Hüppi P, Amato M. [Prenatal and postnatal determination of gestational age of small newborn infants]. Z Geburtshilfe Perinatol. 1993;197(3):135–140

73. Constantine NA, Kraemer HC, Kendall-Tackett KA, Bennett FC, Tyson JE, Gross RT. Use of physical and neurologic observations in assessment of gestational age in low birth weight infants. J Pediatr. 1987;110(6):921–928

74. Mackanjee HR, Iliescu BM, Dawson WB. Assessment of postnatal gestational age using sonographic measurements

of femur length. J Ultrasound Med. 1996;15(2):115–120

75. Alexander GR, Hulsey TC, Smeriglio VL, Comfort M, Levkoff A. Factors influencing the relationship between a newborn assessment of gestational maturity and the gestational age interval. Paediatr Perinat Epidemiol. 1990;4(2):133–146

76. Alexander GR, de Caunes F, Hulsey TC, Tompkins ME, Allen M. Ethnic variation in postnatal assessments of gestational age: a reappraisal. Paediatr Perinat Epidemiol. 1992;6(4):423–433

77. Ahn Y. Assessment of gestational age using an extended New Ballard examination in Korean newborns. J Trop Pediatr. 2008;54(4):278–281

78. Sasidharan K, Dutta S, Narang A. Validity of New Ballard score until 7th day of postnatal life in moderately preterm neonates. Arch Dis Child Fetal Neonatal Ed. 2009;94(1):F39–F44

79. Verhoeff FH, Milligan P, Brabin BJ, Mlanga S, Nakoma V. Gestational age assessment by nurses in a developing country using the Ballard method, external criteria only. Ann Trop Paediatr. 1997;17(4):333–342

80. Eregie CO. A new method for maturity determination in newborn infants. J Trop Pediatr. 2000;46(3):140–144

81. Oliveira S, Kimura AMR . Evaluation of the gestational age through prenatal and postnatal data. In: 4th World Congress of Perinatal Medicine ; Buenos Aires, Argentina ; 1999 . 1091 – 1094

82. Pereira AP, Dias MA, Bastos MH, da Gama SG, Leal MC. Determining gestational age for public health care users in Brazil: comparison of methods and algorithm creation. BMC Res Notes. 2013;6:60

83. Laveriano WRV. [Reliability of the post natal gestational assessment: Capurro test compared with ultrasound at 10+0 to 14+2 weeks of gestation]. Rev Peru Ginecol Obstet. 2015;61(2):115–118

84. Neufeld LM, Haas JD, Grajéda R, Martorell R. Last menstrual period provides the best estimate of gestation length for women in rural Guatemala. Paediatr Perinat Epidemiol. 2006;20(4):290–298

PEDIATRICS Volume 140, number 6, December 2017 23

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 24: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

85. Lee Anne CC, Uddin J, Shah R, et al . Validation of community health worker clinical assessment of gestational age in rural Bangladesh. In: Proceedings from the Pediatric Academic Societies Annual Meeting ; May 2013 ; Washington, DC

86. Aslan Y, Yildiran A, Sen Y, Erduran E, Kasim S, Gedik Y. Assessment of gestational age in healthy neonates by auxiliary health personnel using a simple scoring system. Turk Klin J Med Res. 2000;18(3):121–125

87. Hittner HM, Hirsch NJ, Rudolph AJ. Assessment of gestational age by examination of the anterior vascular capsule of the lens. J Pediatr. 1977;91(3):455–458

88. Guillory C, Carsia-Prats JA, Hittner HM, Rudolph J. Effect of prenatal

steroid administration on the anterior vascular capsule of the lens (AVCL) in preterm infants. Pediatr Res. 1980;14(4):456

89. Hittner HM, Gorman WA, Rudolph AJ. Examination of the anterior vascular capsule of the lens: II. Assessment of gestational age in infants small for gestational age. J Pediatr Ophthalmol Strabismus. 1981;18(2):52–54

90. Krishnamohan VK, Wheeler MB, Testa MA, Philipps AF. Correlation of postnatal regression of the anterior vascular capsule of the lens to gestational age. J Pediatr Ophthalmol Strabismus. 1982;19(1):28–32

91. Sasivimolkul W, Siripoonya P, Tejavej A. Gestational age assessment by the examination of the anterior vascular

capsule of the lens. J Med Assoc Thai. 1986;69(suppl 2):38–45

92. Skapinker R, Rothberg AD. Postnatal regression of the tunica vasculosa lentis. J Perinatol. 1987;7(4): 279–281

93. Damoulaki-Sfakianski E, Robertson A, Gordero L. Skin creases on the sole of the foot as a physical index of maturity: comparison between Caucasian and Negro infants. Pediatrics. 1972;50(3):483–485

94. Fujimoto W, Samoa R, Wotring A. Gestational diabetes in high-risk populations. Clin Diabetes. 2013;31(2):90–94

95. Martin JA, Hamilton BE, Osterman MJ, Curtin SC, Matthews TJ. Births: final data for 2013. Natl Vital Stat Rep. 2015;64(1):1–65

LEE et al24

Lee et alDiagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review

2017https://doi.org/10.1542/peds.2017-1423

4Pediatrics

ROUGH GALLEY PROOFOctober 2017

140

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 25: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

originally published online November 17, 2017; Pediatrics Rosner, Hannah Blencowe and Joy E. Lawn

Anne CC Lee, Pratik Panchal, Lian Folger, Hilary Whelan, Rachel Whelan, BernardA Systematic Review

Diagnostic Accuracy of Neonatal Assessment for Gestational Age Determination:

ServicesUpdated Information &

017-1423http://pediatrics.aappublications.org/content/early/2017/11/15/peds.2including high resolution figures, can be found at:

References

017-1423#BIBLhttp://pediatrics.aappublications.org/content/early/2017/11/15/peds.2This article cites 85 articles, 15 of which you can access for free at:

Subspecialty Collections

alth_subhttp://www.aappublications.org/cgi/collection/international_child_heInternational Child Healthhttp://www.aappublications.org/cgi/collection/neonatology_subNeonatologysubhttp://www.aappublications.org/cgi/collection/fetus:newborn_infant_Fetus/Newborn Infantfollowing collection(s): This article, along with others on similar topics, appears in the

Permissions & Licensing

http://www.aappublications.org/site/misc/Permissions.xhtmlin its entirety can be found online at: Information about reproducing this article in parts (figures, tables) or

Reprintshttp://www.aappublications.org/site/misc/reprints.xhtmlInformation about ordering reprints can be found online:

by guest on May 26, 2019www.aappublications.org/newsDownloaded from

Page 26: Diagnostic Accuracy of Neonatal Assessment for Gestational Age ... · with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard

originally published online November 17, 2017; Pediatrics Rosner, Hannah Blencowe and Joy E. Lawn

Anne CC Lee, Pratik Panchal, Lian Folger, Hilary Whelan, Rachel Whelan, BernardA Systematic Review

Diagnostic Accuracy of Neonatal Assessment for Gestational Age Determination:

http://pediatrics.aappublications.org/content/early/2017/11/15/peds.2017-1423located on the World Wide Web at:

The online version of this article, along with updated information and services, is

http://pediatrics.aappublications.org/content/suppl/2017/11/15/peds.2017-1423.DCSupplementalData Supplement at:

1073-0397. ISSN:60007. Copyright © 2017 by the American Academy of Pediatrics. All rights reserved. Print

the American Academy of Pediatrics, 141 Northwest Point Boulevard, Elk Grove Village, Illinois,has been published continuously since 1948. Pediatrics is owned, published, and trademarked by Pediatrics is the official journal of the American Academy of Pediatrics. A monthly publication, it

by guest on May 26, 2019www.aappublications.org/newsDownloaded from