evolutionary dynamics of the human nadph oxidase genes cybb, cyba, ncf2, and ncf4: functional...

11
Article Evolutionary Dynamics of the Human NADPH Oxidase Genes CYBB, CYBA, NCF2, and NCF4: Functional Implications Eduardo Tarazona-Santos, y, * ,1,2 Moara Machado, y,2 Wagner C.S. Magalha ˜es, 2 Renee Chen, 1 Fernanda Lyon, 2 Laurie Burdett, 3,4 Andrew Crenshaw, 3,4 Cristina Fabbri, 5 Latife Pereira, 2 Laelia Pinto, 2 Rodrigo A.F. Redondo, 6 Ben Sestanovich, 1 Meredith Yeager, 3,4 and Stephen J. Chanock* ,1 1 Laboratory of Translational Genomics of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Gaithersburg, MD 2 Departamento de Biologia Geral, Instituto de Cie ˆncias Biolo ´gicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil 3 Intramural Research Support Program, SAIC Frederick, NCI-FCRDC, Frederick, MD 4 Core Genotype Facility, National Cancer Institute, National Institute of Health, Gaithersburg, MD 5 Dipartimento di Biologia Evoluzionistica Sperimentale, Universita ` di Bologna, Via Selmi, Bologna, Italy 6 Institute of Science and Technology - Austria, Am Campus 1, 3400 Klosterneuburg, Austria y These authors contributed equally to this work. *Corresponding author: E-mail: [email protected]; [email protected]. Associate Editor: Sarah Tishkoff Abstract The phagocyte NADPH oxidase catalyzes the reduction of O 2 to reactive oxygen species with microbicidal activity. It is composed of two membrane-spanning subunits, gp91-phox and p22-phox (encoded by CYBB and CYBA, respectively), and three cytoplasmic subunits, p40-phox, p47-phox, and p67-phox (encoded by NCF4, NCF1, and NCF2, respectively). Mutations in any of these genes can result in chronic granulomatous disease, a primary immunodeficiency characterized by recurrent infections. Using evolutionary mapping, we determined that episodes of adaptive natural selection have shaped the extracellular portion of gp91-phox during the evolution of mammals, which suggests that this region may have a function in host-pathogen interactions. On the basis of a resequencing analysis of approximately 35 kb of CYBB, CYBA, NCF2, and NCF4 in 102 ethnically diverse individuals (24 of African ancestry, 31 of European ancestry, 24 of Asian/ Oceanians, and 23 US Hispanics), we show that the pattern of CYBA diversity is compatible with balancing natural selection, perhaps mediated by catalase-positive pathogens. NCF2 in Asian populations shows a pattern of diversity characterized by a differentiated haplotype structure. Our study provides insight into the role of pathogen-driven natural selection in an innate immune pathway and sheds light on the role of CYBA in endothelial, nonphagocytic NADPH oxidases, which are relevant in the pathogenesis of cardiovascular and other complex diseases. Key words: innate immunity, immunogenetics, chronic granulomatous disease. Introduction The phagocyte NADPH oxidase, also known as the “respira- tory burst oxidase,” is an enzymatic complex that plays a critical role in innate immunity. Phagocyte NADPH oxidase catalyzes the reduction of oxygen to O 2 , generating reactive oxygen species (ROS) that are key components of phagocytic microbicidal activity (Heyworth et al. 2003). Phagocyte NADPH oxidase includes two membrane-spanning polypep- tide subunits, gp91-phox and p22-phox (encoded by CYBB and CYBA, respectively), and a set of cytoplasmic polypeptide subunits, p40-phox, p47-phox, and p67-phox, as well as a GTPase, either Rac1 or Rac2 (encoded by NCF4, NCF1, NCF2, and RAC1 or RAC2, respectively). Upon induction, the cytoplasmic subunits bind the transmembrane compo- nents and activate the enzymatic complex, producing ROS (fig. 1; Sumimoto et al. 2005). Mutations in CYBB, CYBA, NCF1, NCF2, or NCF4 can result in chronic granulomatous disease (CGD), a primary immunodeficiency. Most CGD patients have no measurable respiratory burst, and in less than 5% of patients, low levels of ROS production are noted (Heyworth et al. 2003). Approximately 70% of CGD cases are X-linked, owing to mutations in CYBB (Heyworth et al. 2003), and there is a high degree of allelic heterogeneity in X-linked as well as in autosomal forms of CGD, except for cases due to NCF1 mutations (see the Immunodeficiency Mutations Database: http://bioinf.uta.fi/base_root/muta tion_databases_list.php, last accessed July 16, 2013). NCF1 re- sides in a complex region of chromosome 7q11, and most CGD mutations result from gene conversion of the wild-type gene to one of several neighboring, highly paralogous pseu- dogenes (Chanock et al. 2000). Several studies in animal models and in vitro have con- firmed the long-standing clinical observation that the NADPH oxidase is critical for defense against catalase-positive bacteria and fungi (Buckley 2004). Association studies have suggested a role for common genetic variants in CGD genes as susceptibility alleles for tuberculosis and malaria (Bustamante Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2013. This work is written by US Government employees and is in the public domain in the US. Mol. Biol. Evol. 30(9):2157–2167 doi:10.1093/molbev/mst119 Advance Access publication July 2, 2013 2157 by guest on August 13, 2016 http://mbe.oxfordjournals.org/ Downloaded from

Upload: independent

Post on 09-Dec-2023

0 views

Category:

Documents


0 download

TRANSCRIPT

Article

Evolutionary Dynamics of the Human NADPH Oxidase GenesCYBB CYBA NCF2 and NCF4 Functional ImplicationsEduardo Tarazona-Santosy12 Moara Machadoy2 Wagner CS Magalhaes2 Renee Chen1

Fernanda Lyon2 Laurie Burdett34 Andrew Crenshaw34 Cristina Fabbri5 Latife Pereira2 Laelia Pinto2

Rodrigo AF Redondo6 Ben Sestanovich1 Meredith Yeager34 and Stephen J Chanock1

1Laboratory of Translational Genomics of the Division of Cancer Epidemiology and Genetics National Cancer Institute NationalInstitutes of Health Gaithersburg MD2Departamento de Biologia Geral Instituto de Ciencias Biologicas Universidade Federal de Minas Gerais Belo Horizonte Brazil3Intramural Research Support Program SAIC Frederick NCI-FCRDC Frederick MD4Core Genotype Facility National Cancer Institute National Institute of Health Gaithersburg MD5Dipartimento di Biologia Evoluzionistica Sperimentale Universita di Bologna Via Selmi Bologna Italy6Institute of Science and Technology - Austria Am Campus 1 3400 Klosterneuburg AustriayThese authors contributed equally to this work

Corresponding author E-mail edutarsicbufmgbr chanocksmailnihgov

Associate Editor Sarah Tishkoff

Abstract

The phagocyte NADPH oxidase catalyzes the reduction of O2 to reactive oxygen species with microbicidal activity It iscomposed of two membrane-spanning subunits gp91-phox and p22-phox (encoded by CYBB and CYBA respectively)and three cytoplasmic subunits p40-phox p47-phox and p67-phox (encoded by NCF4 NCF1 and NCF2 respectively)Mutations in any of these genes can result in chronic granulomatous disease a primary immunodeficiency characterizedby recurrent infections Using evolutionary mapping we determined that episodes of adaptive natural selection haveshaped the extracellular portion of gp91-phox during the evolution of mammals which suggests that this region mayhave a function in host-pathogen interactions On the basis of a resequencing analysis of approximately 35 kb of CYBBCYBA NCF2 and NCF4 in 102 ethnically diverse individuals (24 of African ancestry 31 of European ancestry 24 of AsianOceanians and 23 US Hispanics) we show that the pattern of CYBA diversity is compatible with balancing naturalselection perhaps mediated by catalase-positive pathogens NCF2 in Asian populations shows a pattern of diversitycharacterized by a differentiated haplotype structure Our study provides insight into the role of pathogen-driven naturalselection in an innate immune pathway and sheds light on the role of CYBA in endothelial nonphagocytic NADPHoxidases which are relevant in the pathogenesis of cardiovascular and other complex diseases

Key words innate immunity immunogenetics chronic granulomatous disease

IntroductionThe phagocyte NADPH oxidase also known as the ldquorespira-tory burst oxidaserdquo is an enzymatic complex that plays acritical role in innate immunity Phagocyte NADPH oxidasecatalyzes the reduction of oxygen to O2 generating reactiveoxygen species (ROS) that are key components of phagocyticmicrobicidal activity (Heyworth et al 2003) PhagocyteNADPH oxidase includes two membrane-spanning polypep-tide subunits gp91-phox and p22-phox (encoded by CYBBand CYBA respectively) and a set of cytoplasmic polypeptidesubunits p40-phox p47-phox and p67-phox as well as aGTPase either Rac1 or Rac2 (encoded by NCF4 NCF1NCF2 and RAC1 or RAC2 respectively) Upon inductionthe cytoplasmic subunits bind the transmembrane compo-nents and activate the enzymatic complex producing ROS(fig 1 Sumimoto et al 2005) Mutations in CYBB CYBA NCF1NCF2 or NCF4 can result in chronic granulomatous disease(CGD) a primary immunodeficiency Most CGD patients

have no measurable respiratory burst and in less than 5of patients low levels of ROS production are noted(Heyworth et al 2003) Approximately 70 of CGD casesare X-linked owing to mutations in CYBB (Heyworth et al2003) and there is a high degree of allelic heterogeneity inX-linked as well as in autosomal forms of CGD except forcases due to NCF1 mutations (see the ImmunodeficiencyMutations Database httpbioinfutafibase_rootmutation_databases_listphp last accessed July 16 2013) NCF1 re-sides in a complex region of chromosome 7q11 and mostCGD mutations result from gene conversion of the wild-typegene to one of several neighboring highly paralogous pseu-dogenes (Chanock et al 2000)

Several studies in animal models and in vitro have con-firmed the long-standing clinical observation that theNADPH oxidase is critical for defense against catalase-positivebacteria and fungi (Buckley 2004) Association studies havesuggested a role for common genetic variants in CGD genes assusceptibility alleles for tuberculosis and malaria (Bustamante

Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2013 This work is written by US Government employees and is inthe public domain in the US

Mol Biol Evol 30(9)2157ndash2167 doi101093molbevmst119 Advance Access publication July 2 2013 2157

by guest on August 13 2016

httpmbeoxfordjournalsorg

Dow

nloaded from

et al 2011) as well as for immune related diseases such asCrohnrsquos disease and lupus as identified in genome-wide as-sociation studies (GWAS) in European populations (Riouxet al 2007 Roberts et al 2008 Jacob et al 2012) Besidesthe phagocyte NADPH oxidase other NADPH oxidaseswith different functions are expressed in a variety ofnonphagocytic cells including the endothelium and havebeen implicated in cardiovascular and renal diseaseAlthough p22-phox (encoded by CYBA) is a protein compo-nent shared by several of these NADPH oxidases (also calledNox) other more specific protein subunits are encoded bydifferent Nox genes homologous to the genes coding for thephagocytic subunits (Sumimoto et al 2005 San Jose et al2008) Although these nonphagocytic NADPH oxidases nor-mally produce less O2 even small imbalances in ROS levelsmay cause tissue damage due to oxidative stress which iscorrelated with the pathogenesis of gout chronic obstructivepulmonary disease rheumatoid arthritis and cardiovasculardiseases (Brandes and Kreuzer 2005) Therefore variants inNADPH oxidase genes may have pleiotropic effects across aspectrum of disorders (Santiago et al 2012)

Despite the involvement of the NADPH oxidase in a rangeof clinically relevant phenotypes our knowledge of the se-quence diversity of NADPH genes mostly derives from CGDpatients Although targeted SNP genotyping has been per-formed in the context of association studies for CYBA(Bedard et al 2009) and NCF4 (Olsson et al 2007) none ofthe large-scale resequencing efforts such as Seattle SNPs(httppgagswashingtonedu last accessed July 16 2013)Innate Immunity PGA (httpwwwpharmgatorgIIPGA2index_html last accessed July 16 2013) and the CornellndashCelera initiative (Bustamante et al 2005) have included theNADPH oxidase genes and the coverage of these genes for thecurrent release of the 1000 Genomes Project remains low formost of the studied individuals (1000 Genomes ProjectConsortium et al 2012 average coverage and their standard

deviations on May 2013 are CYBB 40 plusmn 22 CYBA 35 plusmn 20NCF2 49 plusmn 27 and NCF4 49 plusmn 27) Although GWAS haveidentified common variants that contribute to complex phe-notypes a component of missing heritability of common dis-eases due to rare variants that are detectable only byresequencing is emerging In this study we analyzed the pat-tern of sequence diversity of four of the NADPH genes (CYBBCYBA NCF2 and NCF4) between mammalian species and inhuman populations by resequencing these genes in 102 eth-nically diverse individuals We interpreted our results in termsof evolutionary histories by addressing the action of naturalselection and focusing on two temporal scales mammalianevolution and recent human evolution We excluded NCF1from our study because its high homology with its pseudo-genes prevents reliable sequencing in individual samples(Chanock et al 2000) Several studies have shown the impor-tance of natural selection on the evolution of immunity genesat both the interspecific (Kosiol et al 2008) and populationlevels (Ferrer-Admetlla et al 2008 Barreiro et al 2009 Barreiroand Quintana-Murci 2010) By definition variants under nat-ural selection are associated with different reproductive effi-ciencies (fitness) of their carriers and contribute to phenotypevariability therefore they may be biomedically relevant byinfluencing the susceptibility to rare or common diseasesThe goals of this study are as follows 1) to determine whetherthe pattern of diversity of human phagocyte NADPH genesreflects the action of different types of natural selection 2) toelucidate the evolutionary dynamics of NADPH genes at thetemporal scales of mammals and humans and 3) to under-stand the biomedical implications of this evolutionary processin human populations

Results

Molecular Evolution of NADPH Genes alongMammalian Phylogeny

We examined signatures of natural selection across thecoding regions of NADPH genes by analyzing sequencesfrom the complete genomes of 29 mammals listed in theEntrez and Ensembl databases (Lindblad-Toh et al 2011one sequence for each species see supplementary materialSupplementary Material online for details) and comparingthe amount of nonsynonymous and synonymous substitu-tions (Nielsen et al 2005) When comparing a set of homol-ogous sequences from different species most of the observeddifferences are fixed that is the differences are monomorphicwithin a species because enough time has passed for theobserved variant to appear increase its frequency and reacha frequency of 1 (Kimura 1974) We compared the number offixed synonymous substitutions (dS assumed to be neutral)and fixed nonsynonymous substitutions (dN for which wetest the hypothesis of natural selection) between speciesusing the parameter = dNdS which is informative of theaction of natural selection at the inter-specific level (Yang2007a) Under neutral evolution of nonsynonymous substitu-tions these substitutions fix at the same rate as synonymoussubstitutions and therefore dN amp dS and amp 1 If nonsyn-onymous substitutions tend to be deleterious purifying

FIG 1 Components of the phagocyte NADPH oxidase Representationof the inactivated (left) and activated (right) forms of the phagocyteNADPH oxidase components reproduced from Heyworth et al (2003)The activated form is responsible for the respiratory burst The proteins(and genes) are gp91 (CYBB Xp211) p22 (CYBA 16q24) p67 (NCF21q25) p40 (NCF4 22q131) and p47 (NCF1 7q1123)

2158

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

selection maintains the substitutions at low frequencies andprevents fixation at the same rate as synonymous substitu-tions resulting in dNlt dS and lt 1 On the other hand ifepisodes of positive natural selection (that raise the frequencyof beneficial variants) are frequent nonsynonymous substi-tutions increase in frequency and fix more rapidly than neu-tral synonymous substitutions thus dNgt dS and gt 1 Weused the maximum likelihood framework developed by Yang(2007a) to estimate for the NADPH oxidase genes under avariety of evolutionary models as implemented in the PAMLsoftware (Yang 2007b) This approach allows inferences aboutthe evolution of a coding region along an interspecific phy-logeny and maps the codons that have evolved under strongweak purifying selection neutrality or adaptive positive se-lection (see supplementary material Supplementary Materialonline for details)

In general PAML evolutionary models that allow a combi-nation of purifying selection and neutrality are reasonably re-alistic These models are nested with respect to models thatalso incorporate positive selection at the cost of adding newparameters We evaluated the improvements in the goodnessof fit of the data using the latter model with respect to theformer models by applying a likelihood ratio test (LRT) Afterfitting the data to the most appropriate evolutionary modelNaive Empirical Bayes (NEB) or Bayes Empirical Bayes (BEB)approaches were used to infer theparameter for each codon

For the 29 species of mammals considered we filteredbased on quality control (supplementary table S1 Supple-mentary Material online) and analyzed 570 codons of CYBBin 26 species 198 codons of CYBA in 16 species 526 codons ofNCF2 in 23 species and 339 codons of NCF4 in 20 species (seesupplementary material Supplementary Material online forfurther details including the species and sequences used forthe analyses the parameter estimations for the differentmodels and the LRT results) Here we only present the resultsof model M3 of Yang (2007a) with three (K = 3) classes of (fig 2) This model allows for different classes including thepossibility of positive selection and is a reasonable way ofpresenting the results for the four genes under the samemodel Moreover in all cases the data fit better withmodel M3 than the nested and simpler M0 or M2 modelspresented by Yang (2007a)

The results of this analysis for CYBB CYBA NCF2 and NCF4are presented in figure 2 which shows the type of naturalselection (ie based on the estimated ) for each codon thatmost likely predominated during mammalian evolution Forthis temporal scale CYBA NCF2 and NCF4 coding regionshave evolved driven by a combination of different levels ofpurifying natural selection Overall the average and standarddeviation values for these genes are NCF2 = 0256 plusmn 0227NCF4 = 0126 plusmn 0140 and CYBA = 0109 plusmn 0116

Our most striking result is for CYBB which presents a widespectrum of mutations that account for gt70 of CGD pa-tients Although we would predict that purifying selection ongenes involved in Mendelian diseases (Blekhman et al 2008)would yield similar results for CYBB and other NADPHcomponents we observed a different pattern In generalCYBB is a conserved gene but 6 of its codons show

evidence of positive natural selection (supplementary tableS2 Supplementary Material online fig 2) In a genome-widesurvey performed by Kosiol et al (2008) they have reportedCYBB as a gene showing a signal of positive natural selectionMore importantly by evolutionary mapping we show herefor the first time that most of these positive selection eventsmap to the small extracellular portion of this protein (fig 3)The proximity of these inferred episodes of positive naturalselection to glycosylation sites in gp91 is noteworthy consid-ering the importance of the glycome in immunity (Marth andGrewal 2008)

Population Genetics of NADPH Genes

We sequenced CYBB CYBA NCF2 and NCF4 in a publiclyavailable panel that includes 24 individuals of African ances-try 31 Europeans 24 AsianOceanians and 23 admixed LatinAmericans (ie Hispanics) This panel is a suboptimal repre-sentation of the worldwide population a limitation that iscommon to most human genomic diversity projects focusedon SNP genotyping or resequencing efforts However basedon how human genetic diversity is apportioned within(gt85) and between (lt15) populations (Lewontin 19721000 Genomes Project Consortium 2010) even studies usingsuboptimal sampling are informative about the genetic struc-ture of human populations and serve to critically identify therole of evolutionary factors in human genetic diversity(Kimura 1974 Nielsen et al 2005 1000 Genomes ProjectConsortium 2010) All the raw results are available as supple-mentary material Supplementary Material online and at theSNP500Cancer project homepage (httpvariantgpsncinihgovcgfseqpagessnp500do last accessed July 16 2013) orcan be downloaded from the DIVERGENOME platform(Magalhaes et al 2012 httpwwwpggeneticaicbufmgbrdivergenome last accessed July 16 2013)

To ascertain which combination of evolutionary factorshas shaped the diversity of NADPH genes we assessed thepattern of nonsynonymous and synonymous polymor-phisms as well as intra- and interpopulation diversity forNADPH genes and tested the null hypothesis of neutralitythat patterns of diversity may be explained by consideringonly the demographic history of human populations and themutation and recombination rates of each locus

Nonsynonymous polymorphisms are underrepresentedin the human genome and usually occur at low frequencieswhen present reflecting the action of purifying natural selec-tion (1000 Genomes Project Consortium 2010) By resequen-cing Tarazona-Santos et al (2008) did not observe commonnonsynonymous polymorphisms for CYBB This result is con-sistent with purifying natural selection acting on X-chromo-some genes due to the exposure of deleterious recessivemutations to natural selection in hemizygous males Thussubstitutions in the coding region of CYBB should be rarein human populations and are seldom captured by studieswith small sample sizes Interestingly the lack of CYBBnonsynonymous polymorphisms in our sample of humanpopulations contrasts with the recurrent episodes of positiveselection of the extracellular portion of gp91 during

2159

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

mammalian evolution as inferred in this study For the auto-somal NADPH oxidase components (table 1 and haplotypetables online Supplementary Material online) we observe inthis study two rare and conservative nonsynonymous substi-tutions (ie involving amino acids with similar chemical prop-erties T85N and A304E) in NCF4 But for NCF2 and CYBA weobserved patterns of nonsynonymous substitutions thatseldom occur in human genes NCF2 has nine nonsynony-mous substitutions three of them are common (with a fre-quency higher than 5 in at least one of the studiedpopulation samples) and six are rare On the other handtwo nonsynonymous substitutions in CYBA are commonand ubiquitous in human populations namely Y72H(rs4673) and V174A (rs1049254 in a position where variationamong mammalian species is also observed) Moreover thefollowing two of the five common amino acid changes ob-served in the autosomal NADPH genes are predicted to bepossibly damaging (ie radical) by the Polyphen resource(Ramensky et al 2002) the two changes are R395W inHispanic NCF2 (rs13306575) and Y72H (rs4673) in CYBA Ingeneral Polyphen accurately predicts the effect of nonsynon-ymous substitutions based on biochemical and evolutionarydata (Williamson et al 2005) Notably the ImmunodeficiencyMutations Database (httpbioinfutafiNCF2basecontent=pubIDbases last accessed July 16 2013) reports one395W395W autosomal recessive CGD patient but we andthe HapMap project (wwwhapmaporg last accessed July 16

2013) observed the W allele at frequencies between 5 and10 in Asians and admixed Latin Americans including onesupposedly healthy 395W395W Japanese HapMap individ-ual On the basis of these results we verified whether NativeAmericans who descend from an ancestral Pleistocene Asianpopulations that peopled the Americas by the Behring Straitsmore than 14000 years ago may have relatively higher fre-quencies of this variant We genotyped the variant 395Wusing a Taqman assay in 558 Native Americans (see supple-mentary table S3 Supplementary Material online for detailedresults) and observed an allele frequency of 12 being thisvariant always present in heterozygous individuals

For the four studied genes the diversity and levels of re-combination are higher in Africans than in non-Africans(table 2 and haplotype tables available as supplementaryfiles Supplementary Material online) This result is a conse-quence of the African origin of modern humans and the ldquooutof Africardquo migration that occurred 40000ndash80000 years agoafter a bottleneck leading to the peopling of other continents(Campbell and Tishkoff 2008 Laval et al 2010) Therefore thefirst divergence between continental human populations wasbetween Africans and ancestral non-Africans Consistentlywith this scenario we observed the highest between-popula-tion differentiation for CYBB CYBA and NCF4 between thesetwo groups Interestingly NCF2 does not match this pattern(table 3)

FIG 2 Inferred types of natural selection for codons of the NADPH genes at the evolutionary time scale of mammals Codons are represented along thehorizontal axis For each gene three classes of sites (black dark gray and light gray) are considered and each class evolved under different inferred values (presented for each gene in the figure at the top of each graphic) These classes correspond to the model M3 of Yang (2007a) with three classes ofsites Given our data this model is more likely than alternative models of evolution that assume simpler scenarios such as a unique for the entire gene(see supplementary material Supplementary Material online for details regarding methods and results using alternative models) The three classescorrespond to different types and levels of natural selection from strong purifying selection (in the lightest gray) to positive selection (gt 1) For eachcodon the probability of belonging to each of the three classes of corresponds to the height of the corresponding color in the vertical bar Forexample codon 173 of CYBB (indicated by a white arrow) has a 0000 probability of belonging to the= 00095 class (light gray a class corresponding tostrong purifying selection) a 0169 probability of belonging to the = 03085 class (dark gray) and a 0831 probability of belonging to the = 18987class (black a class that suggests positive selection) In this case reasonable evidence of positive selection on this codon exists

2160

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

FIG

3

Nat

ural

sele

ctio

nm

app

ing

acro

ssC

YBB

(en

codi

ng

gp91

)al

ong

mam

mal

ian

evol

utio

na

sid

enti

fied

usin

gth

ePA

ML

met

hod

byY

ang

(200

7a)

The

top

olog

ies

ofgp

91an

dp2

2ar

ere

pro

duce

dfr

omT

aylo

ret

al(

2004

Cop

yrig

ht20

04T

heA

mer

ican

Ass

ocia

tion

ofIm

mun

olog

ists

In

cU

sed

wit

hp

erm

issi

on)

Dar

kgr

ayam

ino

acid

sha

veev

olve

dun

der

pos

itiv

ese

lect

ion

wit

hgt

80

pro

babi

lity

Mos

tof

thes

eam

ino

acid

sar

ein

the

extr

acel

lula

rp

orti

onof

the

pro

tein

The

upp

erp

art

ofth

efig

ure

show

sth

ep

rote

inal

ign

men

tfo

rn

ine

mam

mal

sof

the

gp91

regi

onin

dica

ted

byth

ebl

ack

ellip

seI

nth

isre

gion

ahi

ghle

velo

fam

ino

acid

varia

tion

isfo

und

betw

een

spec

ies

and

seve

ralc

odon

ssh

owgt

1In

this

alig

nm

ent

gray

vert

ical

bars

corr

esp

ond

tova

riab

leam

ino

acid

site

sT

hep

rote

inal

ign

men

tof

mam

mal

ssh

ows

the

follo

win

gsp

ecie

sH

om(H

omo

sapi

ens

sapi

ens)

Pan

(Pan

trog

lody

tes)

Mac

(Mac

aca

mul

atta

)M

us(M

usm

uscu

lus)

Rat

(Rat

tus

norv

egic

us)

Cri

(Cri

cetu

lus

gris

eus)

Het

(Het

eroc

epha

lus

glab

er)

Cav

(Cav

iapo

rcel

lus)

an

dO

ry(O

ryct

olag

uscu

nicu

lus)

EC

ext

race

lula

ren

viro

nm

ent

TM

tra

nsm

embr

ane

laye

ran

dIC

in

trac

ellu

lar

envi

ron

men

t

2161

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

From the four studied genes NCF4 which encodes theregulatory protein p40-phox shows a pattern of diversitythat is typical for a gene that has evolved under neutralityIn addition to the features described in the previous para-graph the allelic spectra of NCF4 in the studied populationsare consistent with a neutral model of evolution (tables 2ndash4)

Although CYBB presented the most interesting evolution-ary history at the interspecific level with repeated episodes ofpositive natural selection recent human evolutionary historyhas resulted in interesting patterns of variation for CYBA andNCF2 CYBA encodes p22-phox which is a transmembraneprotein shared by different NADPH oxidases In addition toharboring two common nonsynonymous polymorphisms(V174A is also variable among different species of mammals)CYBA is the most variable and most affected by recombina-tion among the NADPH oxidase genes (tables 1 and 2) Inparticular CYBA diversity is very high in Europe comparedwith 329 genes resequenced in a European sample (httppgagswashingtonedusummary_statshtml last accessed July16 2013) CYBA ranks 11th (ie the 97th percentile)Moreover there are contrasting proportions of the totalnumber of polymorphismsnumber of singletons betweenAfricans (a low proportion) and Europeans (a high propor-tion) the latter showing an excess of common polymor-phisms with respect to the neutral expectation (see the DFL

test in table 4 and supplementary table S4 SupplementaryMaterial online) This excess of common variants inEuropeans is also significant when we conservatively testedit against a scenario of human evolution that incorporates theldquoOut of Africardquo bottleneck (Laval et al 2010) and the observedlevel of recombination in Europeans (CYBA = 807 for the se-quenced region) Because demographic forces and recombi-nation levels alone do not explain the high CYBA diversity andits excess of common polymorphisms we suggest that bal-ancing natural selection (that acts by maintaining differentalleles at high frequency in a population) has contributed to

shape the diversity of CYBA at least in the European popu-lation Indeed figure 4a shows that the haplotype network ofCYBA for the European population is consistent with theaction of balancing natural selection (Bamshad andWooding 2003) showing two well-differentiated commonclades that explain the observed high diversity and theexcess of common CYBA variants A comparative genomicanalysis confirms this inference the ratio of polymorphismsto differences fixed between human and chimpanzee is nothomogeneous along the gene in the different human popu-lations (Mc Donald 1998 supplementary table S5Supplementary Material online) as would be expectedunder neutral evolution Our inference of balancing naturalselection is consistent with the fact that 25ndash30 of the var-iation in levels of ROS production can be attributed to geneticfactors (Lacy et al 2000) and that ROS levels are associatedwith CYBA variants (Bedard et al 2009)

p67 encoded by NCF2 is a necessary cytosolic NADPHcomponent for phagocyte ROS production Asians show ahighly differentiated NCF2 haplotype structure (see frequen-cies of haplotypes NCF2-D11 and NCF2-E10 in the haplotypetables online and in the network shown in fig 4b) and thehighest FST values are observed in pairwise comparisons be-tween Asians and non-Asian populations (in particular withEuropeans table 3) and not between Africans and non-African populations as is usually observed in the humangenome We confirmed these results by analyzing data forNCF2 from the HapMap Project (supplementary materialSupplementary Material online) Moreover a trend towardan excess of rare polymorphisms exists in Asians that is notobserved elsewhere (tables 1 2 and 4 DFL =1904FFL =1893) Although we cannot exclude that this patternof diversity is compatible with the null hypotheses of neutral-ity and with the tested demographic history of human pop-ulations inferred by Laval et al (2010 tables 2ndash4) we canspeculate and envisage four additional evolutionary scenarios

Table 1 Allele Frequencies of Nonsynonymous Polymorphisms in NADPH Oxidase Genes

Genes rs Minor Allele(Amino Acid)

PolyphenPrediction

African European Asian Hispanic

CYBA

Y72H rs4673 T (Y) Possibly damaging 046 032 017 022

V174A rs1049254 T (V) Benign 017 048 048 018

NCF2

K181R rs2274064 G (R) Benign 035 037 041 048

T279M rs13306581 T (T) Probably damaging 000 000 005 000

V297A rs35937854 C (A) Benign 004 000 000 000

T361S Chr1181799289 NCBI36hg18 T (S) mdash 000 000 002 000

H389Q rs17849502 A (Q) Benign 000 005 000 007

R395W rs13306575 T (W) Possibly damaging 000 000 000 007

N419I rs35012521 T (I) Probably damaging 000 002 004 000

P454S rs55761650 T (S) mdash 000 000 000 002

L487S Chr1181795862 NCBI36hg18 C (S) mdash 002 000 000 000

NCF4

T85N rs112306225 A (N) mdash 000 000 002 000

A304E rs5995361 A (E) Benign 004 000 000 000

2162

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

that may have contributed to shape the pattern of NCF2diversity in Asians 1) the observed trend is suggestive of aselective sweep on NCF2 standing variation a neutral orweakly deleterious existing variant becomes beneficial andrapidly increases in frequency (together with its associatedhaplotypes ie incomplete sweep) reducing the nucleotidediversity in the surrounding region and rendering other stand-ing substitutions rare During this process new rare substitu-tions appear in the expanding positively selected haplotype2) The pattern of diversity of NCF2 in Asia may result from anincomplete selective sweep acting on ARPC5 which is locatedapproximately 35 kb downstream of NCF2 In a genome-widescan for recent positive selection Voight et al (2006) identi-fied a strong signature of an incomplete sweep for ARPC5 inAsia (P = 0009) characterized by a higher than expected long-range linkage disequilibrium summarized by very high iHS

statistics (see Haplotter results for the HapMap II data athttphaplotteruchicagoedu last accessed July 16 2013)SNPs in NCF2 also presents high iHS statistics (P = 002) al-though values are lower than for ARPC5 3) The differentiatedpattern of diversity of NCF2 in Asia may also have been gen-erated without the action of natural selection during the firstcolonization of Asia by modern humans In a process of geo-graphic population expansion specifically in the front wave ofthe expansion some rare alleleshaplotypes (ie surfing al-leles) may become common by chance mimicking the pat-tern of diversity generated by a selective sweep (Excoffier andRay 2008) 4) The excess of rare variants may be an artifact ofpooling individuals from different populations (Ptak andPrzeworski 2002) Consistent with evolutionary scenarios 1ndash4 that produce similar patterns of diversity the haplotypenetwork of NCF2 for Eurasians (fig 4b) shows the following1) a large differentiation between Asians and Europeans thatis compatible with the high observed FST values and 2) a star-like shape associated with the haplotype NCF2-E10 that iscommon in Asia and rare elsewhere which is compatiblewith the excess of rare alleles in the Asian populations

DiscussionBy analyzing 29 mammalian genomes and four human pop-ulations we show in this study that natural selection hasacted in different ways over time to shape the pattern ofdiversity of the phagocyte NADPH oxidase genes At the tem-poral scale of the evolution of mammals we have inferredrecurrent episodes of positive selection acting on the extra-cellular portion of gp-91 that have been important to shapethe pattern of interspecific diversity of this gene Our in-terspecific analyses did not show a similar pattern of naturalselection in any of the other phagocyte NADPH oxidasegenes Even if current knowledge on the biology of NADPHdoes not allow us to interpret our results in terms of functionwe propose that the extracellular region of gp-91 is function-ally relevant Our results also imply that this region is highlydifferentiated among mammals at the protein level and thisvariability should be considered when mammals models areused to study the structure and function of phagocyteNADPH components

In the time scale of human evolution our analyses of theNADPH oxidase genes suggest that CYBA has been a target ofbalancing natural selection Because we do not have evidenceof population-specific variants that faced selective pressurethe inferred natural selection may have acted on a standingvariation in ancestral populations This implies that the selec-tive pressure began after the appearance of the variant andpossibly acted in a specific geographic region (Barret andSchluter 2008) The signatures of natural selection acting ona new mutation and on standing variation differ In the case ofselective sweeps episodes of natural selection on standingvariation are associated to a larger variance in the allelic spec-trum with respect to natural selection on a new mutationAlso selection on standing variation may produce an excess ofalleles at intermediate frequencies that is not associated withhigh nucleotide diversity (Przeworski et al 2005 Peter et al2012) This pattern contrasts with the effect of balancing

Table 2 Intrapopulation Diversity Indexes in the Studied Populationsfor the NADPH Oxidase Genes Obtained from Resequencing Dataa

African European Asian Hispanic

Number of chromosomes

CYBBb 42 52 34 36

CYBA 48 62 48 46

NCF2 48 62 48 46

NCF4 48 62 48 46

Segregating sitessingletons

CYBB 218 70 103 135

CYBA 6122 333 335 347

NCF2 4613 3311 2816 3714

NCF4 4512 267 191 308

Haplotype structureNumber of inferred haplotypesc

CYBB 14 5 7 12

CYBA 39 39 32 26

NCF2 38 32 18 31

NCF4 36 30 17 22

Haplotype diversity plusmn SD

CYBB 088 plusmn 003 034 plusmn 008 053 plusmn 010 070 plusmn 008

CYBA 098 plusmn 001 098 plusmn 001 097 plusmn 000 096 plusmn 002

NCF2 099 plusmn 001 096 plusmn 001 087 plusmn 004 097 plusmn 001

NCF4 098 plusmn 001 093 plusmn 002 093 plusmn 002 093 plusmn 002

Recombination parameter (q 103 per site)c

CYBB 008 lt001 lt001 lt002

CYBA 291 148 096 088

NCF2 052 038 010 058

NCF4 137 103 039 022

h estimatorsd

p 103 per site

CYBB 036 012 015 028

CYBA 190 163 151 157

NCF2 081 047 043 057

NCF4 101 076 064 084

hW 103 per site

CYBB 042 013 021 027

CYBA 231 118 125 13

NCF2 102 069 062 083

NCF4 101 070 054 087

aMost analyses were performed using software DnaSP (Rozas 2009)bData for CYBB (Xp211) are from Tarazona-Santos et al (2008)cHaplotypes and inferred using the method by Stephens and Scheet (2005) andthe software PHASEd Tajima (1983) W Watterson (1975)

2163

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

natural selection which produces an excess of common allelesassociated with high genetic diversity Thus the observed pat-tern of CYBA diversity in Europeans is not consistent with aselective sweep on a standing variation but it is consistent witha scenario of balancing selection acting on standing variation

If we consider for CYBA that heterozygote advantage maybe the mechanisms of balancing selection we can speculatethat the biological basis for this mechanism may be the fol-lowing considering that p22-phox is not exclusive of thephagocyte NADPH oxidase but it is also part of Nox com-plexes expressed in other tissues the dependence of ROSproduction on CYBA variants has to be finely regulated IfCYBA variants induce high levels of ROS these variants mayfavor a phagocyte-dependent efficient response to pathogensbut may damage other endothelial tissues Alternativelytissue oxidative damage does not occur if ROS productionis low but this response may be associated with a weaker

phagocyte respiratory burst against pathogens In this con-text heterozygote individuals with a CYBA-dependent inter-mediate level of ROS production may have been favored bynatural selection

Our results contribute to the discussion regarding the rel-evance of balancing selection in shaping the diversity ofinnate immunity genes (Ferrer-Admetlla et al 2008 Barreiroet al 2009) Ferrer-Admetlla et al (2008) have associated therecurrent signatures of balancing selection on inflammatorygenes with the need for fine regulation CYBA is the onlyphagocytic NADPH oxidase gene that also encodesnonphagocyte Nox components thus CYBA has a role inROS cell signaling a potentially dangerous process due toits capability to produce oxidative damage to tissues Geneswith these characteristics likely need even tighter regulationThe interplay between pathogen-driven selective pressureon innate immunity genes and their concomitantnonimmunological functions is complex In addition toCYBA other interesting examples of this interplay can befound among the 10 human toll-like receptors (TLRs) thatshow a variety of signatures of natural selection TLR8 whichshows the strongest signature of purifying selection is alsoinvolved in neuronal development (Barreiro et al 2009) and itis difficult to discriminate the role of each function in deter-mining the observed signature of natural selection

With few exceptions the pathogens responsible for naturalselection on immune genes are difficult to specify In the caseof NADPH oxidase we can infer based on the spectrum ofinfections in CGD patients that catalase-positive bacteria andfungi such as Staphylococci Salmonella Candida Aspergillusand M tuberculosis may be the selective pathogensInteractions between the host and pathogens also includemechanisms of the latter to impair the respiratory burst ofthe former For example Leishmania donovani blocks the as-sembly of NADPH oxidase at the phagosome membrane(Lodge et al 2006) These mechanisms may constitute selec-tive pressures imposed by pathogens

The associations reported in GWAS between rs4821544 inNCF4 and Crohnrsquos disease (an idiopathic inflammatory boweldisease that predominantly involves the ileum and colonRioux et al 2007) and between rs10911363 in NCF2 and

Table 3 Pairwise FST Genetic Distances between Populations

CYBBa CYBA

Africa Europe Asia Hispanic Africa Europe Asia Hispanic

Africa mdash 0316 0264 0092 mdash 0074 0083 0065

Europe 0257 mdash 0000 0107 mdash 0002 0056

Asia 0211 0000 mdash 0073 mdash 0054

Hispanic 0070 0082 0056 mdash mdash

NCF2 NCF4

Africa mdash 0048 0058 0037 mdash 0128 0136 0158

Europe mdash 0069 0000 mdash 0026 0026

Asia mdash 0059 mdash 0005

Hispanic mdash mdash

aFor CYBB FST estimators are above the diagonal Below the diagonal are the FST values corrected as if the effective population sizes of X chromosome genes were equal toautosomal ones

Table 4 Results of Neutrality Tests for the NADPH Oxidase Genesand Their Significancea

African European Asian Hispanic

Tajimarsquos D

CYBB 0473 0274 0813 0084

CYBA 0580 1242 0684 0707

NCF2 0412 0939 0883 0987

NCF4 0750 0243 0556 0102

Fu and Lirsquos D

CYBB 1050 1110 0395 0977

CYBA 1485 1734 1188 0138

NCF2 0407 1105 1904 1398

NCF4 0308 0443 1893 0266

Fu and Lirsquos F

CYBB 0980 0811 0114 0814

CYBA 1382 1893 1205 0405

NCF2 0512 1252 1893 1567

NCF4 0557 0231 1225 0248

NOTEmdashUnderlined values represent significant results under the demographic modelinferred by Laval et al (2010) for human populations See details in supplementarytable S4 Supplementary Material onlineaThe McDonaldndashKreitman test is nonsignificant in any of the cases

Significant under the WrightndashFisher model of constant population size

2164

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

systemic lupus erythematosus (Cunninghame Graham et al2011) confirm the involvement of NADPH genes in the path-ogenesis of inflammatory-related common diseases Ourclaim that natural selection acted on CYBA (and maybe inNCF2) is relevant for biomedical studies because combiningevidence of natural selection with association analyses inimmune genes increases the statistical power to detect dis-ease-associated variants (Ayodo et al 2007) As a new gener-ation of association studies focusing on rare variation isemerging the combination of genes deemed interestingfrom GWAS and populations with an excess of rare variantsin these genes such as NCF2 in Asians are particularly inter-esting as a source of rare variants with clinical relevanceFinally by determining through molecular evolution mappingthat the extracellular portion of gp91 (encoded by CYBB inthe X-chromosome) has been subject to recurrent episodes ofpositive selection at the scale of mammals evolution we positthe hypothesis that this portion of the NADPH oxidase isrelevant for currently unknown biological processes thatonce revealed by structural and functional investigationswill contribute to understanding the role of NADPH oxidasein infectious autoimmune and cardiovascular diseases

Materials and Methods

Molecular Evolution Analysis of NADPH Genes

We used the maximum likelihood framework developed byYang (2007a) to estimate for the NADPH oxidase genes

under a variety of evolutionary models as implemented in thePAML software (Yang 2007b) This approach allows infer-ences about the evolution of a coding region along aninter-specific phylogeny mapping the codons that haveevolved under strongweak purifying selection neutrality oradaptive positive selection Further details about these anal-yses are available as supplementary material SupplementaryMaterial online

Human Population Genetics of NADPH Genes

For human population genetics analyses we conducted bidi-rectional Sanger sequencing of CYBB CYBA NCF2 and NCF4for a total of 35242 bp for each of 102 healthy individuals aspart of the SNP500 Cancer project (Packer et al 2006 seesupplementary fig S1 Supplementary Material online for de-tails) Human population resequencing data for CYBB werepublished in Tarazona-Santos et al (2008) The resequencingexperiments were performed as in Packer et al (2006) These102 unrelated individuals include the following 24 of Africanancestry (15 African Americans from the United States and 9Pygmies) 23 admixed Latin Americans (from Mexico PuertoRico and South America) 31 Europeans (from the CEPHUTAH pedigree and the NIEHS Environmental GenomeProject) and 24 AsiansndashOceanians (from Melanesia PakistanChina Cambodia Japan and Taiwan)

After controlling for multiple tests we confirmed that allSNPs fit the HardyndashWeinberg equilibrium by the Guo and

CYBA - A16

CYBA - A3

Y72H - rs4673

V174A - rs1049254 CYBA - E18

(a)

NCF2 - E10 H389Q - rs17849502

NCF2 - B3

NCF2 - D11

NCF2 - C1K181R - rs2274064(b)

FIG 4 Phylogenetic networks of (a) CYBA in Europeans and (b) NCF2 in Europeans (black) and Asians (yellow) The lengths of the branches areproportional to the number of mutations Only nonsynonymous mutations are shown The haplotype names correspond to the table of inferredhaplotypes in the supplementary material Supplementary Material online We only show the names of haplotypes with a frequency gt5 Ancestralhaplotypes (inferred as being the human haplotype or median vector most similar to the chimpanzee sequence) are indicated by an arrow Medianvectors are in red

2165

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Thompson (1992) test which was implemented in the soft-ware Arlequin 30 (Excoffier et al 2007) Insertion-deletions(INDELs) were excluded from population genetics analysesTo assess intrapopulation diversity we used two statistics the per-site mean number of pairwise differences betweensequences (Tajima 1983) and w based on the number ofsegregating sites (S) (Watterson 1975) We measured pairwisebetween-populations diversity by using the FST statistics cal-culated using the software DnaSP (Rozas 2009)

Haplotypes and the recombination parameter were in-ferred using the PHASE software (Stephens and Scheet 2005)and diversity indexes calculations (tables 2 and 3) as well asneutrality statistics (table 4) were estimated using DnaSP soft-ware We applied two kinds of neutrality tests 1) tests basedon the allelic spectrum which is the distribution of polymor-phisms across different classes of frequencies namelyTajimarsquos DT (Tajima 1989) and FundashLirsquos DFL and FFL (Fu andLi 1993) and 2) tests based on comparisons between thenumber of polymorphisms in human populations and fixeddifferences with the chimpanzee (ie outgroup) namely theMcDonald and Kreitman (1991) test and the adaptedKolmogorovndashSmirnoff test by McDonald (1998) For thefirst set of tests we used as null hypotheses both the classicWrightndashFisher model of neutrality with a constant popula-tion size as well as the more realistic evolutionary scenario forhuman populations inferred by Laval et al (2010) In the caseof the scenario of Laval et al (2010) we ignored interconti-nental gene flow within the Old World because these raregene flow events likely does not affect the level of significanceof the neutrality tests given its very low inferred values(13 105) Null distributions used to test the significanceof the neutrality tests under these evolutionary scenarios weregenerated using coalescent simulations and a significancelevel of 005 (Hudson 2002) The KolmogorovndashSmirnoff testof neutrality adapted by McDonald (1998) was performedusing Slider software available at httpudeledu~mcdonaldaboutdnasliderhtml (last accessed July 16 2013) Weperformed coalescent simulations using ms software(Hudson 2002) Further methodological details are availableas supplementary material Supplementary Material onlineWe constructed the CYBA and NCF2 networks using allSNP variants and applying the Median joining algorithmand the maximum parsimony option calculations as imple-mented in the software Network 46 (Bandelt et al 1999)

Supplementary MaterialSupplementary tables S1ndashS5 and figure S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

SJC conceived the study ET-S MM FL RC AC CF LPand BS generated the dataanalyzed the sequences LB ACWCSM and MY provided tools for data generationand analyses ET-S MM FL WCSM and RR performedpopulation genetics analyses ET-S and MM prepared tablesand figures ET-S and SJC wrote the manuscript All the

authors contributed with discussions about the results Theauthors are grateful to Silvia Fuselli for discussions of ourresults to Fernando Levi Soares for technical assistance withthe figures and the Sequencing Group of the CoreGenotyping Facility (National Cancer Institute) for their tech-nical assistance This work was supported by the IntramuralResearch Program of the National Cancer Institute (NCI) andNational Institutes of Health (NIH) CF was supported by theUniversity of Bologna ET-S MM WCSM and FL weresupported by the Fogarty International Center and NationalCancer Institute (1R01TW007894) Conselho Nacional deDesenvolvimento Cientıfico e Tecnologico (Brazil Grant3075562012-3) Fundacao de Amparo a Pesquisa de MinasGerais (Brazil CBB PPM 0026512) and the Brazilian Ministryof Education (CAPES Agency grant PNPD 0256709-1)

ReferencesAyodo G Price AL Keinan A Ajwang A Otieno MF Orago AS

Patterson N Reich D 2007 Combining evidence of natural selectionwith association analysis increases power to detect malaria-resis-tance variants Am J Hum Genet 81(2)234ndash242

Bamshad M Wooding SP 2003 Signatures of natural selection in thehuman genome Nat Rev Genet 4(2)99ndash111

Bandelt H-J Forster P Rohl A 1999 Median-joining networks for infer-ring intraspecific phylogenies Mol Biol Evol 16(1)37ndash48

Barreiro LB Ben-Ali M Quach H et al (17 co-authors) 2009Evolutionary dynamics of human Toll-like receptors and their dif-ferent contributions to host defense PLoS Genet 5(7)e1000562

Barreiro LB Quintana-Murci L 2010 From evolutionary genetics tohuman immunology how selection shapes host defence genesNat Rev Genet 11(1)17ndash30

Barrett RD Schluter D 2008 Adaptation from standing genetic varia-tion Trends Ecol Evol 23(1)38ndash44

Bedard K Attar H Bonnefont J Jaquet V Borel C Plastre O Stasia MJAntonarakis SE Krause KH 2009 Three common polymorphisms inthe CYBA gene form a haplotype associated with decreased ROSgeneration Hum Mutat 30(7)1123ndash1133

Blekhman R Man O Herrmann L Boyko AR Indap A Kosiol CBustamante CD Teshima KM Przeworski M 2008 Natural selectionon genes that underlie human disease susceptibility Curr Biol18(12)883ndash889

Brandes RP Kreuzer J 2005 Vascular NADPH oxidases molecular mech-anisms of activation Cardiovasc Res 65(1)16ndash27

Buckley RH 2004 Pulmonary complications of primary immunodefi-ciencies Paediatr Respir Rev 5(Suppl A)S225ndashS233

Bustamante CD Fledel-Alon A Williamson S et al (13 co-authors)2005 Natural selection on protein-coding genes in the humangenome Nature 437(7062)1153ndash1157

Bustamante J Arias AA Vogt G et al (29 co-authors) 2011 GermlineCYBB mutations that selectively affect macrophages in kindredswith X-linked predisposition to tuberculous mycobacterial diseaseNat Immunol 12(3)213ndash221

Campbell MC Tishkoff SA 2008 African genetic diversity implicationsfor human demographic history modern human origins and com-plex disease mapping Annu Rev Genomics Hum Genet 9403ndash433

Chanock SJ Roesler J Zhan S Hopkins P Lee P Barrett DT ChristensenBL Curnutte JT Gorlach A 2000 Genomic structure of the humanp47-phox (NCF1) gene Blood Cells Mol Dis 26(1)37ndash46

Cunninghame Graham DS Morris DL Bhangale TR Criswell LASyvanen AC Ronnblom L Behrens TW Graham RR Vyse TJ2011 Association of NCF2 IKZF1 IRF8 IFIH1 and TYK2 with sys-temic lupus erythematosus PLoS Genet 7(10)e1002341

Excoffier L Laval G Schneider S 2007 Arlequin ver 30 an integratedsoftware package for population genetics data analysis EvolBioinform Online 147ndash50

2166

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Excoffier L Ray N 2008 Surfing during population expansions promotesgenetic revolutions and structuration Trends Ecol Evol23(7)347ndash351

Ferrer-Admetlla A Bosch E Sikora M Marques-Bonet T Ramırez-SorianoA Muntasell A Navarro A Lazarus R Calafell F Bertranpetit J Casals F2008 Balancing selection is the main force shaping the evolution ofinnate immunity genes J Immunol 181(2)1315ndash1322

Fu YX Li WH 1993 Statistical tests of neutrality of mutations Genetics133(3)693ndash709

The 1000 Genomes Project Consortium 2010 A map of humangenome variation from population-scale sequencing Nature467(7319)1061ndash1073

The 1000 Genomes Project Consortium Abecasis GR Auton A BrooksLD DePristo MA Durbin RM Handsaker RE Kang HM Marth GTMcVean GA 2012 An integrated map of genetic variation from1092 human genomes Nature 491(7422)56ndash65

Guo SW Thompson EA 1992 Performing the exact test of Hardy-Weinberg proportion for multiple alleles Biometrics 48(2)361ndash372

Heyworth PG Cross AR Curnutte JT 2003 Chronic granulomatousdisease Curr Opin Immunol 15(5)578ndash584

Hudson RR 2002 Generating samples under a Wright-Fisher neutralmodel of genetic variation Bioinformatics 18(2)337ndash338

Jacob CO Eisenstein M Dinauer MC et al (21 co-authors) 2012 Lupus-associated causal mutation in neutrophil cytosolic factor 2 (NCF2)brings unique insights to the structure and function of NADPHoxidase Proc Natl Acad Sci U S A 109(2)E59ndashE67

Kimura M 1974 The neutral theory of molecular evolution CambridgeCambridge University Press

Kosiol C Vinar T da Fonseca RR Hubisz MJ Bustamante CD Nielsen RSiepel A 2008 Patterns of positive selection in six mammalian ge-nomes PLoS Genet 4(8)e1000144

Lacy F Kailasam MT OrsquoConnor DT Schmid-Schonbein GW Parmer RJ2000 Plasma hydrogen peroxide production in human essentialhypertension role of heredity gender and ethnicity Hypertension36(5)878ndash884

Laval G Patin E Barreiro LB Quintana-Murci L 2010 Formulating a his-torical and demographic model of recent human evolution based onresequencing data from noncoding regions PLoS One 5(4)e10284

Lewontin R 1972 The apportionment of human diversity Evol Biol 6391ndash398

Lindblad-Toh K Garber M Zuk O et al (88 co-authors) 2011 A high-resolution map of human evolutionary constraint using 29 mam-mals Nature 478(7370)476ndash482

Lodge R Diallo TO Descoteaux A 2006 Leishmania donovani lipopho-sphoglycan blocks NADPH oxidase assembly at the phagosomemembrane Cell Microbiol 8(12)1922ndash1931

Magalhaes WC Rodrigues MR Silva D Soares-Souza G Iannini MLCerqueira GC Faria-Campos AC Tarazona-Santos E 2012DIVERGENOME a bioinformatics platform to assist population ge-netics and genetic epidemiology studies Genet Epidemiol36(4)360ndash367

Marth JD Grewal PK 2008 Mammalian glycosylation in immunity NatRev Immunol 8(11)874ndash887

McDonald JH 1998 Improved tests for heterogeneity across a region ofDNA sequence in the ratio of polymorphism to divergence Mol BiolEvol 15377ndash384

McDonald JH Kreitman M 1991 Adaptive protein evolution at the Adhlocus in Drosophila Nature 351(6328)652ndash654

Nielsen R Bustamante C Clark AG et al (12 co-authors) 2005 A scanfor positively selected genes in the genomes of humans and chim-panzees PLoS Biol 3(6)e170

Olsson LM Lindqvist AK Kallberg H Padyukov L Burkhardt HAlfredsson L Klareskog L Holmdahl R 2007 A case-control studyof rheumatoid arthritis identifies an associated single nucleotidepolymorphism in the NCF4 gene supporting a role for the

NADPH-oxidase complex in autoimmunity Arthritis Res Ther9(5)R98

Packer BR Yeager M Burdett L et al (13 co-authors) 2006SNP500Cancer a public resource for sequence validation assay de-velopment and frequency analysis for genetic variation in candidategenes Nucleic Acids Res 34(Database issue)D617ndashD621

Peter BM Huerta-Sanchez E Nielsen R 2012 Distinguishing betweenselective sweeps from standing variation and from a de novo mu-tation PLoS Genet 8(10)e1003011

Przeworski M Coop G Wall JD 2005 The signature of positive selectionon standing genetic variation Evolution 59(11)2312ndash2323

Ptak SE Przeworski M 2002 Evidence for population growth in humansis confounded by fine-scale population structure Trends Genet18(11)559ndash563

Ramensky V Bork P Sunyaev S 2002 Human non-synonymous SNPsserver and survey Nucleic Acids Res 30(17)3894ndash3900

Rioux JD Xavier RJ Taylor KD et al (24 co-authors) 2007 Genome-wideassociation study identifies new susceptibility loci for Crohn diseaseand implicates autophagy in disease pathogenesis Nat Genet39(5)596ndash604

Roberts RL Hollis-Moffatt JE Gearry RB Kennedy MA Barclay MLMerriman TR 2008 Confirmation of association of IRGM andNCF4 with ileal Crohnrsquos disease in a population-based cohortGenes Immun 9(6)561ndash565

Rozas J 2009 DNA sequence polymorphism analysis using DnaSPMethods Mol Biol 537337ndash350

San Jose G Fortuno A Beloqui O Dıez J Zalba G 2008 NADPH oxidaseCYBA polymorphisms oxidative stress and cardiovascular diseasesClin Sci (Lond) 114(3)173ndash182

Santiago HC Gonzalez Lombana CZ Macedo JP et al (12 co-authors)2012 NADPH phagocyte oxidase knockout mice controlTrypanosoma cruzi proliferation but develop circulatory collapseand succumb to infection PLoS Negl Trop Dis 6(2)e1492

Stephens M Scheet P 2005 Accounting for decay of linkage disequilib-rium in haplotype inference and missing-data imputation Am JHum Genet 76(3)449ndash462

Sumimoto H Miyano K Takeya R 2005 Molecular composition andregulation of the Nox family NAD(P)H oxidases Biochem BiophysRes Commun 338(1)677ndash686

Tajima F 1983 Evolutionary relationship of DNA sequences in finitepopulations Genetics 105(2)437ndash460

Tajima F 1989 Statistical method for testing the neutral mutationhypothesis by DNA polymorphism Genetics 123(3)585ndash595

Tarazona-Santos E Bernig T Burdett L Magalhaes WC Fabbri C Liao JRedondo RA Welch R Yeager M Chanock SJ 2008 CYBB anNADPH-oxidase gene restricted diversity in humans and evidencefor differential long-term purifying selection on transmembrane andcytosolic domains Hum Mutat 29(5)623ndash632

Taylor RM Burritt JB Baniulis D Foubert TR Lord CI Dinauer MCParkos CA Jesaitis AJ 2004 Site-specific inhibitors of NADPH oxi-dase activity and structural probes of flavocytochrome b character-ization of six monoclonal antibodies to the p22phox subunitJ Immunol 173(12)7349ndash7357

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A map of recentpositive selection in the human genome PLoS Biol 4(3)e72

Watterson GA 1975 On the number of segregating sites in geneticalmodels without recombination Theor Popul Biol 7(2) 256ndash276

Williamson SH Hernandez R Fledel-Alon A Zhu L Nielsen RBustamante CD 2005 Simultaneous inference of selection and pop-ulation growth from patterns of variation in the human genomeProc Natl Acad Sci U S A 102(22)7882ndash7887

Yang Z 2007a Adaptive molecular evolution In Balding DJ Bishop MCannings C editors Handbook of statistical genetics Vol 1 3rd edSusex (United Kingdom) John Wiley amp Sons p 377ndash406

Yang Z 2007b PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

2167

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

et al 2011) as well as for immune related diseases such asCrohnrsquos disease and lupus as identified in genome-wide as-sociation studies (GWAS) in European populations (Riouxet al 2007 Roberts et al 2008 Jacob et al 2012) Besidesthe phagocyte NADPH oxidase other NADPH oxidaseswith different functions are expressed in a variety ofnonphagocytic cells including the endothelium and havebeen implicated in cardiovascular and renal diseaseAlthough p22-phox (encoded by CYBA) is a protein compo-nent shared by several of these NADPH oxidases (also calledNox) other more specific protein subunits are encoded bydifferent Nox genes homologous to the genes coding for thephagocytic subunits (Sumimoto et al 2005 San Jose et al2008) Although these nonphagocytic NADPH oxidases nor-mally produce less O2 even small imbalances in ROS levelsmay cause tissue damage due to oxidative stress which iscorrelated with the pathogenesis of gout chronic obstructivepulmonary disease rheumatoid arthritis and cardiovasculardiseases (Brandes and Kreuzer 2005) Therefore variants inNADPH oxidase genes may have pleiotropic effects across aspectrum of disorders (Santiago et al 2012)

Despite the involvement of the NADPH oxidase in a rangeof clinically relevant phenotypes our knowledge of the se-quence diversity of NADPH genes mostly derives from CGDpatients Although targeted SNP genotyping has been per-formed in the context of association studies for CYBA(Bedard et al 2009) and NCF4 (Olsson et al 2007) none ofthe large-scale resequencing efforts such as Seattle SNPs(httppgagswashingtonedu last accessed July 16 2013)Innate Immunity PGA (httpwwwpharmgatorgIIPGA2index_html last accessed July 16 2013) and the CornellndashCelera initiative (Bustamante et al 2005) have included theNADPH oxidase genes and the coverage of these genes for thecurrent release of the 1000 Genomes Project remains low formost of the studied individuals (1000 Genomes ProjectConsortium et al 2012 average coverage and their standard

deviations on May 2013 are CYBB 40 plusmn 22 CYBA 35 plusmn 20NCF2 49 plusmn 27 and NCF4 49 plusmn 27) Although GWAS haveidentified common variants that contribute to complex phe-notypes a component of missing heritability of common dis-eases due to rare variants that are detectable only byresequencing is emerging In this study we analyzed the pat-tern of sequence diversity of four of the NADPH genes (CYBBCYBA NCF2 and NCF4) between mammalian species and inhuman populations by resequencing these genes in 102 eth-nically diverse individuals We interpreted our results in termsof evolutionary histories by addressing the action of naturalselection and focusing on two temporal scales mammalianevolution and recent human evolution We excluded NCF1from our study because its high homology with its pseudo-genes prevents reliable sequencing in individual samples(Chanock et al 2000) Several studies have shown the impor-tance of natural selection on the evolution of immunity genesat both the interspecific (Kosiol et al 2008) and populationlevels (Ferrer-Admetlla et al 2008 Barreiro et al 2009 Barreiroand Quintana-Murci 2010) By definition variants under nat-ural selection are associated with different reproductive effi-ciencies (fitness) of their carriers and contribute to phenotypevariability therefore they may be biomedically relevant byinfluencing the susceptibility to rare or common diseasesThe goals of this study are as follows 1) to determine whetherthe pattern of diversity of human phagocyte NADPH genesreflects the action of different types of natural selection 2) toelucidate the evolutionary dynamics of NADPH genes at thetemporal scales of mammals and humans and 3) to under-stand the biomedical implications of this evolutionary processin human populations

Results

Molecular Evolution of NADPH Genes alongMammalian Phylogeny

We examined signatures of natural selection across thecoding regions of NADPH genes by analyzing sequencesfrom the complete genomes of 29 mammals listed in theEntrez and Ensembl databases (Lindblad-Toh et al 2011one sequence for each species see supplementary materialSupplementary Material online for details) and comparingthe amount of nonsynonymous and synonymous substitu-tions (Nielsen et al 2005) When comparing a set of homol-ogous sequences from different species most of the observeddifferences are fixed that is the differences are monomorphicwithin a species because enough time has passed for theobserved variant to appear increase its frequency and reacha frequency of 1 (Kimura 1974) We compared the number offixed synonymous substitutions (dS assumed to be neutral)and fixed nonsynonymous substitutions (dN for which wetest the hypothesis of natural selection) between speciesusing the parameter = dNdS which is informative of theaction of natural selection at the inter-specific level (Yang2007a) Under neutral evolution of nonsynonymous substitu-tions these substitutions fix at the same rate as synonymoussubstitutions and therefore dN amp dS and amp 1 If nonsyn-onymous substitutions tend to be deleterious purifying

FIG 1 Components of the phagocyte NADPH oxidase Representationof the inactivated (left) and activated (right) forms of the phagocyteNADPH oxidase components reproduced from Heyworth et al (2003)The activated form is responsible for the respiratory burst The proteins(and genes) are gp91 (CYBB Xp211) p22 (CYBA 16q24) p67 (NCF21q25) p40 (NCF4 22q131) and p47 (NCF1 7q1123)

2158

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

selection maintains the substitutions at low frequencies andprevents fixation at the same rate as synonymous substitu-tions resulting in dNlt dS and lt 1 On the other hand ifepisodes of positive natural selection (that raise the frequencyof beneficial variants) are frequent nonsynonymous substi-tutions increase in frequency and fix more rapidly than neu-tral synonymous substitutions thus dNgt dS and gt 1 Weused the maximum likelihood framework developed by Yang(2007a) to estimate for the NADPH oxidase genes under avariety of evolutionary models as implemented in the PAMLsoftware (Yang 2007b) This approach allows inferences aboutthe evolution of a coding region along an interspecific phy-logeny and maps the codons that have evolved under strongweak purifying selection neutrality or adaptive positive se-lection (see supplementary material Supplementary Materialonline for details)

In general PAML evolutionary models that allow a combi-nation of purifying selection and neutrality are reasonably re-alistic These models are nested with respect to models thatalso incorporate positive selection at the cost of adding newparameters We evaluated the improvements in the goodnessof fit of the data using the latter model with respect to theformer models by applying a likelihood ratio test (LRT) Afterfitting the data to the most appropriate evolutionary modelNaive Empirical Bayes (NEB) or Bayes Empirical Bayes (BEB)approaches were used to infer theparameter for each codon

For the 29 species of mammals considered we filteredbased on quality control (supplementary table S1 Supple-mentary Material online) and analyzed 570 codons of CYBBin 26 species 198 codons of CYBA in 16 species 526 codons ofNCF2 in 23 species and 339 codons of NCF4 in 20 species (seesupplementary material Supplementary Material online forfurther details including the species and sequences used forthe analyses the parameter estimations for the differentmodels and the LRT results) Here we only present the resultsof model M3 of Yang (2007a) with three (K = 3) classes of (fig 2) This model allows for different classes including thepossibility of positive selection and is a reasonable way ofpresenting the results for the four genes under the samemodel Moreover in all cases the data fit better withmodel M3 than the nested and simpler M0 or M2 modelspresented by Yang (2007a)

The results of this analysis for CYBB CYBA NCF2 and NCF4are presented in figure 2 which shows the type of naturalselection (ie based on the estimated ) for each codon thatmost likely predominated during mammalian evolution Forthis temporal scale CYBA NCF2 and NCF4 coding regionshave evolved driven by a combination of different levels ofpurifying natural selection Overall the average and standarddeviation values for these genes are NCF2 = 0256 plusmn 0227NCF4 = 0126 plusmn 0140 and CYBA = 0109 plusmn 0116

Our most striking result is for CYBB which presents a widespectrum of mutations that account for gt70 of CGD pa-tients Although we would predict that purifying selection ongenes involved in Mendelian diseases (Blekhman et al 2008)would yield similar results for CYBB and other NADPHcomponents we observed a different pattern In generalCYBB is a conserved gene but 6 of its codons show

evidence of positive natural selection (supplementary tableS2 Supplementary Material online fig 2) In a genome-widesurvey performed by Kosiol et al (2008) they have reportedCYBB as a gene showing a signal of positive natural selectionMore importantly by evolutionary mapping we show herefor the first time that most of these positive selection eventsmap to the small extracellular portion of this protein (fig 3)The proximity of these inferred episodes of positive naturalselection to glycosylation sites in gp91 is noteworthy consid-ering the importance of the glycome in immunity (Marth andGrewal 2008)

Population Genetics of NADPH Genes

We sequenced CYBB CYBA NCF2 and NCF4 in a publiclyavailable panel that includes 24 individuals of African ances-try 31 Europeans 24 AsianOceanians and 23 admixed LatinAmericans (ie Hispanics) This panel is a suboptimal repre-sentation of the worldwide population a limitation that iscommon to most human genomic diversity projects focusedon SNP genotyping or resequencing efforts However basedon how human genetic diversity is apportioned within(gt85) and between (lt15) populations (Lewontin 19721000 Genomes Project Consortium 2010) even studies usingsuboptimal sampling are informative about the genetic struc-ture of human populations and serve to critically identify therole of evolutionary factors in human genetic diversity(Kimura 1974 Nielsen et al 2005 1000 Genomes ProjectConsortium 2010) All the raw results are available as supple-mentary material Supplementary Material online and at theSNP500Cancer project homepage (httpvariantgpsncinihgovcgfseqpagessnp500do last accessed July 16 2013) orcan be downloaded from the DIVERGENOME platform(Magalhaes et al 2012 httpwwwpggeneticaicbufmgbrdivergenome last accessed July 16 2013)

To ascertain which combination of evolutionary factorshas shaped the diversity of NADPH genes we assessed thepattern of nonsynonymous and synonymous polymor-phisms as well as intra- and interpopulation diversity forNADPH genes and tested the null hypothesis of neutralitythat patterns of diversity may be explained by consideringonly the demographic history of human populations and themutation and recombination rates of each locus

Nonsynonymous polymorphisms are underrepresentedin the human genome and usually occur at low frequencieswhen present reflecting the action of purifying natural selec-tion (1000 Genomes Project Consortium 2010) By resequen-cing Tarazona-Santos et al (2008) did not observe commonnonsynonymous polymorphisms for CYBB This result is con-sistent with purifying natural selection acting on X-chromo-some genes due to the exposure of deleterious recessivemutations to natural selection in hemizygous males Thussubstitutions in the coding region of CYBB should be rarein human populations and are seldom captured by studieswith small sample sizes Interestingly the lack of CYBBnonsynonymous polymorphisms in our sample of humanpopulations contrasts with the recurrent episodes of positiveselection of the extracellular portion of gp91 during

2159

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

mammalian evolution as inferred in this study For the auto-somal NADPH oxidase components (table 1 and haplotypetables online Supplementary Material online) we observe inthis study two rare and conservative nonsynonymous substi-tutions (ie involving amino acids with similar chemical prop-erties T85N and A304E) in NCF4 But for NCF2 and CYBA weobserved patterns of nonsynonymous substitutions thatseldom occur in human genes NCF2 has nine nonsynony-mous substitutions three of them are common (with a fre-quency higher than 5 in at least one of the studiedpopulation samples) and six are rare On the other handtwo nonsynonymous substitutions in CYBA are commonand ubiquitous in human populations namely Y72H(rs4673) and V174A (rs1049254 in a position where variationamong mammalian species is also observed) Moreover thefollowing two of the five common amino acid changes ob-served in the autosomal NADPH genes are predicted to bepossibly damaging (ie radical) by the Polyphen resource(Ramensky et al 2002) the two changes are R395W inHispanic NCF2 (rs13306575) and Y72H (rs4673) in CYBA Ingeneral Polyphen accurately predicts the effect of nonsynon-ymous substitutions based on biochemical and evolutionarydata (Williamson et al 2005) Notably the ImmunodeficiencyMutations Database (httpbioinfutafiNCF2basecontent=pubIDbases last accessed July 16 2013) reports one395W395W autosomal recessive CGD patient but we andthe HapMap project (wwwhapmaporg last accessed July 16

2013) observed the W allele at frequencies between 5 and10 in Asians and admixed Latin Americans including onesupposedly healthy 395W395W Japanese HapMap individ-ual On the basis of these results we verified whether NativeAmericans who descend from an ancestral Pleistocene Asianpopulations that peopled the Americas by the Behring Straitsmore than 14000 years ago may have relatively higher fre-quencies of this variant We genotyped the variant 395Wusing a Taqman assay in 558 Native Americans (see supple-mentary table S3 Supplementary Material online for detailedresults) and observed an allele frequency of 12 being thisvariant always present in heterozygous individuals

For the four studied genes the diversity and levels of re-combination are higher in Africans than in non-Africans(table 2 and haplotype tables available as supplementaryfiles Supplementary Material online) This result is a conse-quence of the African origin of modern humans and the ldquooutof Africardquo migration that occurred 40000ndash80000 years agoafter a bottleneck leading to the peopling of other continents(Campbell and Tishkoff 2008 Laval et al 2010) Therefore thefirst divergence between continental human populations wasbetween Africans and ancestral non-Africans Consistentlywith this scenario we observed the highest between-popula-tion differentiation for CYBB CYBA and NCF4 between thesetwo groups Interestingly NCF2 does not match this pattern(table 3)

FIG 2 Inferred types of natural selection for codons of the NADPH genes at the evolutionary time scale of mammals Codons are represented along thehorizontal axis For each gene three classes of sites (black dark gray and light gray) are considered and each class evolved under different inferred values (presented for each gene in the figure at the top of each graphic) These classes correspond to the model M3 of Yang (2007a) with three classes ofsites Given our data this model is more likely than alternative models of evolution that assume simpler scenarios such as a unique for the entire gene(see supplementary material Supplementary Material online for details regarding methods and results using alternative models) The three classescorrespond to different types and levels of natural selection from strong purifying selection (in the lightest gray) to positive selection (gt 1) For eachcodon the probability of belonging to each of the three classes of corresponds to the height of the corresponding color in the vertical bar Forexample codon 173 of CYBB (indicated by a white arrow) has a 0000 probability of belonging to the= 00095 class (light gray a class corresponding tostrong purifying selection) a 0169 probability of belonging to the = 03085 class (dark gray) and a 0831 probability of belonging to the = 18987class (black a class that suggests positive selection) In this case reasonable evidence of positive selection on this codon exists

2160

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

FIG

3

Nat

ural

sele

ctio

nm

app

ing

acro

ssC

YBB

(en

codi

ng

gp91

)al

ong

mam

mal

ian

evol

utio

na

sid

enti

fied

usin

gth

ePA

ML

met

hod

byY

ang

(200

7a)

The

top

olog

ies

ofgp

91an

dp2

2ar

ere

pro

duce

dfr

omT

aylo

ret

al(

2004

Cop

yrig

ht20

04T

heA

mer

ican

Ass

ocia

tion

ofIm

mun

olog

ists

In

cU

sed

wit

hp

erm

issi

on)

Dar

kgr

ayam

ino

acid

sha

veev

olve

dun

der

pos

itiv

ese

lect

ion

wit

hgt

80

pro

babi

lity

Mos

tof

thes

eam

ino

acid

sar

ein

the

extr

acel

lula

rp

orti

onof

the

pro

tein

The

upp

erp

art

ofth

efig

ure

show

sth

ep

rote

inal

ign

men

tfo

rn

ine

mam

mal

sof

the

gp91

regi

onin

dica

ted

byth

ebl

ack

ellip

seI

nth

isre

gion

ahi

ghle

velo

fam

ino

acid

varia

tion

isfo

und

betw

een

spec

ies

and

seve

ralc

odon

ssh

owgt

1In

this

alig

nm

ent

gray

vert

ical

bars

corr

esp

ond

tova

riab

leam

ino

acid

site

sT

hep

rote

inal

ign

men

tof

mam

mal

ssh

ows

the

follo

win

gsp

ecie

sH

om(H

omo

sapi

ens

sapi

ens)

Pan

(Pan

trog

lody

tes)

Mac

(Mac

aca

mul

atta

)M

us(M

usm

uscu

lus)

Rat

(Rat

tus

norv

egic

us)

Cri

(Cri

cetu

lus

gris

eus)

Het

(Het

eroc

epha

lus

glab

er)

Cav

(Cav

iapo

rcel

lus)

an

dO

ry(O

ryct

olag

uscu

nicu

lus)

EC

ext

race

lula

ren

viro

nm

ent

TM

tra

nsm

embr

ane

laye

ran

dIC

in

trac

ellu

lar

envi

ron

men

t

2161

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

From the four studied genes NCF4 which encodes theregulatory protein p40-phox shows a pattern of diversitythat is typical for a gene that has evolved under neutralityIn addition to the features described in the previous para-graph the allelic spectra of NCF4 in the studied populationsare consistent with a neutral model of evolution (tables 2ndash4)

Although CYBB presented the most interesting evolution-ary history at the interspecific level with repeated episodes ofpositive natural selection recent human evolutionary historyhas resulted in interesting patterns of variation for CYBA andNCF2 CYBA encodes p22-phox which is a transmembraneprotein shared by different NADPH oxidases In addition toharboring two common nonsynonymous polymorphisms(V174A is also variable among different species of mammals)CYBA is the most variable and most affected by recombina-tion among the NADPH oxidase genes (tables 1 and 2) Inparticular CYBA diversity is very high in Europe comparedwith 329 genes resequenced in a European sample (httppgagswashingtonedusummary_statshtml last accessed July16 2013) CYBA ranks 11th (ie the 97th percentile)Moreover there are contrasting proportions of the totalnumber of polymorphismsnumber of singletons betweenAfricans (a low proportion) and Europeans (a high propor-tion) the latter showing an excess of common polymor-phisms with respect to the neutral expectation (see the DFL

test in table 4 and supplementary table S4 SupplementaryMaterial online) This excess of common variants inEuropeans is also significant when we conservatively testedit against a scenario of human evolution that incorporates theldquoOut of Africardquo bottleneck (Laval et al 2010) and the observedlevel of recombination in Europeans (CYBA = 807 for the se-quenced region) Because demographic forces and recombi-nation levels alone do not explain the high CYBA diversity andits excess of common polymorphisms we suggest that bal-ancing natural selection (that acts by maintaining differentalleles at high frequency in a population) has contributed to

shape the diversity of CYBA at least in the European popu-lation Indeed figure 4a shows that the haplotype network ofCYBA for the European population is consistent with theaction of balancing natural selection (Bamshad andWooding 2003) showing two well-differentiated commonclades that explain the observed high diversity and theexcess of common CYBA variants A comparative genomicanalysis confirms this inference the ratio of polymorphismsto differences fixed between human and chimpanzee is nothomogeneous along the gene in the different human popu-lations (Mc Donald 1998 supplementary table S5Supplementary Material online) as would be expectedunder neutral evolution Our inference of balancing naturalselection is consistent with the fact that 25ndash30 of the var-iation in levels of ROS production can be attributed to geneticfactors (Lacy et al 2000) and that ROS levels are associatedwith CYBA variants (Bedard et al 2009)

p67 encoded by NCF2 is a necessary cytosolic NADPHcomponent for phagocyte ROS production Asians show ahighly differentiated NCF2 haplotype structure (see frequen-cies of haplotypes NCF2-D11 and NCF2-E10 in the haplotypetables online and in the network shown in fig 4b) and thehighest FST values are observed in pairwise comparisons be-tween Asians and non-Asian populations (in particular withEuropeans table 3) and not between Africans and non-African populations as is usually observed in the humangenome We confirmed these results by analyzing data forNCF2 from the HapMap Project (supplementary materialSupplementary Material online) Moreover a trend towardan excess of rare polymorphisms exists in Asians that is notobserved elsewhere (tables 1 2 and 4 DFL =1904FFL =1893) Although we cannot exclude that this patternof diversity is compatible with the null hypotheses of neutral-ity and with the tested demographic history of human pop-ulations inferred by Laval et al (2010 tables 2ndash4) we canspeculate and envisage four additional evolutionary scenarios

Table 1 Allele Frequencies of Nonsynonymous Polymorphisms in NADPH Oxidase Genes

Genes rs Minor Allele(Amino Acid)

PolyphenPrediction

African European Asian Hispanic

CYBA

Y72H rs4673 T (Y) Possibly damaging 046 032 017 022

V174A rs1049254 T (V) Benign 017 048 048 018

NCF2

K181R rs2274064 G (R) Benign 035 037 041 048

T279M rs13306581 T (T) Probably damaging 000 000 005 000

V297A rs35937854 C (A) Benign 004 000 000 000

T361S Chr1181799289 NCBI36hg18 T (S) mdash 000 000 002 000

H389Q rs17849502 A (Q) Benign 000 005 000 007

R395W rs13306575 T (W) Possibly damaging 000 000 000 007

N419I rs35012521 T (I) Probably damaging 000 002 004 000

P454S rs55761650 T (S) mdash 000 000 000 002

L487S Chr1181795862 NCBI36hg18 C (S) mdash 002 000 000 000

NCF4

T85N rs112306225 A (N) mdash 000 000 002 000

A304E rs5995361 A (E) Benign 004 000 000 000

2162

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

that may have contributed to shape the pattern of NCF2diversity in Asians 1) the observed trend is suggestive of aselective sweep on NCF2 standing variation a neutral orweakly deleterious existing variant becomes beneficial andrapidly increases in frequency (together with its associatedhaplotypes ie incomplete sweep) reducing the nucleotidediversity in the surrounding region and rendering other stand-ing substitutions rare During this process new rare substitu-tions appear in the expanding positively selected haplotype2) The pattern of diversity of NCF2 in Asia may result from anincomplete selective sweep acting on ARPC5 which is locatedapproximately 35 kb downstream of NCF2 In a genome-widescan for recent positive selection Voight et al (2006) identi-fied a strong signature of an incomplete sweep for ARPC5 inAsia (P = 0009) characterized by a higher than expected long-range linkage disequilibrium summarized by very high iHS

statistics (see Haplotter results for the HapMap II data athttphaplotteruchicagoedu last accessed July 16 2013)SNPs in NCF2 also presents high iHS statistics (P = 002) al-though values are lower than for ARPC5 3) The differentiatedpattern of diversity of NCF2 in Asia may also have been gen-erated without the action of natural selection during the firstcolonization of Asia by modern humans In a process of geo-graphic population expansion specifically in the front wave ofthe expansion some rare alleleshaplotypes (ie surfing al-leles) may become common by chance mimicking the pat-tern of diversity generated by a selective sweep (Excoffier andRay 2008) 4) The excess of rare variants may be an artifact ofpooling individuals from different populations (Ptak andPrzeworski 2002) Consistent with evolutionary scenarios 1ndash4 that produce similar patterns of diversity the haplotypenetwork of NCF2 for Eurasians (fig 4b) shows the following1) a large differentiation between Asians and Europeans thatis compatible with the high observed FST values and 2) a star-like shape associated with the haplotype NCF2-E10 that iscommon in Asia and rare elsewhere which is compatiblewith the excess of rare alleles in the Asian populations

DiscussionBy analyzing 29 mammalian genomes and four human pop-ulations we show in this study that natural selection hasacted in different ways over time to shape the pattern ofdiversity of the phagocyte NADPH oxidase genes At the tem-poral scale of the evolution of mammals we have inferredrecurrent episodes of positive selection acting on the extra-cellular portion of gp-91 that have been important to shapethe pattern of interspecific diversity of this gene Our in-terspecific analyses did not show a similar pattern of naturalselection in any of the other phagocyte NADPH oxidasegenes Even if current knowledge on the biology of NADPHdoes not allow us to interpret our results in terms of functionwe propose that the extracellular region of gp-91 is function-ally relevant Our results also imply that this region is highlydifferentiated among mammals at the protein level and thisvariability should be considered when mammals models areused to study the structure and function of phagocyteNADPH components

In the time scale of human evolution our analyses of theNADPH oxidase genes suggest that CYBA has been a target ofbalancing natural selection Because we do not have evidenceof population-specific variants that faced selective pressurethe inferred natural selection may have acted on a standingvariation in ancestral populations This implies that the selec-tive pressure began after the appearance of the variant andpossibly acted in a specific geographic region (Barret andSchluter 2008) The signatures of natural selection acting ona new mutation and on standing variation differ In the case ofselective sweeps episodes of natural selection on standingvariation are associated to a larger variance in the allelic spec-trum with respect to natural selection on a new mutationAlso selection on standing variation may produce an excess ofalleles at intermediate frequencies that is not associated withhigh nucleotide diversity (Przeworski et al 2005 Peter et al2012) This pattern contrasts with the effect of balancing

Table 2 Intrapopulation Diversity Indexes in the Studied Populationsfor the NADPH Oxidase Genes Obtained from Resequencing Dataa

African European Asian Hispanic

Number of chromosomes

CYBBb 42 52 34 36

CYBA 48 62 48 46

NCF2 48 62 48 46

NCF4 48 62 48 46

Segregating sitessingletons

CYBB 218 70 103 135

CYBA 6122 333 335 347

NCF2 4613 3311 2816 3714

NCF4 4512 267 191 308

Haplotype structureNumber of inferred haplotypesc

CYBB 14 5 7 12

CYBA 39 39 32 26

NCF2 38 32 18 31

NCF4 36 30 17 22

Haplotype diversity plusmn SD

CYBB 088 plusmn 003 034 plusmn 008 053 plusmn 010 070 plusmn 008

CYBA 098 plusmn 001 098 plusmn 001 097 plusmn 000 096 plusmn 002

NCF2 099 plusmn 001 096 plusmn 001 087 plusmn 004 097 plusmn 001

NCF4 098 plusmn 001 093 plusmn 002 093 plusmn 002 093 plusmn 002

Recombination parameter (q 103 per site)c

CYBB 008 lt001 lt001 lt002

CYBA 291 148 096 088

NCF2 052 038 010 058

NCF4 137 103 039 022

h estimatorsd

p 103 per site

CYBB 036 012 015 028

CYBA 190 163 151 157

NCF2 081 047 043 057

NCF4 101 076 064 084

hW 103 per site

CYBB 042 013 021 027

CYBA 231 118 125 13

NCF2 102 069 062 083

NCF4 101 070 054 087

aMost analyses were performed using software DnaSP (Rozas 2009)bData for CYBB (Xp211) are from Tarazona-Santos et al (2008)cHaplotypes and inferred using the method by Stephens and Scheet (2005) andthe software PHASEd Tajima (1983) W Watterson (1975)

2163

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

natural selection which produces an excess of common allelesassociated with high genetic diversity Thus the observed pat-tern of CYBA diversity in Europeans is not consistent with aselective sweep on a standing variation but it is consistent witha scenario of balancing selection acting on standing variation

If we consider for CYBA that heterozygote advantage maybe the mechanisms of balancing selection we can speculatethat the biological basis for this mechanism may be the fol-lowing considering that p22-phox is not exclusive of thephagocyte NADPH oxidase but it is also part of Nox com-plexes expressed in other tissues the dependence of ROSproduction on CYBA variants has to be finely regulated IfCYBA variants induce high levels of ROS these variants mayfavor a phagocyte-dependent efficient response to pathogensbut may damage other endothelial tissues Alternativelytissue oxidative damage does not occur if ROS productionis low but this response may be associated with a weaker

phagocyte respiratory burst against pathogens In this con-text heterozygote individuals with a CYBA-dependent inter-mediate level of ROS production may have been favored bynatural selection

Our results contribute to the discussion regarding the rel-evance of balancing selection in shaping the diversity ofinnate immunity genes (Ferrer-Admetlla et al 2008 Barreiroet al 2009) Ferrer-Admetlla et al (2008) have associated therecurrent signatures of balancing selection on inflammatorygenes with the need for fine regulation CYBA is the onlyphagocytic NADPH oxidase gene that also encodesnonphagocyte Nox components thus CYBA has a role inROS cell signaling a potentially dangerous process due toits capability to produce oxidative damage to tissues Geneswith these characteristics likely need even tighter regulationThe interplay between pathogen-driven selective pressureon innate immunity genes and their concomitantnonimmunological functions is complex In addition toCYBA other interesting examples of this interplay can befound among the 10 human toll-like receptors (TLRs) thatshow a variety of signatures of natural selection TLR8 whichshows the strongest signature of purifying selection is alsoinvolved in neuronal development (Barreiro et al 2009) and itis difficult to discriminate the role of each function in deter-mining the observed signature of natural selection

With few exceptions the pathogens responsible for naturalselection on immune genes are difficult to specify In the caseof NADPH oxidase we can infer based on the spectrum ofinfections in CGD patients that catalase-positive bacteria andfungi such as Staphylococci Salmonella Candida Aspergillusand M tuberculosis may be the selective pathogensInteractions between the host and pathogens also includemechanisms of the latter to impair the respiratory burst ofthe former For example Leishmania donovani blocks the as-sembly of NADPH oxidase at the phagosome membrane(Lodge et al 2006) These mechanisms may constitute selec-tive pressures imposed by pathogens

The associations reported in GWAS between rs4821544 inNCF4 and Crohnrsquos disease (an idiopathic inflammatory boweldisease that predominantly involves the ileum and colonRioux et al 2007) and between rs10911363 in NCF2 and

Table 3 Pairwise FST Genetic Distances between Populations

CYBBa CYBA

Africa Europe Asia Hispanic Africa Europe Asia Hispanic

Africa mdash 0316 0264 0092 mdash 0074 0083 0065

Europe 0257 mdash 0000 0107 mdash 0002 0056

Asia 0211 0000 mdash 0073 mdash 0054

Hispanic 0070 0082 0056 mdash mdash

NCF2 NCF4

Africa mdash 0048 0058 0037 mdash 0128 0136 0158

Europe mdash 0069 0000 mdash 0026 0026

Asia mdash 0059 mdash 0005

Hispanic mdash mdash

aFor CYBB FST estimators are above the diagonal Below the diagonal are the FST values corrected as if the effective population sizes of X chromosome genes were equal toautosomal ones

Table 4 Results of Neutrality Tests for the NADPH Oxidase Genesand Their Significancea

African European Asian Hispanic

Tajimarsquos D

CYBB 0473 0274 0813 0084

CYBA 0580 1242 0684 0707

NCF2 0412 0939 0883 0987

NCF4 0750 0243 0556 0102

Fu and Lirsquos D

CYBB 1050 1110 0395 0977

CYBA 1485 1734 1188 0138

NCF2 0407 1105 1904 1398

NCF4 0308 0443 1893 0266

Fu and Lirsquos F

CYBB 0980 0811 0114 0814

CYBA 1382 1893 1205 0405

NCF2 0512 1252 1893 1567

NCF4 0557 0231 1225 0248

NOTEmdashUnderlined values represent significant results under the demographic modelinferred by Laval et al (2010) for human populations See details in supplementarytable S4 Supplementary Material onlineaThe McDonaldndashKreitman test is nonsignificant in any of the cases

Significant under the WrightndashFisher model of constant population size

2164

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

systemic lupus erythematosus (Cunninghame Graham et al2011) confirm the involvement of NADPH genes in the path-ogenesis of inflammatory-related common diseases Ourclaim that natural selection acted on CYBA (and maybe inNCF2) is relevant for biomedical studies because combiningevidence of natural selection with association analyses inimmune genes increases the statistical power to detect dis-ease-associated variants (Ayodo et al 2007) As a new gener-ation of association studies focusing on rare variation isemerging the combination of genes deemed interestingfrom GWAS and populations with an excess of rare variantsin these genes such as NCF2 in Asians are particularly inter-esting as a source of rare variants with clinical relevanceFinally by determining through molecular evolution mappingthat the extracellular portion of gp91 (encoded by CYBB inthe X-chromosome) has been subject to recurrent episodes ofpositive selection at the scale of mammals evolution we positthe hypothesis that this portion of the NADPH oxidase isrelevant for currently unknown biological processes thatonce revealed by structural and functional investigationswill contribute to understanding the role of NADPH oxidasein infectious autoimmune and cardiovascular diseases

Materials and Methods

Molecular Evolution Analysis of NADPH Genes

We used the maximum likelihood framework developed byYang (2007a) to estimate for the NADPH oxidase genes

under a variety of evolutionary models as implemented in thePAML software (Yang 2007b) This approach allows infer-ences about the evolution of a coding region along aninter-specific phylogeny mapping the codons that haveevolved under strongweak purifying selection neutrality oradaptive positive selection Further details about these anal-yses are available as supplementary material SupplementaryMaterial online

Human Population Genetics of NADPH Genes

For human population genetics analyses we conducted bidi-rectional Sanger sequencing of CYBB CYBA NCF2 and NCF4for a total of 35242 bp for each of 102 healthy individuals aspart of the SNP500 Cancer project (Packer et al 2006 seesupplementary fig S1 Supplementary Material online for de-tails) Human population resequencing data for CYBB werepublished in Tarazona-Santos et al (2008) The resequencingexperiments were performed as in Packer et al (2006) These102 unrelated individuals include the following 24 of Africanancestry (15 African Americans from the United States and 9Pygmies) 23 admixed Latin Americans (from Mexico PuertoRico and South America) 31 Europeans (from the CEPHUTAH pedigree and the NIEHS Environmental GenomeProject) and 24 AsiansndashOceanians (from Melanesia PakistanChina Cambodia Japan and Taiwan)

After controlling for multiple tests we confirmed that allSNPs fit the HardyndashWeinberg equilibrium by the Guo and

CYBA - A16

CYBA - A3

Y72H - rs4673

V174A - rs1049254 CYBA - E18

(a)

NCF2 - E10 H389Q - rs17849502

NCF2 - B3

NCF2 - D11

NCF2 - C1K181R - rs2274064(b)

FIG 4 Phylogenetic networks of (a) CYBA in Europeans and (b) NCF2 in Europeans (black) and Asians (yellow) The lengths of the branches areproportional to the number of mutations Only nonsynonymous mutations are shown The haplotype names correspond to the table of inferredhaplotypes in the supplementary material Supplementary Material online We only show the names of haplotypes with a frequency gt5 Ancestralhaplotypes (inferred as being the human haplotype or median vector most similar to the chimpanzee sequence) are indicated by an arrow Medianvectors are in red

2165

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Thompson (1992) test which was implemented in the soft-ware Arlequin 30 (Excoffier et al 2007) Insertion-deletions(INDELs) were excluded from population genetics analysesTo assess intrapopulation diversity we used two statistics the per-site mean number of pairwise differences betweensequences (Tajima 1983) and w based on the number ofsegregating sites (S) (Watterson 1975) We measured pairwisebetween-populations diversity by using the FST statistics cal-culated using the software DnaSP (Rozas 2009)

Haplotypes and the recombination parameter were in-ferred using the PHASE software (Stephens and Scheet 2005)and diversity indexes calculations (tables 2 and 3) as well asneutrality statistics (table 4) were estimated using DnaSP soft-ware We applied two kinds of neutrality tests 1) tests basedon the allelic spectrum which is the distribution of polymor-phisms across different classes of frequencies namelyTajimarsquos DT (Tajima 1989) and FundashLirsquos DFL and FFL (Fu andLi 1993) and 2) tests based on comparisons between thenumber of polymorphisms in human populations and fixeddifferences with the chimpanzee (ie outgroup) namely theMcDonald and Kreitman (1991) test and the adaptedKolmogorovndashSmirnoff test by McDonald (1998) For thefirst set of tests we used as null hypotheses both the classicWrightndashFisher model of neutrality with a constant popula-tion size as well as the more realistic evolutionary scenario forhuman populations inferred by Laval et al (2010) In the caseof the scenario of Laval et al (2010) we ignored interconti-nental gene flow within the Old World because these raregene flow events likely does not affect the level of significanceof the neutrality tests given its very low inferred values(13 105) Null distributions used to test the significanceof the neutrality tests under these evolutionary scenarios weregenerated using coalescent simulations and a significancelevel of 005 (Hudson 2002) The KolmogorovndashSmirnoff testof neutrality adapted by McDonald (1998) was performedusing Slider software available at httpudeledu~mcdonaldaboutdnasliderhtml (last accessed July 16 2013) Weperformed coalescent simulations using ms software(Hudson 2002) Further methodological details are availableas supplementary material Supplementary Material onlineWe constructed the CYBA and NCF2 networks using allSNP variants and applying the Median joining algorithmand the maximum parsimony option calculations as imple-mented in the software Network 46 (Bandelt et al 1999)

Supplementary MaterialSupplementary tables S1ndashS5 and figure S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

SJC conceived the study ET-S MM FL RC AC CF LPand BS generated the dataanalyzed the sequences LB ACWCSM and MY provided tools for data generationand analyses ET-S MM FL WCSM and RR performedpopulation genetics analyses ET-S and MM prepared tablesand figures ET-S and SJC wrote the manuscript All the

authors contributed with discussions about the results Theauthors are grateful to Silvia Fuselli for discussions of ourresults to Fernando Levi Soares for technical assistance withthe figures and the Sequencing Group of the CoreGenotyping Facility (National Cancer Institute) for their tech-nical assistance This work was supported by the IntramuralResearch Program of the National Cancer Institute (NCI) andNational Institutes of Health (NIH) CF was supported by theUniversity of Bologna ET-S MM WCSM and FL weresupported by the Fogarty International Center and NationalCancer Institute (1R01TW007894) Conselho Nacional deDesenvolvimento Cientıfico e Tecnologico (Brazil Grant3075562012-3) Fundacao de Amparo a Pesquisa de MinasGerais (Brazil CBB PPM 0026512) and the Brazilian Ministryof Education (CAPES Agency grant PNPD 0256709-1)

ReferencesAyodo G Price AL Keinan A Ajwang A Otieno MF Orago AS

Patterson N Reich D 2007 Combining evidence of natural selectionwith association analysis increases power to detect malaria-resis-tance variants Am J Hum Genet 81(2)234ndash242

Bamshad M Wooding SP 2003 Signatures of natural selection in thehuman genome Nat Rev Genet 4(2)99ndash111

Bandelt H-J Forster P Rohl A 1999 Median-joining networks for infer-ring intraspecific phylogenies Mol Biol Evol 16(1)37ndash48

Barreiro LB Ben-Ali M Quach H et al (17 co-authors) 2009Evolutionary dynamics of human Toll-like receptors and their dif-ferent contributions to host defense PLoS Genet 5(7)e1000562

Barreiro LB Quintana-Murci L 2010 From evolutionary genetics tohuman immunology how selection shapes host defence genesNat Rev Genet 11(1)17ndash30

Barrett RD Schluter D 2008 Adaptation from standing genetic varia-tion Trends Ecol Evol 23(1)38ndash44

Bedard K Attar H Bonnefont J Jaquet V Borel C Plastre O Stasia MJAntonarakis SE Krause KH 2009 Three common polymorphisms inthe CYBA gene form a haplotype associated with decreased ROSgeneration Hum Mutat 30(7)1123ndash1133

Blekhman R Man O Herrmann L Boyko AR Indap A Kosiol CBustamante CD Teshima KM Przeworski M 2008 Natural selectionon genes that underlie human disease susceptibility Curr Biol18(12)883ndash889

Brandes RP Kreuzer J 2005 Vascular NADPH oxidases molecular mech-anisms of activation Cardiovasc Res 65(1)16ndash27

Buckley RH 2004 Pulmonary complications of primary immunodefi-ciencies Paediatr Respir Rev 5(Suppl A)S225ndashS233

Bustamante CD Fledel-Alon A Williamson S et al (13 co-authors)2005 Natural selection on protein-coding genes in the humangenome Nature 437(7062)1153ndash1157

Bustamante J Arias AA Vogt G et al (29 co-authors) 2011 GermlineCYBB mutations that selectively affect macrophages in kindredswith X-linked predisposition to tuberculous mycobacterial diseaseNat Immunol 12(3)213ndash221

Campbell MC Tishkoff SA 2008 African genetic diversity implicationsfor human demographic history modern human origins and com-plex disease mapping Annu Rev Genomics Hum Genet 9403ndash433

Chanock SJ Roesler J Zhan S Hopkins P Lee P Barrett DT ChristensenBL Curnutte JT Gorlach A 2000 Genomic structure of the humanp47-phox (NCF1) gene Blood Cells Mol Dis 26(1)37ndash46

Cunninghame Graham DS Morris DL Bhangale TR Criswell LASyvanen AC Ronnblom L Behrens TW Graham RR Vyse TJ2011 Association of NCF2 IKZF1 IRF8 IFIH1 and TYK2 with sys-temic lupus erythematosus PLoS Genet 7(10)e1002341

Excoffier L Laval G Schneider S 2007 Arlequin ver 30 an integratedsoftware package for population genetics data analysis EvolBioinform Online 147ndash50

2166

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Excoffier L Ray N 2008 Surfing during population expansions promotesgenetic revolutions and structuration Trends Ecol Evol23(7)347ndash351

Ferrer-Admetlla A Bosch E Sikora M Marques-Bonet T Ramırez-SorianoA Muntasell A Navarro A Lazarus R Calafell F Bertranpetit J Casals F2008 Balancing selection is the main force shaping the evolution ofinnate immunity genes J Immunol 181(2)1315ndash1322

Fu YX Li WH 1993 Statistical tests of neutrality of mutations Genetics133(3)693ndash709

The 1000 Genomes Project Consortium 2010 A map of humangenome variation from population-scale sequencing Nature467(7319)1061ndash1073

The 1000 Genomes Project Consortium Abecasis GR Auton A BrooksLD DePristo MA Durbin RM Handsaker RE Kang HM Marth GTMcVean GA 2012 An integrated map of genetic variation from1092 human genomes Nature 491(7422)56ndash65

Guo SW Thompson EA 1992 Performing the exact test of Hardy-Weinberg proportion for multiple alleles Biometrics 48(2)361ndash372

Heyworth PG Cross AR Curnutte JT 2003 Chronic granulomatousdisease Curr Opin Immunol 15(5)578ndash584

Hudson RR 2002 Generating samples under a Wright-Fisher neutralmodel of genetic variation Bioinformatics 18(2)337ndash338

Jacob CO Eisenstein M Dinauer MC et al (21 co-authors) 2012 Lupus-associated causal mutation in neutrophil cytosolic factor 2 (NCF2)brings unique insights to the structure and function of NADPHoxidase Proc Natl Acad Sci U S A 109(2)E59ndashE67

Kimura M 1974 The neutral theory of molecular evolution CambridgeCambridge University Press

Kosiol C Vinar T da Fonseca RR Hubisz MJ Bustamante CD Nielsen RSiepel A 2008 Patterns of positive selection in six mammalian ge-nomes PLoS Genet 4(8)e1000144

Lacy F Kailasam MT OrsquoConnor DT Schmid-Schonbein GW Parmer RJ2000 Plasma hydrogen peroxide production in human essentialhypertension role of heredity gender and ethnicity Hypertension36(5)878ndash884

Laval G Patin E Barreiro LB Quintana-Murci L 2010 Formulating a his-torical and demographic model of recent human evolution based onresequencing data from noncoding regions PLoS One 5(4)e10284

Lewontin R 1972 The apportionment of human diversity Evol Biol 6391ndash398

Lindblad-Toh K Garber M Zuk O et al (88 co-authors) 2011 A high-resolution map of human evolutionary constraint using 29 mam-mals Nature 478(7370)476ndash482

Lodge R Diallo TO Descoteaux A 2006 Leishmania donovani lipopho-sphoglycan blocks NADPH oxidase assembly at the phagosomemembrane Cell Microbiol 8(12)1922ndash1931

Magalhaes WC Rodrigues MR Silva D Soares-Souza G Iannini MLCerqueira GC Faria-Campos AC Tarazona-Santos E 2012DIVERGENOME a bioinformatics platform to assist population ge-netics and genetic epidemiology studies Genet Epidemiol36(4)360ndash367

Marth JD Grewal PK 2008 Mammalian glycosylation in immunity NatRev Immunol 8(11)874ndash887

McDonald JH 1998 Improved tests for heterogeneity across a region ofDNA sequence in the ratio of polymorphism to divergence Mol BiolEvol 15377ndash384

McDonald JH Kreitman M 1991 Adaptive protein evolution at the Adhlocus in Drosophila Nature 351(6328)652ndash654

Nielsen R Bustamante C Clark AG et al (12 co-authors) 2005 A scanfor positively selected genes in the genomes of humans and chim-panzees PLoS Biol 3(6)e170

Olsson LM Lindqvist AK Kallberg H Padyukov L Burkhardt HAlfredsson L Klareskog L Holmdahl R 2007 A case-control studyof rheumatoid arthritis identifies an associated single nucleotidepolymorphism in the NCF4 gene supporting a role for the

NADPH-oxidase complex in autoimmunity Arthritis Res Ther9(5)R98

Packer BR Yeager M Burdett L et al (13 co-authors) 2006SNP500Cancer a public resource for sequence validation assay de-velopment and frequency analysis for genetic variation in candidategenes Nucleic Acids Res 34(Database issue)D617ndashD621

Peter BM Huerta-Sanchez E Nielsen R 2012 Distinguishing betweenselective sweeps from standing variation and from a de novo mu-tation PLoS Genet 8(10)e1003011

Przeworski M Coop G Wall JD 2005 The signature of positive selectionon standing genetic variation Evolution 59(11)2312ndash2323

Ptak SE Przeworski M 2002 Evidence for population growth in humansis confounded by fine-scale population structure Trends Genet18(11)559ndash563

Ramensky V Bork P Sunyaev S 2002 Human non-synonymous SNPsserver and survey Nucleic Acids Res 30(17)3894ndash3900

Rioux JD Xavier RJ Taylor KD et al (24 co-authors) 2007 Genome-wideassociation study identifies new susceptibility loci for Crohn diseaseand implicates autophagy in disease pathogenesis Nat Genet39(5)596ndash604

Roberts RL Hollis-Moffatt JE Gearry RB Kennedy MA Barclay MLMerriman TR 2008 Confirmation of association of IRGM andNCF4 with ileal Crohnrsquos disease in a population-based cohortGenes Immun 9(6)561ndash565

Rozas J 2009 DNA sequence polymorphism analysis using DnaSPMethods Mol Biol 537337ndash350

San Jose G Fortuno A Beloqui O Dıez J Zalba G 2008 NADPH oxidaseCYBA polymorphisms oxidative stress and cardiovascular diseasesClin Sci (Lond) 114(3)173ndash182

Santiago HC Gonzalez Lombana CZ Macedo JP et al (12 co-authors)2012 NADPH phagocyte oxidase knockout mice controlTrypanosoma cruzi proliferation but develop circulatory collapseand succumb to infection PLoS Negl Trop Dis 6(2)e1492

Stephens M Scheet P 2005 Accounting for decay of linkage disequilib-rium in haplotype inference and missing-data imputation Am JHum Genet 76(3)449ndash462

Sumimoto H Miyano K Takeya R 2005 Molecular composition andregulation of the Nox family NAD(P)H oxidases Biochem BiophysRes Commun 338(1)677ndash686

Tajima F 1983 Evolutionary relationship of DNA sequences in finitepopulations Genetics 105(2)437ndash460

Tajima F 1989 Statistical method for testing the neutral mutationhypothesis by DNA polymorphism Genetics 123(3)585ndash595

Tarazona-Santos E Bernig T Burdett L Magalhaes WC Fabbri C Liao JRedondo RA Welch R Yeager M Chanock SJ 2008 CYBB anNADPH-oxidase gene restricted diversity in humans and evidencefor differential long-term purifying selection on transmembrane andcytosolic domains Hum Mutat 29(5)623ndash632

Taylor RM Burritt JB Baniulis D Foubert TR Lord CI Dinauer MCParkos CA Jesaitis AJ 2004 Site-specific inhibitors of NADPH oxi-dase activity and structural probes of flavocytochrome b character-ization of six monoclonal antibodies to the p22phox subunitJ Immunol 173(12)7349ndash7357

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A map of recentpositive selection in the human genome PLoS Biol 4(3)e72

Watterson GA 1975 On the number of segregating sites in geneticalmodels without recombination Theor Popul Biol 7(2) 256ndash276

Williamson SH Hernandez R Fledel-Alon A Zhu L Nielsen RBustamante CD 2005 Simultaneous inference of selection and pop-ulation growth from patterns of variation in the human genomeProc Natl Acad Sci U S A 102(22)7882ndash7887

Yang Z 2007a Adaptive molecular evolution In Balding DJ Bishop MCannings C editors Handbook of statistical genetics Vol 1 3rd edSusex (United Kingdom) John Wiley amp Sons p 377ndash406

Yang Z 2007b PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

2167

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

selection maintains the substitutions at low frequencies andprevents fixation at the same rate as synonymous substitu-tions resulting in dNlt dS and lt 1 On the other hand ifepisodes of positive natural selection (that raise the frequencyof beneficial variants) are frequent nonsynonymous substi-tutions increase in frequency and fix more rapidly than neu-tral synonymous substitutions thus dNgt dS and gt 1 Weused the maximum likelihood framework developed by Yang(2007a) to estimate for the NADPH oxidase genes under avariety of evolutionary models as implemented in the PAMLsoftware (Yang 2007b) This approach allows inferences aboutthe evolution of a coding region along an interspecific phy-logeny and maps the codons that have evolved under strongweak purifying selection neutrality or adaptive positive se-lection (see supplementary material Supplementary Materialonline for details)

In general PAML evolutionary models that allow a combi-nation of purifying selection and neutrality are reasonably re-alistic These models are nested with respect to models thatalso incorporate positive selection at the cost of adding newparameters We evaluated the improvements in the goodnessof fit of the data using the latter model with respect to theformer models by applying a likelihood ratio test (LRT) Afterfitting the data to the most appropriate evolutionary modelNaive Empirical Bayes (NEB) or Bayes Empirical Bayes (BEB)approaches were used to infer theparameter for each codon

For the 29 species of mammals considered we filteredbased on quality control (supplementary table S1 Supple-mentary Material online) and analyzed 570 codons of CYBBin 26 species 198 codons of CYBA in 16 species 526 codons ofNCF2 in 23 species and 339 codons of NCF4 in 20 species (seesupplementary material Supplementary Material online forfurther details including the species and sequences used forthe analyses the parameter estimations for the differentmodels and the LRT results) Here we only present the resultsof model M3 of Yang (2007a) with three (K = 3) classes of (fig 2) This model allows for different classes including thepossibility of positive selection and is a reasonable way ofpresenting the results for the four genes under the samemodel Moreover in all cases the data fit better withmodel M3 than the nested and simpler M0 or M2 modelspresented by Yang (2007a)

The results of this analysis for CYBB CYBA NCF2 and NCF4are presented in figure 2 which shows the type of naturalselection (ie based on the estimated ) for each codon thatmost likely predominated during mammalian evolution Forthis temporal scale CYBA NCF2 and NCF4 coding regionshave evolved driven by a combination of different levels ofpurifying natural selection Overall the average and standarddeviation values for these genes are NCF2 = 0256 plusmn 0227NCF4 = 0126 plusmn 0140 and CYBA = 0109 plusmn 0116

Our most striking result is for CYBB which presents a widespectrum of mutations that account for gt70 of CGD pa-tients Although we would predict that purifying selection ongenes involved in Mendelian diseases (Blekhman et al 2008)would yield similar results for CYBB and other NADPHcomponents we observed a different pattern In generalCYBB is a conserved gene but 6 of its codons show

evidence of positive natural selection (supplementary tableS2 Supplementary Material online fig 2) In a genome-widesurvey performed by Kosiol et al (2008) they have reportedCYBB as a gene showing a signal of positive natural selectionMore importantly by evolutionary mapping we show herefor the first time that most of these positive selection eventsmap to the small extracellular portion of this protein (fig 3)The proximity of these inferred episodes of positive naturalselection to glycosylation sites in gp91 is noteworthy consid-ering the importance of the glycome in immunity (Marth andGrewal 2008)

Population Genetics of NADPH Genes

We sequenced CYBB CYBA NCF2 and NCF4 in a publiclyavailable panel that includes 24 individuals of African ances-try 31 Europeans 24 AsianOceanians and 23 admixed LatinAmericans (ie Hispanics) This panel is a suboptimal repre-sentation of the worldwide population a limitation that iscommon to most human genomic diversity projects focusedon SNP genotyping or resequencing efforts However basedon how human genetic diversity is apportioned within(gt85) and between (lt15) populations (Lewontin 19721000 Genomes Project Consortium 2010) even studies usingsuboptimal sampling are informative about the genetic struc-ture of human populations and serve to critically identify therole of evolutionary factors in human genetic diversity(Kimura 1974 Nielsen et al 2005 1000 Genomes ProjectConsortium 2010) All the raw results are available as supple-mentary material Supplementary Material online and at theSNP500Cancer project homepage (httpvariantgpsncinihgovcgfseqpagessnp500do last accessed July 16 2013) orcan be downloaded from the DIVERGENOME platform(Magalhaes et al 2012 httpwwwpggeneticaicbufmgbrdivergenome last accessed July 16 2013)

To ascertain which combination of evolutionary factorshas shaped the diversity of NADPH genes we assessed thepattern of nonsynonymous and synonymous polymor-phisms as well as intra- and interpopulation diversity forNADPH genes and tested the null hypothesis of neutralitythat patterns of diversity may be explained by consideringonly the demographic history of human populations and themutation and recombination rates of each locus

Nonsynonymous polymorphisms are underrepresentedin the human genome and usually occur at low frequencieswhen present reflecting the action of purifying natural selec-tion (1000 Genomes Project Consortium 2010) By resequen-cing Tarazona-Santos et al (2008) did not observe commonnonsynonymous polymorphisms for CYBB This result is con-sistent with purifying natural selection acting on X-chromo-some genes due to the exposure of deleterious recessivemutations to natural selection in hemizygous males Thussubstitutions in the coding region of CYBB should be rarein human populations and are seldom captured by studieswith small sample sizes Interestingly the lack of CYBBnonsynonymous polymorphisms in our sample of humanpopulations contrasts with the recurrent episodes of positiveselection of the extracellular portion of gp91 during

2159

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

mammalian evolution as inferred in this study For the auto-somal NADPH oxidase components (table 1 and haplotypetables online Supplementary Material online) we observe inthis study two rare and conservative nonsynonymous substi-tutions (ie involving amino acids with similar chemical prop-erties T85N and A304E) in NCF4 But for NCF2 and CYBA weobserved patterns of nonsynonymous substitutions thatseldom occur in human genes NCF2 has nine nonsynony-mous substitutions three of them are common (with a fre-quency higher than 5 in at least one of the studiedpopulation samples) and six are rare On the other handtwo nonsynonymous substitutions in CYBA are commonand ubiquitous in human populations namely Y72H(rs4673) and V174A (rs1049254 in a position where variationamong mammalian species is also observed) Moreover thefollowing two of the five common amino acid changes ob-served in the autosomal NADPH genes are predicted to bepossibly damaging (ie radical) by the Polyphen resource(Ramensky et al 2002) the two changes are R395W inHispanic NCF2 (rs13306575) and Y72H (rs4673) in CYBA Ingeneral Polyphen accurately predicts the effect of nonsynon-ymous substitutions based on biochemical and evolutionarydata (Williamson et al 2005) Notably the ImmunodeficiencyMutations Database (httpbioinfutafiNCF2basecontent=pubIDbases last accessed July 16 2013) reports one395W395W autosomal recessive CGD patient but we andthe HapMap project (wwwhapmaporg last accessed July 16

2013) observed the W allele at frequencies between 5 and10 in Asians and admixed Latin Americans including onesupposedly healthy 395W395W Japanese HapMap individ-ual On the basis of these results we verified whether NativeAmericans who descend from an ancestral Pleistocene Asianpopulations that peopled the Americas by the Behring Straitsmore than 14000 years ago may have relatively higher fre-quencies of this variant We genotyped the variant 395Wusing a Taqman assay in 558 Native Americans (see supple-mentary table S3 Supplementary Material online for detailedresults) and observed an allele frequency of 12 being thisvariant always present in heterozygous individuals

For the four studied genes the diversity and levels of re-combination are higher in Africans than in non-Africans(table 2 and haplotype tables available as supplementaryfiles Supplementary Material online) This result is a conse-quence of the African origin of modern humans and the ldquooutof Africardquo migration that occurred 40000ndash80000 years agoafter a bottleneck leading to the peopling of other continents(Campbell and Tishkoff 2008 Laval et al 2010) Therefore thefirst divergence between continental human populations wasbetween Africans and ancestral non-Africans Consistentlywith this scenario we observed the highest between-popula-tion differentiation for CYBB CYBA and NCF4 between thesetwo groups Interestingly NCF2 does not match this pattern(table 3)

FIG 2 Inferred types of natural selection for codons of the NADPH genes at the evolutionary time scale of mammals Codons are represented along thehorizontal axis For each gene three classes of sites (black dark gray and light gray) are considered and each class evolved under different inferred values (presented for each gene in the figure at the top of each graphic) These classes correspond to the model M3 of Yang (2007a) with three classes ofsites Given our data this model is more likely than alternative models of evolution that assume simpler scenarios such as a unique for the entire gene(see supplementary material Supplementary Material online for details regarding methods and results using alternative models) The three classescorrespond to different types and levels of natural selection from strong purifying selection (in the lightest gray) to positive selection (gt 1) For eachcodon the probability of belonging to each of the three classes of corresponds to the height of the corresponding color in the vertical bar Forexample codon 173 of CYBB (indicated by a white arrow) has a 0000 probability of belonging to the= 00095 class (light gray a class corresponding tostrong purifying selection) a 0169 probability of belonging to the = 03085 class (dark gray) and a 0831 probability of belonging to the = 18987class (black a class that suggests positive selection) In this case reasonable evidence of positive selection on this codon exists

2160

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

FIG

3

Nat

ural

sele

ctio

nm

app

ing

acro

ssC

YBB

(en

codi

ng

gp91

)al

ong

mam

mal

ian

evol

utio

na

sid

enti

fied

usin

gth

ePA

ML

met

hod

byY

ang

(200

7a)

The

top

olog

ies

ofgp

91an

dp2

2ar

ere

pro

duce

dfr

omT

aylo

ret

al(

2004

Cop

yrig

ht20

04T

heA

mer

ican

Ass

ocia

tion

ofIm

mun

olog

ists

In

cU

sed

wit

hp

erm

issi

on)

Dar

kgr

ayam

ino

acid

sha

veev

olve

dun

der

pos

itiv

ese

lect

ion

wit

hgt

80

pro

babi

lity

Mos

tof

thes

eam

ino

acid

sar

ein

the

extr

acel

lula

rp

orti

onof

the

pro

tein

The

upp

erp

art

ofth

efig

ure

show

sth

ep

rote

inal

ign

men

tfo

rn

ine

mam

mal

sof

the

gp91

regi

onin

dica

ted

byth

ebl

ack

ellip

seI

nth

isre

gion

ahi

ghle

velo

fam

ino

acid

varia

tion

isfo

und

betw

een

spec

ies

and

seve

ralc

odon

ssh

owgt

1In

this

alig

nm

ent

gray

vert

ical

bars

corr

esp

ond

tova

riab

leam

ino

acid

site

sT

hep

rote

inal

ign

men

tof

mam

mal

ssh

ows

the

follo

win

gsp

ecie

sH

om(H

omo

sapi

ens

sapi

ens)

Pan

(Pan

trog

lody

tes)

Mac

(Mac

aca

mul

atta

)M

us(M

usm

uscu

lus)

Rat

(Rat

tus

norv

egic

us)

Cri

(Cri

cetu

lus

gris

eus)

Het

(Het

eroc

epha

lus

glab

er)

Cav

(Cav

iapo

rcel

lus)

an

dO

ry(O

ryct

olag

uscu

nicu

lus)

EC

ext

race

lula

ren

viro

nm

ent

TM

tra

nsm

embr

ane

laye

ran

dIC

in

trac

ellu

lar

envi

ron

men

t

2161

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

From the four studied genes NCF4 which encodes theregulatory protein p40-phox shows a pattern of diversitythat is typical for a gene that has evolved under neutralityIn addition to the features described in the previous para-graph the allelic spectra of NCF4 in the studied populationsare consistent with a neutral model of evolution (tables 2ndash4)

Although CYBB presented the most interesting evolution-ary history at the interspecific level with repeated episodes ofpositive natural selection recent human evolutionary historyhas resulted in interesting patterns of variation for CYBA andNCF2 CYBA encodes p22-phox which is a transmembraneprotein shared by different NADPH oxidases In addition toharboring two common nonsynonymous polymorphisms(V174A is also variable among different species of mammals)CYBA is the most variable and most affected by recombina-tion among the NADPH oxidase genes (tables 1 and 2) Inparticular CYBA diversity is very high in Europe comparedwith 329 genes resequenced in a European sample (httppgagswashingtonedusummary_statshtml last accessed July16 2013) CYBA ranks 11th (ie the 97th percentile)Moreover there are contrasting proportions of the totalnumber of polymorphismsnumber of singletons betweenAfricans (a low proportion) and Europeans (a high propor-tion) the latter showing an excess of common polymor-phisms with respect to the neutral expectation (see the DFL

test in table 4 and supplementary table S4 SupplementaryMaterial online) This excess of common variants inEuropeans is also significant when we conservatively testedit against a scenario of human evolution that incorporates theldquoOut of Africardquo bottleneck (Laval et al 2010) and the observedlevel of recombination in Europeans (CYBA = 807 for the se-quenced region) Because demographic forces and recombi-nation levels alone do not explain the high CYBA diversity andits excess of common polymorphisms we suggest that bal-ancing natural selection (that acts by maintaining differentalleles at high frequency in a population) has contributed to

shape the diversity of CYBA at least in the European popu-lation Indeed figure 4a shows that the haplotype network ofCYBA for the European population is consistent with theaction of balancing natural selection (Bamshad andWooding 2003) showing two well-differentiated commonclades that explain the observed high diversity and theexcess of common CYBA variants A comparative genomicanalysis confirms this inference the ratio of polymorphismsto differences fixed between human and chimpanzee is nothomogeneous along the gene in the different human popu-lations (Mc Donald 1998 supplementary table S5Supplementary Material online) as would be expectedunder neutral evolution Our inference of balancing naturalselection is consistent with the fact that 25ndash30 of the var-iation in levels of ROS production can be attributed to geneticfactors (Lacy et al 2000) and that ROS levels are associatedwith CYBA variants (Bedard et al 2009)

p67 encoded by NCF2 is a necessary cytosolic NADPHcomponent for phagocyte ROS production Asians show ahighly differentiated NCF2 haplotype structure (see frequen-cies of haplotypes NCF2-D11 and NCF2-E10 in the haplotypetables online and in the network shown in fig 4b) and thehighest FST values are observed in pairwise comparisons be-tween Asians and non-Asian populations (in particular withEuropeans table 3) and not between Africans and non-African populations as is usually observed in the humangenome We confirmed these results by analyzing data forNCF2 from the HapMap Project (supplementary materialSupplementary Material online) Moreover a trend towardan excess of rare polymorphisms exists in Asians that is notobserved elsewhere (tables 1 2 and 4 DFL =1904FFL =1893) Although we cannot exclude that this patternof diversity is compatible with the null hypotheses of neutral-ity and with the tested demographic history of human pop-ulations inferred by Laval et al (2010 tables 2ndash4) we canspeculate and envisage four additional evolutionary scenarios

Table 1 Allele Frequencies of Nonsynonymous Polymorphisms in NADPH Oxidase Genes

Genes rs Minor Allele(Amino Acid)

PolyphenPrediction

African European Asian Hispanic

CYBA

Y72H rs4673 T (Y) Possibly damaging 046 032 017 022

V174A rs1049254 T (V) Benign 017 048 048 018

NCF2

K181R rs2274064 G (R) Benign 035 037 041 048

T279M rs13306581 T (T) Probably damaging 000 000 005 000

V297A rs35937854 C (A) Benign 004 000 000 000

T361S Chr1181799289 NCBI36hg18 T (S) mdash 000 000 002 000

H389Q rs17849502 A (Q) Benign 000 005 000 007

R395W rs13306575 T (W) Possibly damaging 000 000 000 007

N419I rs35012521 T (I) Probably damaging 000 002 004 000

P454S rs55761650 T (S) mdash 000 000 000 002

L487S Chr1181795862 NCBI36hg18 C (S) mdash 002 000 000 000

NCF4

T85N rs112306225 A (N) mdash 000 000 002 000

A304E rs5995361 A (E) Benign 004 000 000 000

2162

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

that may have contributed to shape the pattern of NCF2diversity in Asians 1) the observed trend is suggestive of aselective sweep on NCF2 standing variation a neutral orweakly deleterious existing variant becomes beneficial andrapidly increases in frequency (together with its associatedhaplotypes ie incomplete sweep) reducing the nucleotidediversity in the surrounding region and rendering other stand-ing substitutions rare During this process new rare substitu-tions appear in the expanding positively selected haplotype2) The pattern of diversity of NCF2 in Asia may result from anincomplete selective sweep acting on ARPC5 which is locatedapproximately 35 kb downstream of NCF2 In a genome-widescan for recent positive selection Voight et al (2006) identi-fied a strong signature of an incomplete sweep for ARPC5 inAsia (P = 0009) characterized by a higher than expected long-range linkage disequilibrium summarized by very high iHS

statistics (see Haplotter results for the HapMap II data athttphaplotteruchicagoedu last accessed July 16 2013)SNPs in NCF2 also presents high iHS statistics (P = 002) al-though values are lower than for ARPC5 3) The differentiatedpattern of diversity of NCF2 in Asia may also have been gen-erated without the action of natural selection during the firstcolonization of Asia by modern humans In a process of geo-graphic population expansion specifically in the front wave ofthe expansion some rare alleleshaplotypes (ie surfing al-leles) may become common by chance mimicking the pat-tern of diversity generated by a selective sweep (Excoffier andRay 2008) 4) The excess of rare variants may be an artifact ofpooling individuals from different populations (Ptak andPrzeworski 2002) Consistent with evolutionary scenarios 1ndash4 that produce similar patterns of diversity the haplotypenetwork of NCF2 for Eurasians (fig 4b) shows the following1) a large differentiation between Asians and Europeans thatis compatible with the high observed FST values and 2) a star-like shape associated with the haplotype NCF2-E10 that iscommon in Asia and rare elsewhere which is compatiblewith the excess of rare alleles in the Asian populations

DiscussionBy analyzing 29 mammalian genomes and four human pop-ulations we show in this study that natural selection hasacted in different ways over time to shape the pattern ofdiversity of the phagocyte NADPH oxidase genes At the tem-poral scale of the evolution of mammals we have inferredrecurrent episodes of positive selection acting on the extra-cellular portion of gp-91 that have been important to shapethe pattern of interspecific diversity of this gene Our in-terspecific analyses did not show a similar pattern of naturalselection in any of the other phagocyte NADPH oxidasegenes Even if current knowledge on the biology of NADPHdoes not allow us to interpret our results in terms of functionwe propose that the extracellular region of gp-91 is function-ally relevant Our results also imply that this region is highlydifferentiated among mammals at the protein level and thisvariability should be considered when mammals models areused to study the structure and function of phagocyteNADPH components

In the time scale of human evolution our analyses of theNADPH oxidase genes suggest that CYBA has been a target ofbalancing natural selection Because we do not have evidenceof population-specific variants that faced selective pressurethe inferred natural selection may have acted on a standingvariation in ancestral populations This implies that the selec-tive pressure began after the appearance of the variant andpossibly acted in a specific geographic region (Barret andSchluter 2008) The signatures of natural selection acting ona new mutation and on standing variation differ In the case ofselective sweeps episodes of natural selection on standingvariation are associated to a larger variance in the allelic spec-trum with respect to natural selection on a new mutationAlso selection on standing variation may produce an excess ofalleles at intermediate frequencies that is not associated withhigh nucleotide diversity (Przeworski et al 2005 Peter et al2012) This pattern contrasts with the effect of balancing

Table 2 Intrapopulation Diversity Indexes in the Studied Populationsfor the NADPH Oxidase Genes Obtained from Resequencing Dataa

African European Asian Hispanic

Number of chromosomes

CYBBb 42 52 34 36

CYBA 48 62 48 46

NCF2 48 62 48 46

NCF4 48 62 48 46

Segregating sitessingletons

CYBB 218 70 103 135

CYBA 6122 333 335 347

NCF2 4613 3311 2816 3714

NCF4 4512 267 191 308

Haplotype structureNumber of inferred haplotypesc

CYBB 14 5 7 12

CYBA 39 39 32 26

NCF2 38 32 18 31

NCF4 36 30 17 22

Haplotype diversity plusmn SD

CYBB 088 plusmn 003 034 plusmn 008 053 plusmn 010 070 plusmn 008

CYBA 098 plusmn 001 098 plusmn 001 097 plusmn 000 096 plusmn 002

NCF2 099 plusmn 001 096 plusmn 001 087 plusmn 004 097 plusmn 001

NCF4 098 plusmn 001 093 plusmn 002 093 plusmn 002 093 plusmn 002

Recombination parameter (q 103 per site)c

CYBB 008 lt001 lt001 lt002

CYBA 291 148 096 088

NCF2 052 038 010 058

NCF4 137 103 039 022

h estimatorsd

p 103 per site

CYBB 036 012 015 028

CYBA 190 163 151 157

NCF2 081 047 043 057

NCF4 101 076 064 084

hW 103 per site

CYBB 042 013 021 027

CYBA 231 118 125 13

NCF2 102 069 062 083

NCF4 101 070 054 087

aMost analyses were performed using software DnaSP (Rozas 2009)bData for CYBB (Xp211) are from Tarazona-Santos et al (2008)cHaplotypes and inferred using the method by Stephens and Scheet (2005) andthe software PHASEd Tajima (1983) W Watterson (1975)

2163

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

natural selection which produces an excess of common allelesassociated with high genetic diversity Thus the observed pat-tern of CYBA diversity in Europeans is not consistent with aselective sweep on a standing variation but it is consistent witha scenario of balancing selection acting on standing variation

If we consider for CYBA that heterozygote advantage maybe the mechanisms of balancing selection we can speculatethat the biological basis for this mechanism may be the fol-lowing considering that p22-phox is not exclusive of thephagocyte NADPH oxidase but it is also part of Nox com-plexes expressed in other tissues the dependence of ROSproduction on CYBA variants has to be finely regulated IfCYBA variants induce high levels of ROS these variants mayfavor a phagocyte-dependent efficient response to pathogensbut may damage other endothelial tissues Alternativelytissue oxidative damage does not occur if ROS productionis low but this response may be associated with a weaker

phagocyte respiratory burst against pathogens In this con-text heterozygote individuals with a CYBA-dependent inter-mediate level of ROS production may have been favored bynatural selection

Our results contribute to the discussion regarding the rel-evance of balancing selection in shaping the diversity ofinnate immunity genes (Ferrer-Admetlla et al 2008 Barreiroet al 2009) Ferrer-Admetlla et al (2008) have associated therecurrent signatures of balancing selection on inflammatorygenes with the need for fine regulation CYBA is the onlyphagocytic NADPH oxidase gene that also encodesnonphagocyte Nox components thus CYBA has a role inROS cell signaling a potentially dangerous process due toits capability to produce oxidative damage to tissues Geneswith these characteristics likely need even tighter regulationThe interplay between pathogen-driven selective pressureon innate immunity genes and their concomitantnonimmunological functions is complex In addition toCYBA other interesting examples of this interplay can befound among the 10 human toll-like receptors (TLRs) thatshow a variety of signatures of natural selection TLR8 whichshows the strongest signature of purifying selection is alsoinvolved in neuronal development (Barreiro et al 2009) and itis difficult to discriminate the role of each function in deter-mining the observed signature of natural selection

With few exceptions the pathogens responsible for naturalselection on immune genes are difficult to specify In the caseof NADPH oxidase we can infer based on the spectrum ofinfections in CGD patients that catalase-positive bacteria andfungi such as Staphylococci Salmonella Candida Aspergillusand M tuberculosis may be the selective pathogensInteractions between the host and pathogens also includemechanisms of the latter to impair the respiratory burst ofthe former For example Leishmania donovani blocks the as-sembly of NADPH oxidase at the phagosome membrane(Lodge et al 2006) These mechanisms may constitute selec-tive pressures imposed by pathogens

The associations reported in GWAS between rs4821544 inNCF4 and Crohnrsquos disease (an idiopathic inflammatory boweldisease that predominantly involves the ileum and colonRioux et al 2007) and between rs10911363 in NCF2 and

Table 3 Pairwise FST Genetic Distances between Populations

CYBBa CYBA

Africa Europe Asia Hispanic Africa Europe Asia Hispanic

Africa mdash 0316 0264 0092 mdash 0074 0083 0065

Europe 0257 mdash 0000 0107 mdash 0002 0056

Asia 0211 0000 mdash 0073 mdash 0054

Hispanic 0070 0082 0056 mdash mdash

NCF2 NCF4

Africa mdash 0048 0058 0037 mdash 0128 0136 0158

Europe mdash 0069 0000 mdash 0026 0026

Asia mdash 0059 mdash 0005

Hispanic mdash mdash

aFor CYBB FST estimators are above the diagonal Below the diagonal are the FST values corrected as if the effective population sizes of X chromosome genes were equal toautosomal ones

Table 4 Results of Neutrality Tests for the NADPH Oxidase Genesand Their Significancea

African European Asian Hispanic

Tajimarsquos D

CYBB 0473 0274 0813 0084

CYBA 0580 1242 0684 0707

NCF2 0412 0939 0883 0987

NCF4 0750 0243 0556 0102

Fu and Lirsquos D

CYBB 1050 1110 0395 0977

CYBA 1485 1734 1188 0138

NCF2 0407 1105 1904 1398

NCF4 0308 0443 1893 0266

Fu and Lirsquos F

CYBB 0980 0811 0114 0814

CYBA 1382 1893 1205 0405

NCF2 0512 1252 1893 1567

NCF4 0557 0231 1225 0248

NOTEmdashUnderlined values represent significant results under the demographic modelinferred by Laval et al (2010) for human populations See details in supplementarytable S4 Supplementary Material onlineaThe McDonaldndashKreitman test is nonsignificant in any of the cases

Significant under the WrightndashFisher model of constant population size

2164

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

systemic lupus erythematosus (Cunninghame Graham et al2011) confirm the involvement of NADPH genes in the path-ogenesis of inflammatory-related common diseases Ourclaim that natural selection acted on CYBA (and maybe inNCF2) is relevant for biomedical studies because combiningevidence of natural selection with association analyses inimmune genes increases the statistical power to detect dis-ease-associated variants (Ayodo et al 2007) As a new gener-ation of association studies focusing on rare variation isemerging the combination of genes deemed interestingfrom GWAS and populations with an excess of rare variantsin these genes such as NCF2 in Asians are particularly inter-esting as a source of rare variants with clinical relevanceFinally by determining through molecular evolution mappingthat the extracellular portion of gp91 (encoded by CYBB inthe X-chromosome) has been subject to recurrent episodes ofpositive selection at the scale of mammals evolution we positthe hypothesis that this portion of the NADPH oxidase isrelevant for currently unknown biological processes thatonce revealed by structural and functional investigationswill contribute to understanding the role of NADPH oxidasein infectious autoimmune and cardiovascular diseases

Materials and Methods

Molecular Evolution Analysis of NADPH Genes

We used the maximum likelihood framework developed byYang (2007a) to estimate for the NADPH oxidase genes

under a variety of evolutionary models as implemented in thePAML software (Yang 2007b) This approach allows infer-ences about the evolution of a coding region along aninter-specific phylogeny mapping the codons that haveevolved under strongweak purifying selection neutrality oradaptive positive selection Further details about these anal-yses are available as supplementary material SupplementaryMaterial online

Human Population Genetics of NADPH Genes

For human population genetics analyses we conducted bidi-rectional Sanger sequencing of CYBB CYBA NCF2 and NCF4for a total of 35242 bp for each of 102 healthy individuals aspart of the SNP500 Cancer project (Packer et al 2006 seesupplementary fig S1 Supplementary Material online for de-tails) Human population resequencing data for CYBB werepublished in Tarazona-Santos et al (2008) The resequencingexperiments were performed as in Packer et al (2006) These102 unrelated individuals include the following 24 of Africanancestry (15 African Americans from the United States and 9Pygmies) 23 admixed Latin Americans (from Mexico PuertoRico and South America) 31 Europeans (from the CEPHUTAH pedigree and the NIEHS Environmental GenomeProject) and 24 AsiansndashOceanians (from Melanesia PakistanChina Cambodia Japan and Taiwan)

After controlling for multiple tests we confirmed that allSNPs fit the HardyndashWeinberg equilibrium by the Guo and

CYBA - A16

CYBA - A3

Y72H - rs4673

V174A - rs1049254 CYBA - E18

(a)

NCF2 - E10 H389Q - rs17849502

NCF2 - B3

NCF2 - D11

NCF2 - C1K181R - rs2274064(b)

FIG 4 Phylogenetic networks of (a) CYBA in Europeans and (b) NCF2 in Europeans (black) and Asians (yellow) The lengths of the branches areproportional to the number of mutations Only nonsynonymous mutations are shown The haplotype names correspond to the table of inferredhaplotypes in the supplementary material Supplementary Material online We only show the names of haplotypes with a frequency gt5 Ancestralhaplotypes (inferred as being the human haplotype or median vector most similar to the chimpanzee sequence) are indicated by an arrow Medianvectors are in red

2165

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Thompson (1992) test which was implemented in the soft-ware Arlequin 30 (Excoffier et al 2007) Insertion-deletions(INDELs) were excluded from population genetics analysesTo assess intrapopulation diversity we used two statistics the per-site mean number of pairwise differences betweensequences (Tajima 1983) and w based on the number ofsegregating sites (S) (Watterson 1975) We measured pairwisebetween-populations diversity by using the FST statistics cal-culated using the software DnaSP (Rozas 2009)

Haplotypes and the recombination parameter were in-ferred using the PHASE software (Stephens and Scheet 2005)and diversity indexes calculations (tables 2 and 3) as well asneutrality statistics (table 4) were estimated using DnaSP soft-ware We applied two kinds of neutrality tests 1) tests basedon the allelic spectrum which is the distribution of polymor-phisms across different classes of frequencies namelyTajimarsquos DT (Tajima 1989) and FundashLirsquos DFL and FFL (Fu andLi 1993) and 2) tests based on comparisons between thenumber of polymorphisms in human populations and fixeddifferences with the chimpanzee (ie outgroup) namely theMcDonald and Kreitman (1991) test and the adaptedKolmogorovndashSmirnoff test by McDonald (1998) For thefirst set of tests we used as null hypotheses both the classicWrightndashFisher model of neutrality with a constant popula-tion size as well as the more realistic evolutionary scenario forhuman populations inferred by Laval et al (2010) In the caseof the scenario of Laval et al (2010) we ignored interconti-nental gene flow within the Old World because these raregene flow events likely does not affect the level of significanceof the neutrality tests given its very low inferred values(13 105) Null distributions used to test the significanceof the neutrality tests under these evolutionary scenarios weregenerated using coalescent simulations and a significancelevel of 005 (Hudson 2002) The KolmogorovndashSmirnoff testof neutrality adapted by McDonald (1998) was performedusing Slider software available at httpudeledu~mcdonaldaboutdnasliderhtml (last accessed July 16 2013) Weperformed coalescent simulations using ms software(Hudson 2002) Further methodological details are availableas supplementary material Supplementary Material onlineWe constructed the CYBA and NCF2 networks using allSNP variants and applying the Median joining algorithmand the maximum parsimony option calculations as imple-mented in the software Network 46 (Bandelt et al 1999)

Supplementary MaterialSupplementary tables S1ndashS5 and figure S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

SJC conceived the study ET-S MM FL RC AC CF LPand BS generated the dataanalyzed the sequences LB ACWCSM and MY provided tools for data generationand analyses ET-S MM FL WCSM and RR performedpopulation genetics analyses ET-S and MM prepared tablesand figures ET-S and SJC wrote the manuscript All the

authors contributed with discussions about the results Theauthors are grateful to Silvia Fuselli for discussions of ourresults to Fernando Levi Soares for technical assistance withthe figures and the Sequencing Group of the CoreGenotyping Facility (National Cancer Institute) for their tech-nical assistance This work was supported by the IntramuralResearch Program of the National Cancer Institute (NCI) andNational Institutes of Health (NIH) CF was supported by theUniversity of Bologna ET-S MM WCSM and FL weresupported by the Fogarty International Center and NationalCancer Institute (1R01TW007894) Conselho Nacional deDesenvolvimento Cientıfico e Tecnologico (Brazil Grant3075562012-3) Fundacao de Amparo a Pesquisa de MinasGerais (Brazil CBB PPM 0026512) and the Brazilian Ministryof Education (CAPES Agency grant PNPD 0256709-1)

ReferencesAyodo G Price AL Keinan A Ajwang A Otieno MF Orago AS

Patterson N Reich D 2007 Combining evidence of natural selectionwith association analysis increases power to detect malaria-resis-tance variants Am J Hum Genet 81(2)234ndash242

Bamshad M Wooding SP 2003 Signatures of natural selection in thehuman genome Nat Rev Genet 4(2)99ndash111

Bandelt H-J Forster P Rohl A 1999 Median-joining networks for infer-ring intraspecific phylogenies Mol Biol Evol 16(1)37ndash48

Barreiro LB Ben-Ali M Quach H et al (17 co-authors) 2009Evolutionary dynamics of human Toll-like receptors and their dif-ferent contributions to host defense PLoS Genet 5(7)e1000562

Barreiro LB Quintana-Murci L 2010 From evolutionary genetics tohuman immunology how selection shapes host defence genesNat Rev Genet 11(1)17ndash30

Barrett RD Schluter D 2008 Adaptation from standing genetic varia-tion Trends Ecol Evol 23(1)38ndash44

Bedard K Attar H Bonnefont J Jaquet V Borel C Plastre O Stasia MJAntonarakis SE Krause KH 2009 Three common polymorphisms inthe CYBA gene form a haplotype associated with decreased ROSgeneration Hum Mutat 30(7)1123ndash1133

Blekhman R Man O Herrmann L Boyko AR Indap A Kosiol CBustamante CD Teshima KM Przeworski M 2008 Natural selectionon genes that underlie human disease susceptibility Curr Biol18(12)883ndash889

Brandes RP Kreuzer J 2005 Vascular NADPH oxidases molecular mech-anisms of activation Cardiovasc Res 65(1)16ndash27

Buckley RH 2004 Pulmonary complications of primary immunodefi-ciencies Paediatr Respir Rev 5(Suppl A)S225ndashS233

Bustamante CD Fledel-Alon A Williamson S et al (13 co-authors)2005 Natural selection on protein-coding genes in the humangenome Nature 437(7062)1153ndash1157

Bustamante J Arias AA Vogt G et al (29 co-authors) 2011 GermlineCYBB mutations that selectively affect macrophages in kindredswith X-linked predisposition to tuberculous mycobacterial diseaseNat Immunol 12(3)213ndash221

Campbell MC Tishkoff SA 2008 African genetic diversity implicationsfor human demographic history modern human origins and com-plex disease mapping Annu Rev Genomics Hum Genet 9403ndash433

Chanock SJ Roesler J Zhan S Hopkins P Lee P Barrett DT ChristensenBL Curnutte JT Gorlach A 2000 Genomic structure of the humanp47-phox (NCF1) gene Blood Cells Mol Dis 26(1)37ndash46

Cunninghame Graham DS Morris DL Bhangale TR Criswell LASyvanen AC Ronnblom L Behrens TW Graham RR Vyse TJ2011 Association of NCF2 IKZF1 IRF8 IFIH1 and TYK2 with sys-temic lupus erythematosus PLoS Genet 7(10)e1002341

Excoffier L Laval G Schneider S 2007 Arlequin ver 30 an integratedsoftware package for population genetics data analysis EvolBioinform Online 147ndash50

2166

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Excoffier L Ray N 2008 Surfing during population expansions promotesgenetic revolutions and structuration Trends Ecol Evol23(7)347ndash351

Ferrer-Admetlla A Bosch E Sikora M Marques-Bonet T Ramırez-SorianoA Muntasell A Navarro A Lazarus R Calafell F Bertranpetit J Casals F2008 Balancing selection is the main force shaping the evolution ofinnate immunity genes J Immunol 181(2)1315ndash1322

Fu YX Li WH 1993 Statistical tests of neutrality of mutations Genetics133(3)693ndash709

The 1000 Genomes Project Consortium 2010 A map of humangenome variation from population-scale sequencing Nature467(7319)1061ndash1073

The 1000 Genomes Project Consortium Abecasis GR Auton A BrooksLD DePristo MA Durbin RM Handsaker RE Kang HM Marth GTMcVean GA 2012 An integrated map of genetic variation from1092 human genomes Nature 491(7422)56ndash65

Guo SW Thompson EA 1992 Performing the exact test of Hardy-Weinberg proportion for multiple alleles Biometrics 48(2)361ndash372

Heyworth PG Cross AR Curnutte JT 2003 Chronic granulomatousdisease Curr Opin Immunol 15(5)578ndash584

Hudson RR 2002 Generating samples under a Wright-Fisher neutralmodel of genetic variation Bioinformatics 18(2)337ndash338

Jacob CO Eisenstein M Dinauer MC et al (21 co-authors) 2012 Lupus-associated causal mutation in neutrophil cytosolic factor 2 (NCF2)brings unique insights to the structure and function of NADPHoxidase Proc Natl Acad Sci U S A 109(2)E59ndashE67

Kimura M 1974 The neutral theory of molecular evolution CambridgeCambridge University Press

Kosiol C Vinar T da Fonseca RR Hubisz MJ Bustamante CD Nielsen RSiepel A 2008 Patterns of positive selection in six mammalian ge-nomes PLoS Genet 4(8)e1000144

Lacy F Kailasam MT OrsquoConnor DT Schmid-Schonbein GW Parmer RJ2000 Plasma hydrogen peroxide production in human essentialhypertension role of heredity gender and ethnicity Hypertension36(5)878ndash884

Laval G Patin E Barreiro LB Quintana-Murci L 2010 Formulating a his-torical and demographic model of recent human evolution based onresequencing data from noncoding regions PLoS One 5(4)e10284

Lewontin R 1972 The apportionment of human diversity Evol Biol 6391ndash398

Lindblad-Toh K Garber M Zuk O et al (88 co-authors) 2011 A high-resolution map of human evolutionary constraint using 29 mam-mals Nature 478(7370)476ndash482

Lodge R Diallo TO Descoteaux A 2006 Leishmania donovani lipopho-sphoglycan blocks NADPH oxidase assembly at the phagosomemembrane Cell Microbiol 8(12)1922ndash1931

Magalhaes WC Rodrigues MR Silva D Soares-Souza G Iannini MLCerqueira GC Faria-Campos AC Tarazona-Santos E 2012DIVERGENOME a bioinformatics platform to assist population ge-netics and genetic epidemiology studies Genet Epidemiol36(4)360ndash367

Marth JD Grewal PK 2008 Mammalian glycosylation in immunity NatRev Immunol 8(11)874ndash887

McDonald JH 1998 Improved tests for heterogeneity across a region ofDNA sequence in the ratio of polymorphism to divergence Mol BiolEvol 15377ndash384

McDonald JH Kreitman M 1991 Adaptive protein evolution at the Adhlocus in Drosophila Nature 351(6328)652ndash654

Nielsen R Bustamante C Clark AG et al (12 co-authors) 2005 A scanfor positively selected genes in the genomes of humans and chim-panzees PLoS Biol 3(6)e170

Olsson LM Lindqvist AK Kallberg H Padyukov L Burkhardt HAlfredsson L Klareskog L Holmdahl R 2007 A case-control studyof rheumatoid arthritis identifies an associated single nucleotidepolymorphism in the NCF4 gene supporting a role for the

NADPH-oxidase complex in autoimmunity Arthritis Res Ther9(5)R98

Packer BR Yeager M Burdett L et al (13 co-authors) 2006SNP500Cancer a public resource for sequence validation assay de-velopment and frequency analysis for genetic variation in candidategenes Nucleic Acids Res 34(Database issue)D617ndashD621

Peter BM Huerta-Sanchez E Nielsen R 2012 Distinguishing betweenselective sweeps from standing variation and from a de novo mu-tation PLoS Genet 8(10)e1003011

Przeworski M Coop G Wall JD 2005 The signature of positive selectionon standing genetic variation Evolution 59(11)2312ndash2323

Ptak SE Przeworski M 2002 Evidence for population growth in humansis confounded by fine-scale population structure Trends Genet18(11)559ndash563

Ramensky V Bork P Sunyaev S 2002 Human non-synonymous SNPsserver and survey Nucleic Acids Res 30(17)3894ndash3900

Rioux JD Xavier RJ Taylor KD et al (24 co-authors) 2007 Genome-wideassociation study identifies new susceptibility loci for Crohn diseaseand implicates autophagy in disease pathogenesis Nat Genet39(5)596ndash604

Roberts RL Hollis-Moffatt JE Gearry RB Kennedy MA Barclay MLMerriman TR 2008 Confirmation of association of IRGM andNCF4 with ileal Crohnrsquos disease in a population-based cohortGenes Immun 9(6)561ndash565

Rozas J 2009 DNA sequence polymorphism analysis using DnaSPMethods Mol Biol 537337ndash350

San Jose G Fortuno A Beloqui O Dıez J Zalba G 2008 NADPH oxidaseCYBA polymorphisms oxidative stress and cardiovascular diseasesClin Sci (Lond) 114(3)173ndash182

Santiago HC Gonzalez Lombana CZ Macedo JP et al (12 co-authors)2012 NADPH phagocyte oxidase knockout mice controlTrypanosoma cruzi proliferation but develop circulatory collapseand succumb to infection PLoS Negl Trop Dis 6(2)e1492

Stephens M Scheet P 2005 Accounting for decay of linkage disequilib-rium in haplotype inference and missing-data imputation Am JHum Genet 76(3)449ndash462

Sumimoto H Miyano K Takeya R 2005 Molecular composition andregulation of the Nox family NAD(P)H oxidases Biochem BiophysRes Commun 338(1)677ndash686

Tajima F 1983 Evolutionary relationship of DNA sequences in finitepopulations Genetics 105(2)437ndash460

Tajima F 1989 Statistical method for testing the neutral mutationhypothesis by DNA polymorphism Genetics 123(3)585ndash595

Tarazona-Santos E Bernig T Burdett L Magalhaes WC Fabbri C Liao JRedondo RA Welch R Yeager M Chanock SJ 2008 CYBB anNADPH-oxidase gene restricted diversity in humans and evidencefor differential long-term purifying selection on transmembrane andcytosolic domains Hum Mutat 29(5)623ndash632

Taylor RM Burritt JB Baniulis D Foubert TR Lord CI Dinauer MCParkos CA Jesaitis AJ 2004 Site-specific inhibitors of NADPH oxi-dase activity and structural probes of flavocytochrome b character-ization of six monoclonal antibodies to the p22phox subunitJ Immunol 173(12)7349ndash7357

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A map of recentpositive selection in the human genome PLoS Biol 4(3)e72

Watterson GA 1975 On the number of segregating sites in geneticalmodels without recombination Theor Popul Biol 7(2) 256ndash276

Williamson SH Hernandez R Fledel-Alon A Zhu L Nielsen RBustamante CD 2005 Simultaneous inference of selection and pop-ulation growth from patterns of variation in the human genomeProc Natl Acad Sci U S A 102(22)7882ndash7887

Yang Z 2007a Adaptive molecular evolution In Balding DJ Bishop MCannings C editors Handbook of statistical genetics Vol 1 3rd edSusex (United Kingdom) John Wiley amp Sons p 377ndash406

Yang Z 2007b PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

2167

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

mammalian evolution as inferred in this study For the auto-somal NADPH oxidase components (table 1 and haplotypetables online Supplementary Material online) we observe inthis study two rare and conservative nonsynonymous substi-tutions (ie involving amino acids with similar chemical prop-erties T85N and A304E) in NCF4 But for NCF2 and CYBA weobserved patterns of nonsynonymous substitutions thatseldom occur in human genes NCF2 has nine nonsynony-mous substitutions three of them are common (with a fre-quency higher than 5 in at least one of the studiedpopulation samples) and six are rare On the other handtwo nonsynonymous substitutions in CYBA are commonand ubiquitous in human populations namely Y72H(rs4673) and V174A (rs1049254 in a position where variationamong mammalian species is also observed) Moreover thefollowing two of the five common amino acid changes ob-served in the autosomal NADPH genes are predicted to bepossibly damaging (ie radical) by the Polyphen resource(Ramensky et al 2002) the two changes are R395W inHispanic NCF2 (rs13306575) and Y72H (rs4673) in CYBA Ingeneral Polyphen accurately predicts the effect of nonsynon-ymous substitutions based on biochemical and evolutionarydata (Williamson et al 2005) Notably the ImmunodeficiencyMutations Database (httpbioinfutafiNCF2basecontent=pubIDbases last accessed July 16 2013) reports one395W395W autosomal recessive CGD patient but we andthe HapMap project (wwwhapmaporg last accessed July 16

2013) observed the W allele at frequencies between 5 and10 in Asians and admixed Latin Americans including onesupposedly healthy 395W395W Japanese HapMap individ-ual On the basis of these results we verified whether NativeAmericans who descend from an ancestral Pleistocene Asianpopulations that peopled the Americas by the Behring Straitsmore than 14000 years ago may have relatively higher fre-quencies of this variant We genotyped the variant 395Wusing a Taqman assay in 558 Native Americans (see supple-mentary table S3 Supplementary Material online for detailedresults) and observed an allele frequency of 12 being thisvariant always present in heterozygous individuals

For the four studied genes the diversity and levels of re-combination are higher in Africans than in non-Africans(table 2 and haplotype tables available as supplementaryfiles Supplementary Material online) This result is a conse-quence of the African origin of modern humans and the ldquooutof Africardquo migration that occurred 40000ndash80000 years agoafter a bottleneck leading to the peopling of other continents(Campbell and Tishkoff 2008 Laval et al 2010) Therefore thefirst divergence between continental human populations wasbetween Africans and ancestral non-Africans Consistentlywith this scenario we observed the highest between-popula-tion differentiation for CYBB CYBA and NCF4 between thesetwo groups Interestingly NCF2 does not match this pattern(table 3)

FIG 2 Inferred types of natural selection for codons of the NADPH genes at the evolutionary time scale of mammals Codons are represented along thehorizontal axis For each gene three classes of sites (black dark gray and light gray) are considered and each class evolved under different inferred values (presented for each gene in the figure at the top of each graphic) These classes correspond to the model M3 of Yang (2007a) with three classes ofsites Given our data this model is more likely than alternative models of evolution that assume simpler scenarios such as a unique for the entire gene(see supplementary material Supplementary Material online for details regarding methods and results using alternative models) The three classescorrespond to different types and levels of natural selection from strong purifying selection (in the lightest gray) to positive selection (gt 1) For eachcodon the probability of belonging to each of the three classes of corresponds to the height of the corresponding color in the vertical bar Forexample codon 173 of CYBB (indicated by a white arrow) has a 0000 probability of belonging to the= 00095 class (light gray a class corresponding tostrong purifying selection) a 0169 probability of belonging to the = 03085 class (dark gray) and a 0831 probability of belonging to the = 18987class (black a class that suggests positive selection) In this case reasonable evidence of positive selection on this codon exists

2160

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

FIG

3

Nat

ural

sele

ctio

nm

app

ing

acro

ssC

YBB

(en

codi

ng

gp91

)al

ong

mam

mal

ian

evol

utio

na

sid

enti

fied

usin

gth

ePA

ML

met

hod

byY

ang

(200

7a)

The

top

olog

ies

ofgp

91an

dp2

2ar

ere

pro

duce

dfr

omT

aylo

ret

al(

2004

Cop

yrig

ht20

04T

heA

mer

ican

Ass

ocia

tion

ofIm

mun

olog

ists

In

cU

sed

wit

hp

erm

issi

on)

Dar

kgr

ayam

ino

acid

sha

veev

olve

dun

der

pos

itiv

ese

lect

ion

wit

hgt

80

pro

babi

lity

Mos

tof

thes

eam

ino

acid

sar

ein

the

extr

acel

lula

rp

orti

onof

the

pro

tein

The

upp

erp

art

ofth

efig

ure

show

sth

ep

rote

inal

ign

men

tfo

rn

ine

mam

mal

sof

the

gp91

regi

onin

dica

ted

byth

ebl

ack

ellip

seI

nth

isre

gion

ahi

ghle

velo

fam

ino

acid

varia

tion

isfo

und

betw

een

spec

ies

and

seve

ralc

odon

ssh

owgt

1In

this

alig

nm

ent

gray

vert

ical

bars

corr

esp

ond

tova

riab

leam

ino

acid

site

sT

hep

rote

inal

ign

men

tof

mam

mal

ssh

ows

the

follo

win

gsp

ecie

sH

om(H

omo

sapi

ens

sapi

ens)

Pan

(Pan

trog

lody

tes)

Mac

(Mac

aca

mul

atta

)M

us(M

usm

uscu

lus)

Rat

(Rat

tus

norv

egic

us)

Cri

(Cri

cetu

lus

gris

eus)

Het

(Het

eroc

epha

lus

glab

er)

Cav

(Cav

iapo

rcel

lus)

an

dO

ry(O

ryct

olag

uscu

nicu

lus)

EC

ext

race

lula

ren

viro

nm

ent

TM

tra

nsm

embr

ane

laye

ran

dIC

in

trac

ellu

lar

envi

ron

men

t

2161

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

From the four studied genes NCF4 which encodes theregulatory protein p40-phox shows a pattern of diversitythat is typical for a gene that has evolved under neutralityIn addition to the features described in the previous para-graph the allelic spectra of NCF4 in the studied populationsare consistent with a neutral model of evolution (tables 2ndash4)

Although CYBB presented the most interesting evolution-ary history at the interspecific level with repeated episodes ofpositive natural selection recent human evolutionary historyhas resulted in interesting patterns of variation for CYBA andNCF2 CYBA encodes p22-phox which is a transmembraneprotein shared by different NADPH oxidases In addition toharboring two common nonsynonymous polymorphisms(V174A is also variable among different species of mammals)CYBA is the most variable and most affected by recombina-tion among the NADPH oxidase genes (tables 1 and 2) Inparticular CYBA diversity is very high in Europe comparedwith 329 genes resequenced in a European sample (httppgagswashingtonedusummary_statshtml last accessed July16 2013) CYBA ranks 11th (ie the 97th percentile)Moreover there are contrasting proportions of the totalnumber of polymorphismsnumber of singletons betweenAfricans (a low proportion) and Europeans (a high propor-tion) the latter showing an excess of common polymor-phisms with respect to the neutral expectation (see the DFL

test in table 4 and supplementary table S4 SupplementaryMaterial online) This excess of common variants inEuropeans is also significant when we conservatively testedit against a scenario of human evolution that incorporates theldquoOut of Africardquo bottleneck (Laval et al 2010) and the observedlevel of recombination in Europeans (CYBA = 807 for the se-quenced region) Because demographic forces and recombi-nation levels alone do not explain the high CYBA diversity andits excess of common polymorphisms we suggest that bal-ancing natural selection (that acts by maintaining differentalleles at high frequency in a population) has contributed to

shape the diversity of CYBA at least in the European popu-lation Indeed figure 4a shows that the haplotype network ofCYBA for the European population is consistent with theaction of balancing natural selection (Bamshad andWooding 2003) showing two well-differentiated commonclades that explain the observed high diversity and theexcess of common CYBA variants A comparative genomicanalysis confirms this inference the ratio of polymorphismsto differences fixed between human and chimpanzee is nothomogeneous along the gene in the different human popu-lations (Mc Donald 1998 supplementary table S5Supplementary Material online) as would be expectedunder neutral evolution Our inference of balancing naturalselection is consistent with the fact that 25ndash30 of the var-iation in levels of ROS production can be attributed to geneticfactors (Lacy et al 2000) and that ROS levels are associatedwith CYBA variants (Bedard et al 2009)

p67 encoded by NCF2 is a necessary cytosolic NADPHcomponent for phagocyte ROS production Asians show ahighly differentiated NCF2 haplotype structure (see frequen-cies of haplotypes NCF2-D11 and NCF2-E10 in the haplotypetables online and in the network shown in fig 4b) and thehighest FST values are observed in pairwise comparisons be-tween Asians and non-Asian populations (in particular withEuropeans table 3) and not between Africans and non-African populations as is usually observed in the humangenome We confirmed these results by analyzing data forNCF2 from the HapMap Project (supplementary materialSupplementary Material online) Moreover a trend towardan excess of rare polymorphisms exists in Asians that is notobserved elsewhere (tables 1 2 and 4 DFL =1904FFL =1893) Although we cannot exclude that this patternof diversity is compatible with the null hypotheses of neutral-ity and with the tested demographic history of human pop-ulations inferred by Laval et al (2010 tables 2ndash4) we canspeculate and envisage four additional evolutionary scenarios

Table 1 Allele Frequencies of Nonsynonymous Polymorphisms in NADPH Oxidase Genes

Genes rs Minor Allele(Amino Acid)

PolyphenPrediction

African European Asian Hispanic

CYBA

Y72H rs4673 T (Y) Possibly damaging 046 032 017 022

V174A rs1049254 T (V) Benign 017 048 048 018

NCF2

K181R rs2274064 G (R) Benign 035 037 041 048

T279M rs13306581 T (T) Probably damaging 000 000 005 000

V297A rs35937854 C (A) Benign 004 000 000 000

T361S Chr1181799289 NCBI36hg18 T (S) mdash 000 000 002 000

H389Q rs17849502 A (Q) Benign 000 005 000 007

R395W rs13306575 T (W) Possibly damaging 000 000 000 007

N419I rs35012521 T (I) Probably damaging 000 002 004 000

P454S rs55761650 T (S) mdash 000 000 000 002

L487S Chr1181795862 NCBI36hg18 C (S) mdash 002 000 000 000

NCF4

T85N rs112306225 A (N) mdash 000 000 002 000

A304E rs5995361 A (E) Benign 004 000 000 000

2162

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

that may have contributed to shape the pattern of NCF2diversity in Asians 1) the observed trend is suggestive of aselective sweep on NCF2 standing variation a neutral orweakly deleterious existing variant becomes beneficial andrapidly increases in frequency (together with its associatedhaplotypes ie incomplete sweep) reducing the nucleotidediversity in the surrounding region and rendering other stand-ing substitutions rare During this process new rare substitu-tions appear in the expanding positively selected haplotype2) The pattern of diversity of NCF2 in Asia may result from anincomplete selective sweep acting on ARPC5 which is locatedapproximately 35 kb downstream of NCF2 In a genome-widescan for recent positive selection Voight et al (2006) identi-fied a strong signature of an incomplete sweep for ARPC5 inAsia (P = 0009) characterized by a higher than expected long-range linkage disequilibrium summarized by very high iHS

statistics (see Haplotter results for the HapMap II data athttphaplotteruchicagoedu last accessed July 16 2013)SNPs in NCF2 also presents high iHS statistics (P = 002) al-though values are lower than for ARPC5 3) The differentiatedpattern of diversity of NCF2 in Asia may also have been gen-erated without the action of natural selection during the firstcolonization of Asia by modern humans In a process of geo-graphic population expansion specifically in the front wave ofthe expansion some rare alleleshaplotypes (ie surfing al-leles) may become common by chance mimicking the pat-tern of diversity generated by a selective sweep (Excoffier andRay 2008) 4) The excess of rare variants may be an artifact ofpooling individuals from different populations (Ptak andPrzeworski 2002) Consistent with evolutionary scenarios 1ndash4 that produce similar patterns of diversity the haplotypenetwork of NCF2 for Eurasians (fig 4b) shows the following1) a large differentiation between Asians and Europeans thatis compatible with the high observed FST values and 2) a star-like shape associated with the haplotype NCF2-E10 that iscommon in Asia and rare elsewhere which is compatiblewith the excess of rare alleles in the Asian populations

DiscussionBy analyzing 29 mammalian genomes and four human pop-ulations we show in this study that natural selection hasacted in different ways over time to shape the pattern ofdiversity of the phagocyte NADPH oxidase genes At the tem-poral scale of the evolution of mammals we have inferredrecurrent episodes of positive selection acting on the extra-cellular portion of gp-91 that have been important to shapethe pattern of interspecific diversity of this gene Our in-terspecific analyses did not show a similar pattern of naturalselection in any of the other phagocyte NADPH oxidasegenes Even if current knowledge on the biology of NADPHdoes not allow us to interpret our results in terms of functionwe propose that the extracellular region of gp-91 is function-ally relevant Our results also imply that this region is highlydifferentiated among mammals at the protein level and thisvariability should be considered when mammals models areused to study the structure and function of phagocyteNADPH components

In the time scale of human evolution our analyses of theNADPH oxidase genes suggest that CYBA has been a target ofbalancing natural selection Because we do not have evidenceof population-specific variants that faced selective pressurethe inferred natural selection may have acted on a standingvariation in ancestral populations This implies that the selec-tive pressure began after the appearance of the variant andpossibly acted in a specific geographic region (Barret andSchluter 2008) The signatures of natural selection acting ona new mutation and on standing variation differ In the case ofselective sweeps episodes of natural selection on standingvariation are associated to a larger variance in the allelic spec-trum with respect to natural selection on a new mutationAlso selection on standing variation may produce an excess ofalleles at intermediate frequencies that is not associated withhigh nucleotide diversity (Przeworski et al 2005 Peter et al2012) This pattern contrasts with the effect of balancing

Table 2 Intrapopulation Diversity Indexes in the Studied Populationsfor the NADPH Oxidase Genes Obtained from Resequencing Dataa

African European Asian Hispanic

Number of chromosomes

CYBBb 42 52 34 36

CYBA 48 62 48 46

NCF2 48 62 48 46

NCF4 48 62 48 46

Segregating sitessingletons

CYBB 218 70 103 135

CYBA 6122 333 335 347

NCF2 4613 3311 2816 3714

NCF4 4512 267 191 308

Haplotype structureNumber of inferred haplotypesc

CYBB 14 5 7 12

CYBA 39 39 32 26

NCF2 38 32 18 31

NCF4 36 30 17 22

Haplotype diversity plusmn SD

CYBB 088 plusmn 003 034 plusmn 008 053 plusmn 010 070 plusmn 008

CYBA 098 plusmn 001 098 plusmn 001 097 plusmn 000 096 plusmn 002

NCF2 099 plusmn 001 096 plusmn 001 087 plusmn 004 097 plusmn 001

NCF4 098 plusmn 001 093 plusmn 002 093 plusmn 002 093 plusmn 002

Recombination parameter (q 103 per site)c

CYBB 008 lt001 lt001 lt002

CYBA 291 148 096 088

NCF2 052 038 010 058

NCF4 137 103 039 022

h estimatorsd

p 103 per site

CYBB 036 012 015 028

CYBA 190 163 151 157

NCF2 081 047 043 057

NCF4 101 076 064 084

hW 103 per site

CYBB 042 013 021 027

CYBA 231 118 125 13

NCF2 102 069 062 083

NCF4 101 070 054 087

aMost analyses were performed using software DnaSP (Rozas 2009)bData for CYBB (Xp211) are from Tarazona-Santos et al (2008)cHaplotypes and inferred using the method by Stephens and Scheet (2005) andthe software PHASEd Tajima (1983) W Watterson (1975)

2163

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

natural selection which produces an excess of common allelesassociated with high genetic diversity Thus the observed pat-tern of CYBA diversity in Europeans is not consistent with aselective sweep on a standing variation but it is consistent witha scenario of balancing selection acting on standing variation

If we consider for CYBA that heterozygote advantage maybe the mechanisms of balancing selection we can speculatethat the biological basis for this mechanism may be the fol-lowing considering that p22-phox is not exclusive of thephagocyte NADPH oxidase but it is also part of Nox com-plexes expressed in other tissues the dependence of ROSproduction on CYBA variants has to be finely regulated IfCYBA variants induce high levels of ROS these variants mayfavor a phagocyte-dependent efficient response to pathogensbut may damage other endothelial tissues Alternativelytissue oxidative damage does not occur if ROS productionis low but this response may be associated with a weaker

phagocyte respiratory burst against pathogens In this con-text heterozygote individuals with a CYBA-dependent inter-mediate level of ROS production may have been favored bynatural selection

Our results contribute to the discussion regarding the rel-evance of balancing selection in shaping the diversity ofinnate immunity genes (Ferrer-Admetlla et al 2008 Barreiroet al 2009) Ferrer-Admetlla et al (2008) have associated therecurrent signatures of balancing selection on inflammatorygenes with the need for fine regulation CYBA is the onlyphagocytic NADPH oxidase gene that also encodesnonphagocyte Nox components thus CYBA has a role inROS cell signaling a potentially dangerous process due toits capability to produce oxidative damage to tissues Geneswith these characteristics likely need even tighter regulationThe interplay between pathogen-driven selective pressureon innate immunity genes and their concomitantnonimmunological functions is complex In addition toCYBA other interesting examples of this interplay can befound among the 10 human toll-like receptors (TLRs) thatshow a variety of signatures of natural selection TLR8 whichshows the strongest signature of purifying selection is alsoinvolved in neuronal development (Barreiro et al 2009) and itis difficult to discriminate the role of each function in deter-mining the observed signature of natural selection

With few exceptions the pathogens responsible for naturalselection on immune genes are difficult to specify In the caseof NADPH oxidase we can infer based on the spectrum ofinfections in CGD patients that catalase-positive bacteria andfungi such as Staphylococci Salmonella Candida Aspergillusand M tuberculosis may be the selective pathogensInteractions between the host and pathogens also includemechanisms of the latter to impair the respiratory burst ofthe former For example Leishmania donovani blocks the as-sembly of NADPH oxidase at the phagosome membrane(Lodge et al 2006) These mechanisms may constitute selec-tive pressures imposed by pathogens

The associations reported in GWAS between rs4821544 inNCF4 and Crohnrsquos disease (an idiopathic inflammatory boweldisease that predominantly involves the ileum and colonRioux et al 2007) and between rs10911363 in NCF2 and

Table 3 Pairwise FST Genetic Distances between Populations

CYBBa CYBA

Africa Europe Asia Hispanic Africa Europe Asia Hispanic

Africa mdash 0316 0264 0092 mdash 0074 0083 0065

Europe 0257 mdash 0000 0107 mdash 0002 0056

Asia 0211 0000 mdash 0073 mdash 0054

Hispanic 0070 0082 0056 mdash mdash

NCF2 NCF4

Africa mdash 0048 0058 0037 mdash 0128 0136 0158

Europe mdash 0069 0000 mdash 0026 0026

Asia mdash 0059 mdash 0005

Hispanic mdash mdash

aFor CYBB FST estimators are above the diagonal Below the diagonal are the FST values corrected as if the effective population sizes of X chromosome genes were equal toautosomal ones

Table 4 Results of Neutrality Tests for the NADPH Oxidase Genesand Their Significancea

African European Asian Hispanic

Tajimarsquos D

CYBB 0473 0274 0813 0084

CYBA 0580 1242 0684 0707

NCF2 0412 0939 0883 0987

NCF4 0750 0243 0556 0102

Fu and Lirsquos D

CYBB 1050 1110 0395 0977

CYBA 1485 1734 1188 0138

NCF2 0407 1105 1904 1398

NCF4 0308 0443 1893 0266

Fu and Lirsquos F

CYBB 0980 0811 0114 0814

CYBA 1382 1893 1205 0405

NCF2 0512 1252 1893 1567

NCF4 0557 0231 1225 0248

NOTEmdashUnderlined values represent significant results under the demographic modelinferred by Laval et al (2010) for human populations See details in supplementarytable S4 Supplementary Material onlineaThe McDonaldndashKreitman test is nonsignificant in any of the cases

Significant under the WrightndashFisher model of constant population size

2164

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

systemic lupus erythematosus (Cunninghame Graham et al2011) confirm the involvement of NADPH genes in the path-ogenesis of inflammatory-related common diseases Ourclaim that natural selection acted on CYBA (and maybe inNCF2) is relevant for biomedical studies because combiningevidence of natural selection with association analyses inimmune genes increases the statistical power to detect dis-ease-associated variants (Ayodo et al 2007) As a new gener-ation of association studies focusing on rare variation isemerging the combination of genes deemed interestingfrom GWAS and populations with an excess of rare variantsin these genes such as NCF2 in Asians are particularly inter-esting as a source of rare variants with clinical relevanceFinally by determining through molecular evolution mappingthat the extracellular portion of gp91 (encoded by CYBB inthe X-chromosome) has been subject to recurrent episodes ofpositive selection at the scale of mammals evolution we positthe hypothesis that this portion of the NADPH oxidase isrelevant for currently unknown biological processes thatonce revealed by structural and functional investigationswill contribute to understanding the role of NADPH oxidasein infectious autoimmune and cardiovascular diseases

Materials and Methods

Molecular Evolution Analysis of NADPH Genes

We used the maximum likelihood framework developed byYang (2007a) to estimate for the NADPH oxidase genes

under a variety of evolutionary models as implemented in thePAML software (Yang 2007b) This approach allows infer-ences about the evolution of a coding region along aninter-specific phylogeny mapping the codons that haveevolved under strongweak purifying selection neutrality oradaptive positive selection Further details about these anal-yses are available as supplementary material SupplementaryMaterial online

Human Population Genetics of NADPH Genes

For human population genetics analyses we conducted bidi-rectional Sanger sequencing of CYBB CYBA NCF2 and NCF4for a total of 35242 bp for each of 102 healthy individuals aspart of the SNP500 Cancer project (Packer et al 2006 seesupplementary fig S1 Supplementary Material online for de-tails) Human population resequencing data for CYBB werepublished in Tarazona-Santos et al (2008) The resequencingexperiments were performed as in Packer et al (2006) These102 unrelated individuals include the following 24 of Africanancestry (15 African Americans from the United States and 9Pygmies) 23 admixed Latin Americans (from Mexico PuertoRico and South America) 31 Europeans (from the CEPHUTAH pedigree and the NIEHS Environmental GenomeProject) and 24 AsiansndashOceanians (from Melanesia PakistanChina Cambodia Japan and Taiwan)

After controlling for multiple tests we confirmed that allSNPs fit the HardyndashWeinberg equilibrium by the Guo and

CYBA - A16

CYBA - A3

Y72H - rs4673

V174A - rs1049254 CYBA - E18

(a)

NCF2 - E10 H389Q - rs17849502

NCF2 - B3

NCF2 - D11

NCF2 - C1K181R - rs2274064(b)

FIG 4 Phylogenetic networks of (a) CYBA in Europeans and (b) NCF2 in Europeans (black) and Asians (yellow) The lengths of the branches areproportional to the number of mutations Only nonsynonymous mutations are shown The haplotype names correspond to the table of inferredhaplotypes in the supplementary material Supplementary Material online We only show the names of haplotypes with a frequency gt5 Ancestralhaplotypes (inferred as being the human haplotype or median vector most similar to the chimpanzee sequence) are indicated by an arrow Medianvectors are in red

2165

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Thompson (1992) test which was implemented in the soft-ware Arlequin 30 (Excoffier et al 2007) Insertion-deletions(INDELs) were excluded from population genetics analysesTo assess intrapopulation diversity we used two statistics the per-site mean number of pairwise differences betweensequences (Tajima 1983) and w based on the number ofsegregating sites (S) (Watterson 1975) We measured pairwisebetween-populations diversity by using the FST statistics cal-culated using the software DnaSP (Rozas 2009)

Haplotypes and the recombination parameter were in-ferred using the PHASE software (Stephens and Scheet 2005)and diversity indexes calculations (tables 2 and 3) as well asneutrality statistics (table 4) were estimated using DnaSP soft-ware We applied two kinds of neutrality tests 1) tests basedon the allelic spectrum which is the distribution of polymor-phisms across different classes of frequencies namelyTajimarsquos DT (Tajima 1989) and FundashLirsquos DFL and FFL (Fu andLi 1993) and 2) tests based on comparisons between thenumber of polymorphisms in human populations and fixeddifferences with the chimpanzee (ie outgroup) namely theMcDonald and Kreitman (1991) test and the adaptedKolmogorovndashSmirnoff test by McDonald (1998) For thefirst set of tests we used as null hypotheses both the classicWrightndashFisher model of neutrality with a constant popula-tion size as well as the more realistic evolutionary scenario forhuman populations inferred by Laval et al (2010) In the caseof the scenario of Laval et al (2010) we ignored interconti-nental gene flow within the Old World because these raregene flow events likely does not affect the level of significanceof the neutrality tests given its very low inferred values(13 105) Null distributions used to test the significanceof the neutrality tests under these evolutionary scenarios weregenerated using coalescent simulations and a significancelevel of 005 (Hudson 2002) The KolmogorovndashSmirnoff testof neutrality adapted by McDonald (1998) was performedusing Slider software available at httpudeledu~mcdonaldaboutdnasliderhtml (last accessed July 16 2013) Weperformed coalescent simulations using ms software(Hudson 2002) Further methodological details are availableas supplementary material Supplementary Material onlineWe constructed the CYBA and NCF2 networks using allSNP variants and applying the Median joining algorithmand the maximum parsimony option calculations as imple-mented in the software Network 46 (Bandelt et al 1999)

Supplementary MaterialSupplementary tables S1ndashS5 and figure S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

SJC conceived the study ET-S MM FL RC AC CF LPand BS generated the dataanalyzed the sequences LB ACWCSM and MY provided tools for data generationand analyses ET-S MM FL WCSM and RR performedpopulation genetics analyses ET-S and MM prepared tablesand figures ET-S and SJC wrote the manuscript All the

authors contributed with discussions about the results Theauthors are grateful to Silvia Fuselli for discussions of ourresults to Fernando Levi Soares for technical assistance withthe figures and the Sequencing Group of the CoreGenotyping Facility (National Cancer Institute) for their tech-nical assistance This work was supported by the IntramuralResearch Program of the National Cancer Institute (NCI) andNational Institutes of Health (NIH) CF was supported by theUniversity of Bologna ET-S MM WCSM and FL weresupported by the Fogarty International Center and NationalCancer Institute (1R01TW007894) Conselho Nacional deDesenvolvimento Cientıfico e Tecnologico (Brazil Grant3075562012-3) Fundacao de Amparo a Pesquisa de MinasGerais (Brazil CBB PPM 0026512) and the Brazilian Ministryof Education (CAPES Agency grant PNPD 0256709-1)

ReferencesAyodo G Price AL Keinan A Ajwang A Otieno MF Orago AS

Patterson N Reich D 2007 Combining evidence of natural selectionwith association analysis increases power to detect malaria-resis-tance variants Am J Hum Genet 81(2)234ndash242

Bamshad M Wooding SP 2003 Signatures of natural selection in thehuman genome Nat Rev Genet 4(2)99ndash111

Bandelt H-J Forster P Rohl A 1999 Median-joining networks for infer-ring intraspecific phylogenies Mol Biol Evol 16(1)37ndash48

Barreiro LB Ben-Ali M Quach H et al (17 co-authors) 2009Evolutionary dynamics of human Toll-like receptors and their dif-ferent contributions to host defense PLoS Genet 5(7)e1000562

Barreiro LB Quintana-Murci L 2010 From evolutionary genetics tohuman immunology how selection shapes host defence genesNat Rev Genet 11(1)17ndash30

Barrett RD Schluter D 2008 Adaptation from standing genetic varia-tion Trends Ecol Evol 23(1)38ndash44

Bedard K Attar H Bonnefont J Jaquet V Borel C Plastre O Stasia MJAntonarakis SE Krause KH 2009 Three common polymorphisms inthe CYBA gene form a haplotype associated with decreased ROSgeneration Hum Mutat 30(7)1123ndash1133

Blekhman R Man O Herrmann L Boyko AR Indap A Kosiol CBustamante CD Teshima KM Przeworski M 2008 Natural selectionon genes that underlie human disease susceptibility Curr Biol18(12)883ndash889

Brandes RP Kreuzer J 2005 Vascular NADPH oxidases molecular mech-anisms of activation Cardiovasc Res 65(1)16ndash27

Buckley RH 2004 Pulmonary complications of primary immunodefi-ciencies Paediatr Respir Rev 5(Suppl A)S225ndashS233

Bustamante CD Fledel-Alon A Williamson S et al (13 co-authors)2005 Natural selection on protein-coding genes in the humangenome Nature 437(7062)1153ndash1157

Bustamante J Arias AA Vogt G et al (29 co-authors) 2011 GermlineCYBB mutations that selectively affect macrophages in kindredswith X-linked predisposition to tuberculous mycobacterial diseaseNat Immunol 12(3)213ndash221

Campbell MC Tishkoff SA 2008 African genetic diversity implicationsfor human demographic history modern human origins and com-plex disease mapping Annu Rev Genomics Hum Genet 9403ndash433

Chanock SJ Roesler J Zhan S Hopkins P Lee P Barrett DT ChristensenBL Curnutte JT Gorlach A 2000 Genomic structure of the humanp47-phox (NCF1) gene Blood Cells Mol Dis 26(1)37ndash46

Cunninghame Graham DS Morris DL Bhangale TR Criswell LASyvanen AC Ronnblom L Behrens TW Graham RR Vyse TJ2011 Association of NCF2 IKZF1 IRF8 IFIH1 and TYK2 with sys-temic lupus erythematosus PLoS Genet 7(10)e1002341

Excoffier L Laval G Schneider S 2007 Arlequin ver 30 an integratedsoftware package for population genetics data analysis EvolBioinform Online 147ndash50

2166

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Excoffier L Ray N 2008 Surfing during population expansions promotesgenetic revolutions and structuration Trends Ecol Evol23(7)347ndash351

Ferrer-Admetlla A Bosch E Sikora M Marques-Bonet T Ramırez-SorianoA Muntasell A Navarro A Lazarus R Calafell F Bertranpetit J Casals F2008 Balancing selection is the main force shaping the evolution ofinnate immunity genes J Immunol 181(2)1315ndash1322

Fu YX Li WH 1993 Statistical tests of neutrality of mutations Genetics133(3)693ndash709

The 1000 Genomes Project Consortium 2010 A map of humangenome variation from population-scale sequencing Nature467(7319)1061ndash1073

The 1000 Genomes Project Consortium Abecasis GR Auton A BrooksLD DePristo MA Durbin RM Handsaker RE Kang HM Marth GTMcVean GA 2012 An integrated map of genetic variation from1092 human genomes Nature 491(7422)56ndash65

Guo SW Thompson EA 1992 Performing the exact test of Hardy-Weinberg proportion for multiple alleles Biometrics 48(2)361ndash372

Heyworth PG Cross AR Curnutte JT 2003 Chronic granulomatousdisease Curr Opin Immunol 15(5)578ndash584

Hudson RR 2002 Generating samples under a Wright-Fisher neutralmodel of genetic variation Bioinformatics 18(2)337ndash338

Jacob CO Eisenstein M Dinauer MC et al (21 co-authors) 2012 Lupus-associated causal mutation in neutrophil cytosolic factor 2 (NCF2)brings unique insights to the structure and function of NADPHoxidase Proc Natl Acad Sci U S A 109(2)E59ndashE67

Kimura M 1974 The neutral theory of molecular evolution CambridgeCambridge University Press

Kosiol C Vinar T da Fonseca RR Hubisz MJ Bustamante CD Nielsen RSiepel A 2008 Patterns of positive selection in six mammalian ge-nomes PLoS Genet 4(8)e1000144

Lacy F Kailasam MT OrsquoConnor DT Schmid-Schonbein GW Parmer RJ2000 Plasma hydrogen peroxide production in human essentialhypertension role of heredity gender and ethnicity Hypertension36(5)878ndash884

Laval G Patin E Barreiro LB Quintana-Murci L 2010 Formulating a his-torical and demographic model of recent human evolution based onresequencing data from noncoding regions PLoS One 5(4)e10284

Lewontin R 1972 The apportionment of human diversity Evol Biol 6391ndash398

Lindblad-Toh K Garber M Zuk O et al (88 co-authors) 2011 A high-resolution map of human evolutionary constraint using 29 mam-mals Nature 478(7370)476ndash482

Lodge R Diallo TO Descoteaux A 2006 Leishmania donovani lipopho-sphoglycan blocks NADPH oxidase assembly at the phagosomemembrane Cell Microbiol 8(12)1922ndash1931

Magalhaes WC Rodrigues MR Silva D Soares-Souza G Iannini MLCerqueira GC Faria-Campos AC Tarazona-Santos E 2012DIVERGENOME a bioinformatics platform to assist population ge-netics and genetic epidemiology studies Genet Epidemiol36(4)360ndash367

Marth JD Grewal PK 2008 Mammalian glycosylation in immunity NatRev Immunol 8(11)874ndash887

McDonald JH 1998 Improved tests for heterogeneity across a region ofDNA sequence in the ratio of polymorphism to divergence Mol BiolEvol 15377ndash384

McDonald JH Kreitman M 1991 Adaptive protein evolution at the Adhlocus in Drosophila Nature 351(6328)652ndash654

Nielsen R Bustamante C Clark AG et al (12 co-authors) 2005 A scanfor positively selected genes in the genomes of humans and chim-panzees PLoS Biol 3(6)e170

Olsson LM Lindqvist AK Kallberg H Padyukov L Burkhardt HAlfredsson L Klareskog L Holmdahl R 2007 A case-control studyof rheumatoid arthritis identifies an associated single nucleotidepolymorphism in the NCF4 gene supporting a role for the

NADPH-oxidase complex in autoimmunity Arthritis Res Ther9(5)R98

Packer BR Yeager M Burdett L et al (13 co-authors) 2006SNP500Cancer a public resource for sequence validation assay de-velopment and frequency analysis for genetic variation in candidategenes Nucleic Acids Res 34(Database issue)D617ndashD621

Peter BM Huerta-Sanchez E Nielsen R 2012 Distinguishing betweenselective sweeps from standing variation and from a de novo mu-tation PLoS Genet 8(10)e1003011

Przeworski M Coop G Wall JD 2005 The signature of positive selectionon standing genetic variation Evolution 59(11)2312ndash2323

Ptak SE Przeworski M 2002 Evidence for population growth in humansis confounded by fine-scale population structure Trends Genet18(11)559ndash563

Ramensky V Bork P Sunyaev S 2002 Human non-synonymous SNPsserver and survey Nucleic Acids Res 30(17)3894ndash3900

Rioux JD Xavier RJ Taylor KD et al (24 co-authors) 2007 Genome-wideassociation study identifies new susceptibility loci for Crohn diseaseand implicates autophagy in disease pathogenesis Nat Genet39(5)596ndash604

Roberts RL Hollis-Moffatt JE Gearry RB Kennedy MA Barclay MLMerriman TR 2008 Confirmation of association of IRGM andNCF4 with ileal Crohnrsquos disease in a population-based cohortGenes Immun 9(6)561ndash565

Rozas J 2009 DNA sequence polymorphism analysis using DnaSPMethods Mol Biol 537337ndash350

San Jose G Fortuno A Beloqui O Dıez J Zalba G 2008 NADPH oxidaseCYBA polymorphisms oxidative stress and cardiovascular diseasesClin Sci (Lond) 114(3)173ndash182

Santiago HC Gonzalez Lombana CZ Macedo JP et al (12 co-authors)2012 NADPH phagocyte oxidase knockout mice controlTrypanosoma cruzi proliferation but develop circulatory collapseand succumb to infection PLoS Negl Trop Dis 6(2)e1492

Stephens M Scheet P 2005 Accounting for decay of linkage disequilib-rium in haplotype inference and missing-data imputation Am JHum Genet 76(3)449ndash462

Sumimoto H Miyano K Takeya R 2005 Molecular composition andregulation of the Nox family NAD(P)H oxidases Biochem BiophysRes Commun 338(1)677ndash686

Tajima F 1983 Evolutionary relationship of DNA sequences in finitepopulations Genetics 105(2)437ndash460

Tajima F 1989 Statistical method for testing the neutral mutationhypothesis by DNA polymorphism Genetics 123(3)585ndash595

Tarazona-Santos E Bernig T Burdett L Magalhaes WC Fabbri C Liao JRedondo RA Welch R Yeager M Chanock SJ 2008 CYBB anNADPH-oxidase gene restricted diversity in humans and evidencefor differential long-term purifying selection on transmembrane andcytosolic domains Hum Mutat 29(5)623ndash632

Taylor RM Burritt JB Baniulis D Foubert TR Lord CI Dinauer MCParkos CA Jesaitis AJ 2004 Site-specific inhibitors of NADPH oxi-dase activity and structural probes of flavocytochrome b character-ization of six monoclonal antibodies to the p22phox subunitJ Immunol 173(12)7349ndash7357

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A map of recentpositive selection in the human genome PLoS Biol 4(3)e72

Watterson GA 1975 On the number of segregating sites in geneticalmodels without recombination Theor Popul Biol 7(2) 256ndash276

Williamson SH Hernandez R Fledel-Alon A Zhu L Nielsen RBustamante CD 2005 Simultaneous inference of selection and pop-ulation growth from patterns of variation in the human genomeProc Natl Acad Sci U S A 102(22)7882ndash7887

Yang Z 2007a Adaptive molecular evolution In Balding DJ Bishop MCannings C editors Handbook of statistical genetics Vol 1 3rd edSusex (United Kingdom) John Wiley amp Sons p 377ndash406

Yang Z 2007b PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

2167

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

FIG

3

Nat

ural

sele

ctio

nm

app

ing

acro

ssC

YBB

(en

codi

ng

gp91

)al

ong

mam

mal

ian

evol

utio

na

sid

enti

fied

usin

gth

ePA

ML

met

hod

byY

ang

(200

7a)

The

top

olog

ies

ofgp

91an

dp2

2ar

ere

pro

duce

dfr

omT

aylo

ret

al(

2004

Cop

yrig

ht20

04T

heA

mer

ican

Ass

ocia

tion

ofIm

mun

olog

ists

In

cU

sed

wit

hp

erm

issi

on)

Dar

kgr

ayam

ino

acid

sha

veev

olve

dun

der

pos

itiv

ese

lect

ion

wit

hgt

80

pro

babi

lity

Mos

tof

thes

eam

ino

acid

sar

ein

the

extr

acel

lula

rp

orti

onof

the

pro

tein

The

upp

erp

art

ofth

efig

ure

show

sth

ep

rote

inal

ign

men

tfo

rn

ine

mam

mal

sof

the

gp91

regi

onin

dica

ted

byth

ebl

ack

ellip

seI

nth

isre

gion

ahi

ghle

velo

fam

ino

acid

varia

tion

isfo

und

betw

een

spec

ies

and

seve

ralc

odon

ssh

owgt

1In

this

alig

nm

ent

gray

vert

ical

bars

corr

esp

ond

tova

riab

leam

ino

acid

site

sT

hep

rote

inal

ign

men

tof

mam

mal

ssh

ows

the

follo

win

gsp

ecie

sH

om(H

omo

sapi

ens

sapi

ens)

Pan

(Pan

trog

lody

tes)

Mac

(Mac

aca

mul

atta

)M

us(M

usm

uscu

lus)

Rat

(Rat

tus

norv

egic

us)

Cri

(Cri

cetu

lus

gris

eus)

Het

(Het

eroc

epha

lus

glab

er)

Cav

(Cav

iapo

rcel

lus)

an

dO

ry(O

ryct

olag

uscu

nicu

lus)

EC

ext

race

lula

ren

viro

nm

ent

TM

tra

nsm

embr

ane

laye

ran

dIC

in

trac

ellu

lar

envi

ron

men

t

2161

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

From the four studied genes NCF4 which encodes theregulatory protein p40-phox shows a pattern of diversitythat is typical for a gene that has evolved under neutralityIn addition to the features described in the previous para-graph the allelic spectra of NCF4 in the studied populationsare consistent with a neutral model of evolution (tables 2ndash4)

Although CYBB presented the most interesting evolution-ary history at the interspecific level with repeated episodes ofpositive natural selection recent human evolutionary historyhas resulted in interesting patterns of variation for CYBA andNCF2 CYBA encodes p22-phox which is a transmembraneprotein shared by different NADPH oxidases In addition toharboring two common nonsynonymous polymorphisms(V174A is also variable among different species of mammals)CYBA is the most variable and most affected by recombina-tion among the NADPH oxidase genes (tables 1 and 2) Inparticular CYBA diversity is very high in Europe comparedwith 329 genes resequenced in a European sample (httppgagswashingtonedusummary_statshtml last accessed July16 2013) CYBA ranks 11th (ie the 97th percentile)Moreover there are contrasting proportions of the totalnumber of polymorphismsnumber of singletons betweenAfricans (a low proportion) and Europeans (a high propor-tion) the latter showing an excess of common polymor-phisms with respect to the neutral expectation (see the DFL

test in table 4 and supplementary table S4 SupplementaryMaterial online) This excess of common variants inEuropeans is also significant when we conservatively testedit against a scenario of human evolution that incorporates theldquoOut of Africardquo bottleneck (Laval et al 2010) and the observedlevel of recombination in Europeans (CYBA = 807 for the se-quenced region) Because demographic forces and recombi-nation levels alone do not explain the high CYBA diversity andits excess of common polymorphisms we suggest that bal-ancing natural selection (that acts by maintaining differentalleles at high frequency in a population) has contributed to

shape the diversity of CYBA at least in the European popu-lation Indeed figure 4a shows that the haplotype network ofCYBA for the European population is consistent with theaction of balancing natural selection (Bamshad andWooding 2003) showing two well-differentiated commonclades that explain the observed high diversity and theexcess of common CYBA variants A comparative genomicanalysis confirms this inference the ratio of polymorphismsto differences fixed between human and chimpanzee is nothomogeneous along the gene in the different human popu-lations (Mc Donald 1998 supplementary table S5Supplementary Material online) as would be expectedunder neutral evolution Our inference of balancing naturalselection is consistent with the fact that 25ndash30 of the var-iation in levels of ROS production can be attributed to geneticfactors (Lacy et al 2000) and that ROS levels are associatedwith CYBA variants (Bedard et al 2009)

p67 encoded by NCF2 is a necessary cytosolic NADPHcomponent for phagocyte ROS production Asians show ahighly differentiated NCF2 haplotype structure (see frequen-cies of haplotypes NCF2-D11 and NCF2-E10 in the haplotypetables online and in the network shown in fig 4b) and thehighest FST values are observed in pairwise comparisons be-tween Asians and non-Asian populations (in particular withEuropeans table 3) and not between Africans and non-African populations as is usually observed in the humangenome We confirmed these results by analyzing data forNCF2 from the HapMap Project (supplementary materialSupplementary Material online) Moreover a trend towardan excess of rare polymorphisms exists in Asians that is notobserved elsewhere (tables 1 2 and 4 DFL =1904FFL =1893) Although we cannot exclude that this patternof diversity is compatible with the null hypotheses of neutral-ity and with the tested demographic history of human pop-ulations inferred by Laval et al (2010 tables 2ndash4) we canspeculate and envisage four additional evolutionary scenarios

Table 1 Allele Frequencies of Nonsynonymous Polymorphisms in NADPH Oxidase Genes

Genes rs Minor Allele(Amino Acid)

PolyphenPrediction

African European Asian Hispanic

CYBA

Y72H rs4673 T (Y) Possibly damaging 046 032 017 022

V174A rs1049254 T (V) Benign 017 048 048 018

NCF2

K181R rs2274064 G (R) Benign 035 037 041 048

T279M rs13306581 T (T) Probably damaging 000 000 005 000

V297A rs35937854 C (A) Benign 004 000 000 000

T361S Chr1181799289 NCBI36hg18 T (S) mdash 000 000 002 000

H389Q rs17849502 A (Q) Benign 000 005 000 007

R395W rs13306575 T (W) Possibly damaging 000 000 000 007

N419I rs35012521 T (I) Probably damaging 000 002 004 000

P454S rs55761650 T (S) mdash 000 000 000 002

L487S Chr1181795862 NCBI36hg18 C (S) mdash 002 000 000 000

NCF4

T85N rs112306225 A (N) mdash 000 000 002 000

A304E rs5995361 A (E) Benign 004 000 000 000

2162

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

that may have contributed to shape the pattern of NCF2diversity in Asians 1) the observed trend is suggestive of aselective sweep on NCF2 standing variation a neutral orweakly deleterious existing variant becomes beneficial andrapidly increases in frequency (together with its associatedhaplotypes ie incomplete sweep) reducing the nucleotidediversity in the surrounding region and rendering other stand-ing substitutions rare During this process new rare substitu-tions appear in the expanding positively selected haplotype2) The pattern of diversity of NCF2 in Asia may result from anincomplete selective sweep acting on ARPC5 which is locatedapproximately 35 kb downstream of NCF2 In a genome-widescan for recent positive selection Voight et al (2006) identi-fied a strong signature of an incomplete sweep for ARPC5 inAsia (P = 0009) characterized by a higher than expected long-range linkage disequilibrium summarized by very high iHS

statistics (see Haplotter results for the HapMap II data athttphaplotteruchicagoedu last accessed July 16 2013)SNPs in NCF2 also presents high iHS statistics (P = 002) al-though values are lower than for ARPC5 3) The differentiatedpattern of diversity of NCF2 in Asia may also have been gen-erated without the action of natural selection during the firstcolonization of Asia by modern humans In a process of geo-graphic population expansion specifically in the front wave ofthe expansion some rare alleleshaplotypes (ie surfing al-leles) may become common by chance mimicking the pat-tern of diversity generated by a selective sweep (Excoffier andRay 2008) 4) The excess of rare variants may be an artifact ofpooling individuals from different populations (Ptak andPrzeworski 2002) Consistent with evolutionary scenarios 1ndash4 that produce similar patterns of diversity the haplotypenetwork of NCF2 for Eurasians (fig 4b) shows the following1) a large differentiation between Asians and Europeans thatis compatible with the high observed FST values and 2) a star-like shape associated with the haplotype NCF2-E10 that iscommon in Asia and rare elsewhere which is compatiblewith the excess of rare alleles in the Asian populations

DiscussionBy analyzing 29 mammalian genomes and four human pop-ulations we show in this study that natural selection hasacted in different ways over time to shape the pattern ofdiversity of the phagocyte NADPH oxidase genes At the tem-poral scale of the evolution of mammals we have inferredrecurrent episodes of positive selection acting on the extra-cellular portion of gp-91 that have been important to shapethe pattern of interspecific diversity of this gene Our in-terspecific analyses did not show a similar pattern of naturalselection in any of the other phagocyte NADPH oxidasegenes Even if current knowledge on the biology of NADPHdoes not allow us to interpret our results in terms of functionwe propose that the extracellular region of gp-91 is function-ally relevant Our results also imply that this region is highlydifferentiated among mammals at the protein level and thisvariability should be considered when mammals models areused to study the structure and function of phagocyteNADPH components

In the time scale of human evolution our analyses of theNADPH oxidase genes suggest that CYBA has been a target ofbalancing natural selection Because we do not have evidenceof population-specific variants that faced selective pressurethe inferred natural selection may have acted on a standingvariation in ancestral populations This implies that the selec-tive pressure began after the appearance of the variant andpossibly acted in a specific geographic region (Barret andSchluter 2008) The signatures of natural selection acting ona new mutation and on standing variation differ In the case ofselective sweeps episodes of natural selection on standingvariation are associated to a larger variance in the allelic spec-trum with respect to natural selection on a new mutationAlso selection on standing variation may produce an excess ofalleles at intermediate frequencies that is not associated withhigh nucleotide diversity (Przeworski et al 2005 Peter et al2012) This pattern contrasts with the effect of balancing

Table 2 Intrapopulation Diversity Indexes in the Studied Populationsfor the NADPH Oxidase Genes Obtained from Resequencing Dataa

African European Asian Hispanic

Number of chromosomes

CYBBb 42 52 34 36

CYBA 48 62 48 46

NCF2 48 62 48 46

NCF4 48 62 48 46

Segregating sitessingletons

CYBB 218 70 103 135

CYBA 6122 333 335 347

NCF2 4613 3311 2816 3714

NCF4 4512 267 191 308

Haplotype structureNumber of inferred haplotypesc

CYBB 14 5 7 12

CYBA 39 39 32 26

NCF2 38 32 18 31

NCF4 36 30 17 22

Haplotype diversity plusmn SD

CYBB 088 plusmn 003 034 plusmn 008 053 plusmn 010 070 plusmn 008

CYBA 098 plusmn 001 098 plusmn 001 097 plusmn 000 096 plusmn 002

NCF2 099 plusmn 001 096 plusmn 001 087 plusmn 004 097 plusmn 001

NCF4 098 plusmn 001 093 plusmn 002 093 plusmn 002 093 plusmn 002

Recombination parameter (q 103 per site)c

CYBB 008 lt001 lt001 lt002

CYBA 291 148 096 088

NCF2 052 038 010 058

NCF4 137 103 039 022

h estimatorsd

p 103 per site

CYBB 036 012 015 028

CYBA 190 163 151 157

NCF2 081 047 043 057

NCF4 101 076 064 084

hW 103 per site

CYBB 042 013 021 027

CYBA 231 118 125 13

NCF2 102 069 062 083

NCF4 101 070 054 087

aMost analyses were performed using software DnaSP (Rozas 2009)bData for CYBB (Xp211) are from Tarazona-Santos et al (2008)cHaplotypes and inferred using the method by Stephens and Scheet (2005) andthe software PHASEd Tajima (1983) W Watterson (1975)

2163

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

natural selection which produces an excess of common allelesassociated with high genetic diversity Thus the observed pat-tern of CYBA diversity in Europeans is not consistent with aselective sweep on a standing variation but it is consistent witha scenario of balancing selection acting on standing variation

If we consider for CYBA that heterozygote advantage maybe the mechanisms of balancing selection we can speculatethat the biological basis for this mechanism may be the fol-lowing considering that p22-phox is not exclusive of thephagocyte NADPH oxidase but it is also part of Nox com-plexes expressed in other tissues the dependence of ROSproduction on CYBA variants has to be finely regulated IfCYBA variants induce high levels of ROS these variants mayfavor a phagocyte-dependent efficient response to pathogensbut may damage other endothelial tissues Alternativelytissue oxidative damage does not occur if ROS productionis low but this response may be associated with a weaker

phagocyte respiratory burst against pathogens In this con-text heterozygote individuals with a CYBA-dependent inter-mediate level of ROS production may have been favored bynatural selection

Our results contribute to the discussion regarding the rel-evance of balancing selection in shaping the diversity ofinnate immunity genes (Ferrer-Admetlla et al 2008 Barreiroet al 2009) Ferrer-Admetlla et al (2008) have associated therecurrent signatures of balancing selection on inflammatorygenes with the need for fine regulation CYBA is the onlyphagocytic NADPH oxidase gene that also encodesnonphagocyte Nox components thus CYBA has a role inROS cell signaling a potentially dangerous process due toits capability to produce oxidative damage to tissues Geneswith these characteristics likely need even tighter regulationThe interplay between pathogen-driven selective pressureon innate immunity genes and their concomitantnonimmunological functions is complex In addition toCYBA other interesting examples of this interplay can befound among the 10 human toll-like receptors (TLRs) thatshow a variety of signatures of natural selection TLR8 whichshows the strongest signature of purifying selection is alsoinvolved in neuronal development (Barreiro et al 2009) and itis difficult to discriminate the role of each function in deter-mining the observed signature of natural selection

With few exceptions the pathogens responsible for naturalselection on immune genes are difficult to specify In the caseof NADPH oxidase we can infer based on the spectrum ofinfections in CGD patients that catalase-positive bacteria andfungi such as Staphylococci Salmonella Candida Aspergillusand M tuberculosis may be the selective pathogensInteractions between the host and pathogens also includemechanisms of the latter to impair the respiratory burst ofthe former For example Leishmania donovani blocks the as-sembly of NADPH oxidase at the phagosome membrane(Lodge et al 2006) These mechanisms may constitute selec-tive pressures imposed by pathogens

The associations reported in GWAS between rs4821544 inNCF4 and Crohnrsquos disease (an idiopathic inflammatory boweldisease that predominantly involves the ileum and colonRioux et al 2007) and between rs10911363 in NCF2 and

Table 3 Pairwise FST Genetic Distances between Populations

CYBBa CYBA

Africa Europe Asia Hispanic Africa Europe Asia Hispanic

Africa mdash 0316 0264 0092 mdash 0074 0083 0065

Europe 0257 mdash 0000 0107 mdash 0002 0056

Asia 0211 0000 mdash 0073 mdash 0054

Hispanic 0070 0082 0056 mdash mdash

NCF2 NCF4

Africa mdash 0048 0058 0037 mdash 0128 0136 0158

Europe mdash 0069 0000 mdash 0026 0026

Asia mdash 0059 mdash 0005

Hispanic mdash mdash

aFor CYBB FST estimators are above the diagonal Below the diagonal are the FST values corrected as if the effective population sizes of X chromosome genes were equal toautosomal ones

Table 4 Results of Neutrality Tests for the NADPH Oxidase Genesand Their Significancea

African European Asian Hispanic

Tajimarsquos D

CYBB 0473 0274 0813 0084

CYBA 0580 1242 0684 0707

NCF2 0412 0939 0883 0987

NCF4 0750 0243 0556 0102

Fu and Lirsquos D

CYBB 1050 1110 0395 0977

CYBA 1485 1734 1188 0138

NCF2 0407 1105 1904 1398

NCF4 0308 0443 1893 0266

Fu and Lirsquos F

CYBB 0980 0811 0114 0814

CYBA 1382 1893 1205 0405

NCF2 0512 1252 1893 1567

NCF4 0557 0231 1225 0248

NOTEmdashUnderlined values represent significant results under the demographic modelinferred by Laval et al (2010) for human populations See details in supplementarytable S4 Supplementary Material onlineaThe McDonaldndashKreitman test is nonsignificant in any of the cases

Significant under the WrightndashFisher model of constant population size

2164

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

systemic lupus erythematosus (Cunninghame Graham et al2011) confirm the involvement of NADPH genes in the path-ogenesis of inflammatory-related common diseases Ourclaim that natural selection acted on CYBA (and maybe inNCF2) is relevant for biomedical studies because combiningevidence of natural selection with association analyses inimmune genes increases the statistical power to detect dis-ease-associated variants (Ayodo et al 2007) As a new gener-ation of association studies focusing on rare variation isemerging the combination of genes deemed interestingfrom GWAS and populations with an excess of rare variantsin these genes such as NCF2 in Asians are particularly inter-esting as a source of rare variants with clinical relevanceFinally by determining through molecular evolution mappingthat the extracellular portion of gp91 (encoded by CYBB inthe X-chromosome) has been subject to recurrent episodes ofpositive selection at the scale of mammals evolution we positthe hypothesis that this portion of the NADPH oxidase isrelevant for currently unknown biological processes thatonce revealed by structural and functional investigationswill contribute to understanding the role of NADPH oxidasein infectious autoimmune and cardiovascular diseases

Materials and Methods

Molecular Evolution Analysis of NADPH Genes

We used the maximum likelihood framework developed byYang (2007a) to estimate for the NADPH oxidase genes

under a variety of evolutionary models as implemented in thePAML software (Yang 2007b) This approach allows infer-ences about the evolution of a coding region along aninter-specific phylogeny mapping the codons that haveevolved under strongweak purifying selection neutrality oradaptive positive selection Further details about these anal-yses are available as supplementary material SupplementaryMaterial online

Human Population Genetics of NADPH Genes

For human population genetics analyses we conducted bidi-rectional Sanger sequencing of CYBB CYBA NCF2 and NCF4for a total of 35242 bp for each of 102 healthy individuals aspart of the SNP500 Cancer project (Packer et al 2006 seesupplementary fig S1 Supplementary Material online for de-tails) Human population resequencing data for CYBB werepublished in Tarazona-Santos et al (2008) The resequencingexperiments were performed as in Packer et al (2006) These102 unrelated individuals include the following 24 of Africanancestry (15 African Americans from the United States and 9Pygmies) 23 admixed Latin Americans (from Mexico PuertoRico and South America) 31 Europeans (from the CEPHUTAH pedigree and the NIEHS Environmental GenomeProject) and 24 AsiansndashOceanians (from Melanesia PakistanChina Cambodia Japan and Taiwan)

After controlling for multiple tests we confirmed that allSNPs fit the HardyndashWeinberg equilibrium by the Guo and

CYBA - A16

CYBA - A3

Y72H - rs4673

V174A - rs1049254 CYBA - E18

(a)

NCF2 - E10 H389Q - rs17849502

NCF2 - B3

NCF2 - D11

NCF2 - C1K181R - rs2274064(b)

FIG 4 Phylogenetic networks of (a) CYBA in Europeans and (b) NCF2 in Europeans (black) and Asians (yellow) The lengths of the branches areproportional to the number of mutations Only nonsynonymous mutations are shown The haplotype names correspond to the table of inferredhaplotypes in the supplementary material Supplementary Material online We only show the names of haplotypes with a frequency gt5 Ancestralhaplotypes (inferred as being the human haplotype or median vector most similar to the chimpanzee sequence) are indicated by an arrow Medianvectors are in red

2165

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Thompson (1992) test which was implemented in the soft-ware Arlequin 30 (Excoffier et al 2007) Insertion-deletions(INDELs) were excluded from population genetics analysesTo assess intrapopulation diversity we used two statistics the per-site mean number of pairwise differences betweensequences (Tajima 1983) and w based on the number ofsegregating sites (S) (Watterson 1975) We measured pairwisebetween-populations diversity by using the FST statistics cal-culated using the software DnaSP (Rozas 2009)

Haplotypes and the recombination parameter were in-ferred using the PHASE software (Stephens and Scheet 2005)and diversity indexes calculations (tables 2 and 3) as well asneutrality statistics (table 4) were estimated using DnaSP soft-ware We applied two kinds of neutrality tests 1) tests basedon the allelic spectrum which is the distribution of polymor-phisms across different classes of frequencies namelyTajimarsquos DT (Tajima 1989) and FundashLirsquos DFL and FFL (Fu andLi 1993) and 2) tests based on comparisons between thenumber of polymorphisms in human populations and fixeddifferences with the chimpanzee (ie outgroup) namely theMcDonald and Kreitman (1991) test and the adaptedKolmogorovndashSmirnoff test by McDonald (1998) For thefirst set of tests we used as null hypotheses both the classicWrightndashFisher model of neutrality with a constant popula-tion size as well as the more realistic evolutionary scenario forhuman populations inferred by Laval et al (2010) In the caseof the scenario of Laval et al (2010) we ignored interconti-nental gene flow within the Old World because these raregene flow events likely does not affect the level of significanceof the neutrality tests given its very low inferred values(13 105) Null distributions used to test the significanceof the neutrality tests under these evolutionary scenarios weregenerated using coalescent simulations and a significancelevel of 005 (Hudson 2002) The KolmogorovndashSmirnoff testof neutrality adapted by McDonald (1998) was performedusing Slider software available at httpudeledu~mcdonaldaboutdnasliderhtml (last accessed July 16 2013) Weperformed coalescent simulations using ms software(Hudson 2002) Further methodological details are availableas supplementary material Supplementary Material onlineWe constructed the CYBA and NCF2 networks using allSNP variants and applying the Median joining algorithmand the maximum parsimony option calculations as imple-mented in the software Network 46 (Bandelt et al 1999)

Supplementary MaterialSupplementary tables S1ndashS5 and figure S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

SJC conceived the study ET-S MM FL RC AC CF LPand BS generated the dataanalyzed the sequences LB ACWCSM and MY provided tools for data generationand analyses ET-S MM FL WCSM and RR performedpopulation genetics analyses ET-S and MM prepared tablesand figures ET-S and SJC wrote the manuscript All the

authors contributed with discussions about the results Theauthors are grateful to Silvia Fuselli for discussions of ourresults to Fernando Levi Soares for technical assistance withthe figures and the Sequencing Group of the CoreGenotyping Facility (National Cancer Institute) for their tech-nical assistance This work was supported by the IntramuralResearch Program of the National Cancer Institute (NCI) andNational Institutes of Health (NIH) CF was supported by theUniversity of Bologna ET-S MM WCSM and FL weresupported by the Fogarty International Center and NationalCancer Institute (1R01TW007894) Conselho Nacional deDesenvolvimento Cientıfico e Tecnologico (Brazil Grant3075562012-3) Fundacao de Amparo a Pesquisa de MinasGerais (Brazil CBB PPM 0026512) and the Brazilian Ministryof Education (CAPES Agency grant PNPD 0256709-1)

ReferencesAyodo G Price AL Keinan A Ajwang A Otieno MF Orago AS

Patterson N Reich D 2007 Combining evidence of natural selectionwith association analysis increases power to detect malaria-resis-tance variants Am J Hum Genet 81(2)234ndash242

Bamshad M Wooding SP 2003 Signatures of natural selection in thehuman genome Nat Rev Genet 4(2)99ndash111

Bandelt H-J Forster P Rohl A 1999 Median-joining networks for infer-ring intraspecific phylogenies Mol Biol Evol 16(1)37ndash48

Barreiro LB Ben-Ali M Quach H et al (17 co-authors) 2009Evolutionary dynamics of human Toll-like receptors and their dif-ferent contributions to host defense PLoS Genet 5(7)e1000562

Barreiro LB Quintana-Murci L 2010 From evolutionary genetics tohuman immunology how selection shapes host defence genesNat Rev Genet 11(1)17ndash30

Barrett RD Schluter D 2008 Adaptation from standing genetic varia-tion Trends Ecol Evol 23(1)38ndash44

Bedard K Attar H Bonnefont J Jaquet V Borel C Plastre O Stasia MJAntonarakis SE Krause KH 2009 Three common polymorphisms inthe CYBA gene form a haplotype associated with decreased ROSgeneration Hum Mutat 30(7)1123ndash1133

Blekhman R Man O Herrmann L Boyko AR Indap A Kosiol CBustamante CD Teshima KM Przeworski M 2008 Natural selectionon genes that underlie human disease susceptibility Curr Biol18(12)883ndash889

Brandes RP Kreuzer J 2005 Vascular NADPH oxidases molecular mech-anisms of activation Cardiovasc Res 65(1)16ndash27

Buckley RH 2004 Pulmonary complications of primary immunodefi-ciencies Paediatr Respir Rev 5(Suppl A)S225ndashS233

Bustamante CD Fledel-Alon A Williamson S et al (13 co-authors)2005 Natural selection on protein-coding genes in the humangenome Nature 437(7062)1153ndash1157

Bustamante J Arias AA Vogt G et al (29 co-authors) 2011 GermlineCYBB mutations that selectively affect macrophages in kindredswith X-linked predisposition to tuberculous mycobacterial diseaseNat Immunol 12(3)213ndash221

Campbell MC Tishkoff SA 2008 African genetic diversity implicationsfor human demographic history modern human origins and com-plex disease mapping Annu Rev Genomics Hum Genet 9403ndash433

Chanock SJ Roesler J Zhan S Hopkins P Lee P Barrett DT ChristensenBL Curnutte JT Gorlach A 2000 Genomic structure of the humanp47-phox (NCF1) gene Blood Cells Mol Dis 26(1)37ndash46

Cunninghame Graham DS Morris DL Bhangale TR Criswell LASyvanen AC Ronnblom L Behrens TW Graham RR Vyse TJ2011 Association of NCF2 IKZF1 IRF8 IFIH1 and TYK2 with sys-temic lupus erythematosus PLoS Genet 7(10)e1002341

Excoffier L Laval G Schneider S 2007 Arlequin ver 30 an integratedsoftware package for population genetics data analysis EvolBioinform Online 147ndash50

2166

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Excoffier L Ray N 2008 Surfing during population expansions promotesgenetic revolutions and structuration Trends Ecol Evol23(7)347ndash351

Ferrer-Admetlla A Bosch E Sikora M Marques-Bonet T Ramırez-SorianoA Muntasell A Navarro A Lazarus R Calafell F Bertranpetit J Casals F2008 Balancing selection is the main force shaping the evolution ofinnate immunity genes J Immunol 181(2)1315ndash1322

Fu YX Li WH 1993 Statistical tests of neutrality of mutations Genetics133(3)693ndash709

The 1000 Genomes Project Consortium 2010 A map of humangenome variation from population-scale sequencing Nature467(7319)1061ndash1073

The 1000 Genomes Project Consortium Abecasis GR Auton A BrooksLD DePristo MA Durbin RM Handsaker RE Kang HM Marth GTMcVean GA 2012 An integrated map of genetic variation from1092 human genomes Nature 491(7422)56ndash65

Guo SW Thompson EA 1992 Performing the exact test of Hardy-Weinberg proportion for multiple alleles Biometrics 48(2)361ndash372

Heyworth PG Cross AR Curnutte JT 2003 Chronic granulomatousdisease Curr Opin Immunol 15(5)578ndash584

Hudson RR 2002 Generating samples under a Wright-Fisher neutralmodel of genetic variation Bioinformatics 18(2)337ndash338

Jacob CO Eisenstein M Dinauer MC et al (21 co-authors) 2012 Lupus-associated causal mutation in neutrophil cytosolic factor 2 (NCF2)brings unique insights to the structure and function of NADPHoxidase Proc Natl Acad Sci U S A 109(2)E59ndashE67

Kimura M 1974 The neutral theory of molecular evolution CambridgeCambridge University Press

Kosiol C Vinar T da Fonseca RR Hubisz MJ Bustamante CD Nielsen RSiepel A 2008 Patterns of positive selection in six mammalian ge-nomes PLoS Genet 4(8)e1000144

Lacy F Kailasam MT OrsquoConnor DT Schmid-Schonbein GW Parmer RJ2000 Plasma hydrogen peroxide production in human essentialhypertension role of heredity gender and ethnicity Hypertension36(5)878ndash884

Laval G Patin E Barreiro LB Quintana-Murci L 2010 Formulating a his-torical and demographic model of recent human evolution based onresequencing data from noncoding regions PLoS One 5(4)e10284

Lewontin R 1972 The apportionment of human diversity Evol Biol 6391ndash398

Lindblad-Toh K Garber M Zuk O et al (88 co-authors) 2011 A high-resolution map of human evolutionary constraint using 29 mam-mals Nature 478(7370)476ndash482

Lodge R Diallo TO Descoteaux A 2006 Leishmania donovani lipopho-sphoglycan blocks NADPH oxidase assembly at the phagosomemembrane Cell Microbiol 8(12)1922ndash1931

Magalhaes WC Rodrigues MR Silva D Soares-Souza G Iannini MLCerqueira GC Faria-Campos AC Tarazona-Santos E 2012DIVERGENOME a bioinformatics platform to assist population ge-netics and genetic epidemiology studies Genet Epidemiol36(4)360ndash367

Marth JD Grewal PK 2008 Mammalian glycosylation in immunity NatRev Immunol 8(11)874ndash887

McDonald JH 1998 Improved tests for heterogeneity across a region ofDNA sequence in the ratio of polymorphism to divergence Mol BiolEvol 15377ndash384

McDonald JH Kreitman M 1991 Adaptive protein evolution at the Adhlocus in Drosophila Nature 351(6328)652ndash654

Nielsen R Bustamante C Clark AG et al (12 co-authors) 2005 A scanfor positively selected genes in the genomes of humans and chim-panzees PLoS Biol 3(6)e170

Olsson LM Lindqvist AK Kallberg H Padyukov L Burkhardt HAlfredsson L Klareskog L Holmdahl R 2007 A case-control studyof rheumatoid arthritis identifies an associated single nucleotidepolymorphism in the NCF4 gene supporting a role for the

NADPH-oxidase complex in autoimmunity Arthritis Res Ther9(5)R98

Packer BR Yeager M Burdett L et al (13 co-authors) 2006SNP500Cancer a public resource for sequence validation assay de-velopment and frequency analysis for genetic variation in candidategenes Nucleic Acids Res 34(Database issue)D617ndashD621

Peter BM Huerta-Sanchez E Nielsen R 2012 Distinguishing betweenselective sweeps from standing variation and from a de novo mu-tation PLoS Genet 8(10)e1003011

Przeworski M Coop G Wall JD 2005 The signature of positive selectionon standing genetic variation Evolution 59(11)2312ndash2323

Ptak SE Przeworski M 2002 Evidence for population growth in humansis confounded by fine-scale population structure Trends Genet18(11)559ndash563

Ramensky V Bork P Sunyaev S 2002 Human non-synonymous SNPsserver and survey Nucleic Acids Res 30(17)3894ndash3900

Rioux JD Xavier RJ Taylor KD et al (24 co-authors) 2007 Genome-wideassociation study identifies new susceptibility loci for Crohn diseaseand implicates autophagy in disease pathogenesis Nat Genet39(5)596ndash604

Roberts RL Hollis-Moffatt JE Gearry RB Kennedy MA Barclay MLMerriman TR 2008 Confirmation of association of IRGM andNCF4 with ileal Crohnrsquos disease in a population-based cohortGenes Immun 9(6)561ndash565

Rozas J 2009 DNA sequence polymorphism analysis using DnaSPMethods Mol Biol 537337ndash350

San Jose G Fortuno A Beloqui O Dıez J Zalba G 2008 NADPH oxidaseCYBA polymorphisms oxidative stress and cardiovascular diseasesClin Sci (Lond) 114(3)173ndash182

Santiago HC Gonzalez Lombana CZ Macedo JP et al (12 co-authors)2012 NADPH phagocyte oxidase knockout mice controlTrypanosoma cruzi proliferation but develop circulatory collapseand succumb to infection PLoS Negl Trop Dis 6(2)e1492

Stephens M Scheet P 2005 Accounting for decay of linkage disequilib-rium in haplotype inference and missing-data imputation Am JHum Genet 76(3)449ndash462

Sumimoto H Miyano K Takeya R 2005 Molecular composition andregulation of the Nox family NAD(P)H oxidases Biochem BiophysRes Commun 338(1)677ndash686

Tajima F 1983 Evolutionary relationship of DNA sequences in finitepopulations Genetics 105(2)437ndash460

Tajima F 1989 Statistical method for testing the neutral mutationhypothesis by DNA polymorphism Genetics 123(3)585ndash595

Tarazona-Santos E Bernig T Burdett L Magalhaes WC Fabbri C Liao JRedondo RA Welch R Yeager M Chanock SJ 2008 CYBB anNADPH-oxidase gene restricted diversity in humans and evidencefor differential long-term purifying selection on transmembrane andcytosolic domains Hum Mutat 29(5)623ndash632

Taylor RM Burritt JB Baniulis D Foubert TR Lord CI Dinauer MCParkos CA Jesaitis AJ 2004 Site-specific inhibitors of NADPH oxi-dase activity and structural probes of flavocytochrome b character-ization of six monoclonal antibodies to the p22phox subunitJ Immunol 173(12)7349ndash7357

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A map of recentpositive selection in the human genome PLoS Biol 4(3)e72

Watterson GA 1975 On the number of segregating sites in geneticalmodels without recombination Theor Popul Biol 7(2) 256ndash276

Williamson SH Hernandez R Fledel-Alon A Zhu L Nielsen RBustamante CD 2005 Simultaneous inference of selection and pop-ulation growth from patterns of variation in the human genomeProc Natl Acad Sci U S A 102(22)7882ndash7887

Yang Z 2007a Adaptive molecular evolution In Balding DJ Bishop MCannings C editors Handbook of statistical genetics Vol 1 3rd edSusex (United Kingdom) John Wiley amp Sons p 377ndash406

Yang Z 2007b PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

2167

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

From the four studied genes NCF4 which encodes theregulatory protein p40-phox shows a pattern of diversitythat is typical for a gene that has evolved under neutralityIn addition to the features described in the previous para-graph the allelic spectra of NCF4 in the studied populationsare consistent with a neutral model of evolution (tables 2ndash4)

Although CYBB presented the most interesting evolution-ary history at the interspecific level with repeated episodes ofpositive natural selection recent human evolutionary historyhas resulted in interesting patterns of variation for CYBA andNCF2 CYBA encodes p22-phox which is a transmembraneprotein shared by different NADPH oxidases In addition toharboring two common nonsynonymous polymorphisms(V174A is also variable among different species of mammals)CYBA is the most variable and most affected by recombina-tion among the NADPH oxidase genes (tables 1 and 2) Inparticular CYBA diversity is very high in Europe comparedwith 329 genes resequenced in a European sample (httppgagswashingtonedusummary_statshtml last accessed July16 2013) CYBA ranks 11th (ie the 97th percentile)Moreover there are contrasting proportions of the totalnumber of polymorphismsnumber of singletons betweenAfricans (a low proportion) and Europeans (a high propor-tion) the latter showing an excess of common polymor-phisms with respect to the neutral expectation (see the DFL

test in table 4 and supplementary table S4 SupplementaryMaterial online) This excess of common variants inEuropeans is also significant when we conservatively testedit against a scenario of human evolution that incorporates theldquoOut of Africardquo bottleneck (Laval et al 2010) and the observedlevel of recombination in Europeans (CYBA = 807 for the se-quenced region) Because demographic forces and recombi-nation levels alone do not explain the high CYBA diversity andits excess of common polymorphisms we suggest that bal-ancing natural selection (that acts by maintaining differentalleles at high frequency in a population) has contributed to

shape the diversity of CYBA at least in the European popu-lation Indeed figure 4a shows that the haplotype network ofCYBA for the European population is consistent with theaction of balancing natural selection (Bamshad andWooding 2003) showing two well-differentiated commonclades that explain the observed high diversity and theexcess of common CYBA variants A comparative genomicanalysis confirms this inference the ratio of polymorphismsto differences fixed between human and chimpanzee is nothomogeneous along the gene in the different human popu-lations (Mc Donald 1998 supplementary table S5Supplementary Material online) as would be expectedunder neutral evolution Our inference of balancing naturalselection is consistent with the fact that 25ndash30 of the var-iation in levels of ROS production can be attributed to geneticfactors (Lacy et al 2000) and that ROS levels are associatedwith CYBA variants (Bedard et al 2009)

p67 encoded by NCF2 is a necessary cytosolic NADPHcomponent for phagocyte ROS production Asians show ahighly differentiated NCF2 haplotype structure (see frequen-cies of haplotypes NCF2-D11 and NCF2-E10 in the haplotypetables online and in the network shown in fig 4b) and thehighest FST values are observed in pairwise comparisons be-tween Asians and non-Asian populations (in particular withEuropeans table 3) and not between Africans and non-African populations as is usually observed in the humangenome We confirmed these results by analyzing data forNCF2 from the HapMap Project (supplementary materialSupplementary Material online) Moreover a trend towardan excess of rare polymorphisms exists in Asians that is notobserved elsewhere (tables 1 2 and 4 DFL =1904FFL =1893) Although we cannot exclude that this patternof diversity is compatible with the null hypotheses of neutral-ity and with the tested demographic history of human pop-ulations inferred by Laval et al (2010 tables 2ndash4) we canspeculate and envisage four additional evolutionary scenarios

Table 1 Allele Frequencies of Nonsynonymous Polymorphisms in NADPH Oxidase Genes

Genes rs Minor Allele(Amino Acid)

PolyphenPrediction

African European Asian Hispanic

CYBA

Y72H rs4673 T (Y) Possibly damaging 046 032 017 022

V174A rs1049254 T (V) Benign 017 048 048 018

NCF2

K181R rs2274064 G (R) Benign 035 037 041 048

T279M rs13306581 T (T) Probably damaging 000 000 005 000

V297A rs35937854 C (A) Benign 004 000 000 000

T361S Chr1181799289 NCBI36hg18 T (S) mdash 000 000 002 000

H389Q rs17849502 A (Q) Benign 000 005 000 007

R395W rs13306575 T (W) Possibly damaging 000 000 000 007

N419I rs35012521 T (I) Probably damaging 000 002 004 000

P454S rs55761650 T (S) mdash 000 000 000 002

L487S Chr1181795862 NCBI36hg18 C (S) mdash 002 000 000 000

NCF4

T85N rs112306225 A (N) mdash 000 000 002 000

A304E rs5995361 A (E) Benign 004 000 000 000

2162

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

that may have contributed to shape the pattern of NCF2diversity in Asians 1) the observed trend is suggestive of aselective sweep on NCF2 standing variation a neutral orweakly deleterious existing variant becomes beneficial andrapidly increases in frequency (together with its associatedhaplotypes ie incomplete sweep) reducing the nucleotidediversity in the surrounding region and rendering other stand-ing substitutions rare During this process new rare substitu-tions appear in the expanding positively selected haplotype2) The pattern of diversity of NCF2 in Asia may result from anincomplete selective sweep acting on ARPC5 which is locatedapproximately 35 kb downstream of NCF2 In a genome-widescan for recent positive selection Voight et al (2006) identi-fied a strong signature of an incomplete sweep for ARPC5 inAsia (P = 0009) characterized by a higher than expected long-range linkage disequilibrium summarized by very high iHS

statistics (see Haplotter results for the HapMap II data athttphaplotteruchicagoedu last accessed July 16 2013)SNPs in NCF2 also presents high iHS statistics (P = 002) al-though values are lower than for ARPC5 3) The differentiatedpattern of diversity of NCF2 in Asia may also have been gen-erated without the action of natural selection during the firstcolonization of Asia by modern humans In a process of geo-graphic population expansion specifically in the front wave ofthe expansion some rare alleleshaplotypes (ie surfing al-leles) may become common by chance mimicking the pat-tern of diversity generated by a selective sweep (Excoffier andRay 2008) 4) The excess of rare variants may be an artifact ofpooling individuals from different populations (Ptak andPrzeworski 2002) Consistent with evolutionary scenarios 1ndash4 that produce similar patterns of diversity the haplotypenetwork of NCF2 for Eurasians (fig 4b) shows the following1) a large differentiation between Asians and Europeans thatis compatible with the high observed FST values and 2) a star-like shape associated with the haplotype NCF2-E10 that iscommon in Asia and rare elsewhere which is compatiblewith the excess of rare alleles in the Asian populations

DiscussionBy analyzing 29 mammalian genomes and four human pop-ulations we show in this study that natural selection hasacted in different ways over time to shape the pattern ofdiversity of the phagocyte NADPH oxidase genes At the tem-poral scale of the evolution of mammals we have inferredrecurrent episodes of positive selection acting on the extra-cellular portion of gp-91 that have been important to shapethe pattern of interspecific diversity of this gene Our in-terspecific analyses did not show a similar pattern of naturalselection in any of the other phagocyte NADPH oxidasegenes Even if current knowledge on the biology of NADPHdoes not allow us to interpret our results in terms of functionwe propose that the extracellular region of gp-91 is function-ally relevant Our results also imply that this region is highlydifferentiated among mammals at the protein level and thisvariability should be considered when mammals models areused to study the structure and function of phagocyteNADPH components

In the time scale of human evolution our analyses of theNADPH oxidase genes suggest that CYBA has been a target ofbalancing natural selection Because we do not have evidenceof population-specific variants that faced selective pressurethe inferred natural selection may have acted on a standingvariation in ancestral populations This implies that the selec-tive pressure began after the appearance of the variant andpossibly acted in a specific geographic region (Barret andSchluter 2008) The signatures of natural selection acting ona new mutation and on standing variation differ In the case ofselective sweeps episodes of natural selection on standingvariation are associated to a larger variance in the allelic spec-trum with respect to natural selection on a new mutationAlso selection on standing variation may produce an excess ofalleles at intermediate frequencies that is not associated withhigh nucleotide diversity (Przeworski et al 2005 Peter et al2012) This pattern contrasts with the effect of balancing

Table 2 Intrapopulation Diversity Indexes in the Studied Populationsfor the NADPH Oxidase Genes Obtained from Resequencing Dataa

African European Asian Hispanic

Number of chromosomes

CYBBb 42 52 34 36

CYBA 48 62 48 46

NCF2 48 62 48 46

NCF4 48 62 48 46

Segregating sitessingletons

CYBB 218 70 103 135

CYBA 6122 333 335 347

NCF2 4613 3311 2816 3714

NCF4 4512 267 191 308

Haplotype structureNumber of inferred haplotypesc

CYBB 14 5 7 12

CYBA 39 39 32 26

NCF2 38 32 18 31

NCF4 36 30 17 22

Haplotype diversity plusmn SD

CYBB 088 plusmn 003 034 plusmn 008 053 plusmn 010 070 plusmn 008

CYBA 098 plusmn 001 098 plusmn 001 097 plusmn 000 096 plusmn 002

NCF2 099 plusmn 001 096 plusmn 001 087 plusmn 004 097 plusmn 001

NCF4 098 plusmn 001 093 plusmn 002 093 plusmn 002 093 plusmn 002

Recombination parameter (q 103 per site)c

CYBB 008 lt001 lt001 lt002

CYBA 291 148 096 088

NCF2 052 038 010 058

NCF4 137 103 039 022

h estimatorsd

p 103 per site

CYBB 036 012 015 028

CYBA 190 163 151 157

NCF2 081 047 043 057

NCF4 101 076 064 084

hW 103 per site

CYBB 042 013 021 027

CYBA 231 118 125 13

NCF2 102 069 062 083

NCF4 101 070 054 087

aMost analyses were performed using software DnaSP (Rozas 2009)bData for CYBB (Xp211) are from Tarazona-Santos et al (2008)cHaplotypes and inferred using the method by Stephens and Scheet (2005) andthe software PHASEd Tajima (1983) W Watterson (1975)

2163

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

natural selection which produces an excess of common allelesassociated with high genetic diversity Thus the observed pat-tern of CYBA diversity in Europeans is not consistent with aselective sweep on a standing variation but it is consistent witha scenario of balancing selection acting on standing variation

If we consider for CYBA that heterozygote advantage maybe the mechanisms of balancing selection we can speculatethat the biological basis for this mechanism may be the fol-lowing considering that p22-phox is not exclusive of thephagocyte NADPH oxidase but it is also part of Nox com-plexes expressed in other tissues the dependence of ROSproduction on CYBA variants has to be finely regulated IfCYBA variants induce high levels of ROS these variants mayfavor a phagocyte-dependent efficient response to pathogensbut may damage other endothelial tissues Alternativelytissue oxidative damage does not occur if ROS productionis low but this response may be associated with a weaker

phagocyte respiratory burst against pathogens In this con-text heterozygote individuals with a CYBA-dependent inter-mediate level of ROS production may have been favored bynatural selection

Our results contribute to the discussion regarding the rel-evance of balancing selection in shaping the diversity ofinnate immunity genes (Ferrer-Admetlla et al 2008 Barreiroet al 2009) Ferrer-Admetlla et al (2008) have associated therecurrent signatures of balancing selection on inflammatorygenes with the need for fine regulation CYBA is the onlyphagocytic NADPH oxidase gene that also encodesnonphagocyte Nox components thus CYBA has a role inROS cell signaling a potentially dangerous process due toits capability to produce oxidative damage to tissues Geneswith these characteristics likely need even tighter regulationThe interplay between pathogen-driven selective pressureon innate immunity genes and their concomitantnonimmunological functions is complex In addition toCYBA other interesting examples of this interplay can befound among the 10 human toll-like receptors (TLRs) thatshow a variety of signatures of natural selection TLR8 whichshows the strongest signature of purifying selection is alsoinvolved in neuronal development (Barreiro et al 2009) and itis difficult to discriminate the role of each function in deter-mining the observed signature of natural selection

With few exceptions the pathogens responsible for naturalselection on immune genes are difficult to specify In the caseof NADPH oxidase we can infer based on the spectrum ofinfections in CGD patients that catalase-positive bacteria andfungi such as Staphylococci Salmonella Candida Aspergillusand M tuberculosis may be the selective pathogensInteractions between the host and pathogens also includemechanisms of the latter to impair the respiratory burst ofthe former For example Leishmania donovani blocks the as-sembly of NADPH oxidase at the phagosome membrane(Lodge et al 2006) These mechanisms may constitute selec-tive pressures imposed by pathogens

The associations reported in GWAS between rs4821544 inNCF4 and Crohnrsquos disease (an idiopathic inflammatory boweldisease that predominantly involves the ileum and colonRioux et al 2007) and between rs10911363 in NCF2 and

Table 3 Pairwise FST Genetic Distances between Populations

CYBBa CYBA

Africa Europe Asia Hispanic Africa Europe Asia Hispanic

Africa mdash 0316 0264 0092 mdash 0074 0083 0065

Europe 0257 mdash 0000 0107 mdash 0002 0056

Asia 0211 0000 mdash 0073 mdash 0054

Hispanic 0070 0082 0056 mdash mdash

NCF2 NCF4

Africa mdash 0048 0058 0037 mdash 0128 0136 0158

Europe mdash 0069 0000 mdash 0026 0026

Asia mdash 0059 mdash 0005

Hispanic mdash mdash

aFor CYBB FST estimators are above the diagonal Below the diagonal are the FST values corrected as if the effective population sizes of X chromosome genes were equal toautosomal ones

Table 4 Results of Neutrality Tests for the NADPH Oxidase Genesand Their Significancea

African European Asian Hispanic

Tajimarsquos D

CYBB 0473 0274 0813 0084

CYBA 0580 1242 0684 0707

NCF2 0412 0939 0883 0987

NCF4 0750 0243 0556 0102

Fu and Lirsquos D

CYBB 1050 1110 0395 0977

CYBA 1485 1734 1188 0138

NCF2 0407 1105 1904 1398

NCF4 0308 0443 1893 0266

Fu and Lirsquos F

CYBB 0980 0811 0114 0814

CYBA 1382 1893 1205 0405

NCF2 0512 1252 1893 1567

NCF4 0557 0231 1225 0248

NOTEmdashUnderlined values represent significant results under the demographic modelinferred by Laval et al (2010) for human populations See details in supplementarytable S4 Supplementary Material onlineaThe McDonaldndashKreitman test is nonsignificant in any of the cases

Significant under the WrightndashFisher model of constant population size

2164

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

systemic lupus erythematosus (Cunninghame Graham et al2011) confirm the involvement of NADPH genes in the path-ogenesis of inflammatory-related common diseases Ourclaim that natural selection acted on CYBA (and maybe inNCF2) is relevant for biomedical studies because combiningevidence of natural selection with association analyses inimmune genes increases the statistical power to detect dis-ease-associated variants (Ayodo et al 2007) As a new gener-ation of association studies focusing on rare variation isemerging the combination of genes deemed interestingfrom GWAS and populations with an excess of rare variantsin these genes such as NCF2 in Asians are particularly inter-esting as a source of rare variants with clinical relevanceFinally by determining through molecular evolution mappingthat the extracellular portion of gp91 (encoded by CYBB inthe X-chromosome) has been subject to recurrent episodes ofpositive selection at the scale of mammals evolution we positthe hypothesis that this portion of the NADPH oxidase isrelevant for currently unknown biological processes thatonce revealed by structural and functional investigationswill contribute to understanding the role of NADPH oxidasein infectious autoimmune and cardiovascular diseases

Materials and Methods

Molecular Evolution Analysis of NADPH Genes

We used the maximum likelihood framework developed byYang (2007a) to estimate for the NADPH oxidase genes

under a variety of evolutionary models as implemented in thePAML software (Yang 2007b) This approach allows infer-ences about the evolution of a coding region along aninter-specific phylogeny mapping the codons that haveevolved under strongweak purifying selection neutrality oradaptive positive selection Further details about these anal-yses are available as supplementary material SupplementaryMaterial online

Human Population Genetics of NADPH Genes

For human population genetics analyses we conducted bidi-rectional Sanger sequencing of CYBB CYBA NCF2 and NCF4for a total of 35242 bp for each of 102 healthy individuals aspart of the SNP500 Cancer project (Packer et al 2006 seesupplementary fig S1 Supplementary Material online for de-tails) Human population resequencing data for CYBB werepublished in Tarazona-Santos et al (2008) The resequencingexperiments were performed as in Packer et al (2006) These102 unrelated individuals include the following 24 of Africanancestry (15 African Americans from the United States and 9Pygmies) 23 admixed Latin Americans (from Mexico PuertoRico and South America) 31 Europeans (from the CEPHUTAH pedigree and the NIEHS Environmental GenomeProject) and 24 AsiansndashOceanians (from Melanesia PakistanChina Cambodia Japan and Taiwan)

After controlling for multiple tests we confirmed that allSNPs fit the HardyndashWeinberg equilibrium by the Guo and

CYBA - A16

CYBA - A3

Y72H - rs4673

V174A - rs1049254 CYBA - E18

(a)

NCF2 - E10 H389Q - rs17849502

NCF2 - B3

NCF2 - D11

NCF2 - C1K181R - rs2274064(b)

FIG 4 Phylogenetic networks of (a) CYBA in Europeans and (b) NCF2 in Europeans (black) and Asians (yellow) The lengths of the branches areproportional to the number of mutations Only nonsynonymous mutations are shown The haplotype names correspond to the table of inferredhaplotypes in the supplementary material Supplementary Material online We only show the names of haplotypes with a frequency gt5 Ancestralhaplotypes (inferred as being the human haplotype or median vector most similar to the chimpanzee sequence) are indicated by an arrow Medianvectors are in red

2165

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Thompson (1992) test which was implemented in the soft-ware Arlequin 30 (Excoffier et al 2007) Insertion-deletions(INDELs) were excluded from population genetics analysesTo assess intrapopulation diversity we used two statistics the per-site mean number of pairwise differences betweensequences (Tajima 1983) and w based on the number ofsegregating sites (S) (Watterson 1975) We measured pairwisebetween-populations diversity by using the FST statistics cal-culated using the software DnaSP (Rozas 2009)

Haplotypes and the recombination parameter were in-ferred using the PHASE software (Stephens and Scheet 2005)and diversity indexes calculations (tables 2 and 3) as well asneutrality statistics (table 4) were estimated using DnaSP soft-ware We applied two kinds of neutrality tests 1) tests basedon the allelic spectrum which is the distribution of polymor-phisms across different classes of frequencies namelyTajimarsquos DT (Tajima 1989) and FundashLirsquos DFL and FFL (Fu andLi 1993) and 2) tests based on comparisons between thenumber of polymorphisms in human populations and fixeddifferences with the chimpanzee (ie outgroup) namely theMcDonald and Kreitman (1991) test and the adaptedKolmogorovndashSmirnoff test by McDonald (1998) For thefirst set of tests we used as null hypotheses both the classicWrightndashFisher model of neutrality with a constant popula-tion size as well as the more realistic evolutionary scenario forhuman populations inferred by Laval et al (2010) In the caseof the scenario of Laval et al (2010) we ignored interconti-nental gene flow within the Old World because these raregene flow events likely does not affect the level of significanceof the neutrality tests given its very low inferred values(13 105) Null distributions used to test the significanceof the neutrality tests under these evolutionary scenarios weregenerated using coalescent simulations and a significancelevel of 005 (Hudson 2002) The KolmogorovndashSmirnoff testof neutrality adapted by McDonald (1998) was performedusing Slider software available at httpudeledu~mcdonaldaboutdnasliderhtml (last accessed July 16 2013) Weperformed coalescent simulations using ms software(Hudson 2002) Further methodological details are availableas supplementary material Supplementary Material onlineWe constructed the CYBA and NCF2 networks using allSNP variants and applying the Median joining algorithmand the maximum parsimony option calculations as imple-mented in the software Network 46 (Bandelt et al 1999)

Supplementary MaterialSupplementary tables S1ndashS5 and figure S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

SJC conceived the study ET-S MM FL RC AC CF LPand BS generated the dataanalyzed the sequences LB ACWCSM and MY provided tools for data generationand analyses ET-S MM FL WCSM and RR performedpopulation genetics analyses ET-S and MM prepared tablesand figures ET-S and SJC wrote the manuscript All the

authors contributed with discussions about the results Theauthors are grateful to Silvia Fuselli for discussions of ourresults to Fernando Levi Soares for technical assistance withthe figures and the Sequencing Group of the CoreGenotyping Facility (National Cancer Institute) for their tech-nical assistance This work was supported by the IntramuralResearch Program of the National Cancer Institute (NCI) andNational Institutes of Health (NIH) CF was supported by theUniversity of Bologna ET-S MM WCSM and FL weresupported by the Fogarty International Center and NationalCancer Institute (1R01TW007894) Conselho Nacional deDesenvolvimento Cientıfico e Tecnologico (Brazil Grant3075562012-3) Fundacao de Amparo a Pesquisa de MinasGerais (Brazil CBB PPM 0026512) and the Brazilian Ministryof Education (CAPES Agency grant PNPD 0256709-1)

ReferencesAyodo G Price AL Keinan A Ajwang A Otieno MF Orago AS

Patterson N Reich D 2007 Combining evidence of natural selectionwith association analysis increases power to detect malaria-resis-tance variants Am J Hum Genet 81(2)234ndash242

Bamshad M Wooding SP 2003 Signatures of natural selection in thehuman genome Nat Rev Genet 4(2)99ndash111

Bandelt H-J Forster P Rohl A 1999 Median-joining networks for infer-ring intraspecific phylogenies Mol Biol Evol 16(1)37ndash48

Barreiro LB Ben-Ali M Quach H et al (17 co-authors) 2009Evolutionary dynamics of human Toll-like receptors and their dif-ferent contributions to host defense PLoS Genet 5(7)e1000562

Barreiro LB Quintana-Murci L 2010 From evolutionary genetics tohuman immunology how selection shapes host defence genesNat Rev Genet 11(1)17ndash30

Barrett RD Schluter D 2008 Adaptation from standing genetic varia-tion Trends Ecol Evol 23(1)38ndash44

Bedard K Attar H Bonnefont J Jaquet V Borel C Plastre O Stasia MJAntonarakis SE Krause KH 2009 Three common polymorphisms inthe CYBA gene form a haplotype associated with decreased ROSgeneration Hum Mutat 30(7)1123ndash1133

Blekhman R Man O Herrmann L Boyko AR Indap A Kosiol CBustamante CD Teshima KM Przeworski M 2008 Natural selectionon genes that underlie human disease susceptibility Curr Biol18(12)883ndash889

Brandes RP Kreuzer J 2005 Vascular NADPH oxidases molecular mech-anisms of activation Cardiovasc Res 65(1)16ndash27

Buckley RH 2004 Pulmonary complications of primary immunodefi-ciencies Paediatr Respir Rev 5(Suppl A)S225ndashS233

Bustamante CD Fledel-Alon A Williamson S et al (13 co-authors)2005 Natural selection on protein-coding genes in the humangenome Nature 437(7062)1153ndash1157

Bustamante J Arias AA Vogt G et al (29 co-authors) 2011 GermlineCYBB mutations that selectively affect macrophages in kindredswith X-linked predisposition to tuberculous mycobacterial diseaseNat Immunol 12(3)213ndash221

Campbell MC Tishkoff SA 2008 African genetic diversity implicationsfor human demographic history modern human origins and com-plex disease mapping Annu Rev Genomics Hum Genet 9403ndash433

Chanock SJ Roesler J Zhan S Hopkins P Lee P Barrett DT ChristensenBL Curnutte JT Gorlach A 2000 Genomic structure of the humanp47-phox (NCF1) gene Blood Cells Mol Dis 26(1)37ndash46

Cunninghame Graham DS Morris DL Bhangale TR Criswell LASyvanen AC Ronnblom L Behrens TW Graham RR Vyse TJ2011 Association of NCF2 IKZF1 IRF8 IFIH1 and TYK2 with sys-temic lupus erythematosus PLoS Genet 7(10)e1002341

Excoffier L Laval G Schneider S 2007 Arlequin ver 30 an integratedsoftware package for population genetics data analysis EvolBioinform Online 147ndash50

2166

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Excoffier L Ray N 2008 Surfing during population expansions promotesgenetic revolutions and structuration Trends Ecol Evol23(7)347ndash351

Ferrer-Admetlla A Bosch E Sikora M Marques-Bonet T Ramırez-SorianoA Muntasell A Navarro A Lazarus R Calafell F Bertranpetit J Casals F2008 Balancing selection is the main force shaping the evolution ofinnate immunity genes J Immunol 181(2)1315ndash1322

Fu YX Li WH 1993 Statistical tests of neutrality of mutations Genetics133(3)693ndash709

The 1000 Genomes Project Consortium 2010 A map of humangenome variation from population-scale sequencing Nature467(7319)1061ndash1073

The 1000 Genomes Project Consortium Abecasis GR Auton A BrooksLD DePristo MA Durbin RM Handsaker RE Kang HM Marth GTMcVean GA 2012 An integrated map of genetic variation from1092 human genomes Nature 491(7422)56ndash65

Guo SW Thompson EA 1992 Performing the exact test of Hardy-Weinberg proportion for multiple alleles Biometrics 48(2)361ndash372

Heyworth PG Cross AR Curnutte JT 2003 Chronic granulomatousdisease Curr Opin Immunol 15(5)578ndash584

Hudson RR 2002 Generating samples under a Wright-Fisher neutralmodel of genetic variation Bioinformatics 18(2)337ndash338

Jacob CO Eisenstein M Dinauer MC et al (21 co-authors) 2012 Lupus-associated causal mutation in neutrophil cytosolic factor 2 (NCF2)brings unique insights to the structure and function of NADPHoxidase Proc Natl Acad Sci U S A 109(2)E59ndashE67

Kimura M 1974 The neutral theory of molecular evolution CambridgeCambridge University Press

Kosiol C Vinar T da Fonseca RR Hubisz MJ Bustamante CD Nielsen RSiepel A 2008 Patterns of positive selection in six mammalian ge-nomes PLoS Genet 4(8)e1000144

Lacy F Kailasam MT OrsquoConnor DT Schmid-Schonbein GW Parmer RJ2000 Plasma hydrogen peroxide production in human essentialhypertension role of heredity gender and ethnicity Hypertension36(5)878ndash884

Laval G Patin E Barreiro LB Quintana-Murci L 2010 Formulating a his-torical and demographic model of recent human evolution based onresequencing data from noncoding regions PLoS One 5(4)e10284

Lewontin R 1972 The apportionment of human diversity Evol Biol 6391ndash398

Lindblad-Toh K Garber M Zuk O et al (88 co-authors) 2011 A high-resolution map of human evolutionary constraint using 29 mam-mals Nature 478(7370)476ndash482

Lodge R Diallo TO Descoteaux A 2006 Leishmania donovani lipopho-sphoglycan blocks NADPH oxidase assembly at the phagosomemembrane Cell Microbiol 8(12)1922ndash1931

Magalhaes WC Rodrigues MR Silva D Soares-Souza G Iannini MLCerqueira GC Faria-Campos AC Tarazona-Santos E 2012DIVERGENOME a bioinformatics platform to assist population ge-netics and genetic epidemiology studies Genet Epidemiol36(4)360ndash367

Marth JD Grewal PK 2008 Mammalian glycosylation in immunity NatRev Immunol 8(11)874ndash887

McDonald JH 1998 Improved tests for heterogeneity across a region ofDNA sequence in the ratio of polymorphism to divergence Mol BiolEvol 15377ndash384

McDonald JH Kreitman M 1991 Adaptive protein evolution at the Adhlocus in Drosophila Nature 351(6328)652ndash654

Nielsen R Bustamante C Clark AG et al (12 co-authors) 2005 A scanfor positively selected genes in the genomes of humans and chim-panzees PLoS Biol 3(6)e170

Olsson LM Lindqvist AK Kallberg H Padyukov L Burkhardt HAlfredsson L Klareskog L Holmdahl R 2007 A case-control studyof rheumatoid arthritis identifies an associated single nucleotidepolymorphism in the NCF4 gene supporting a role for the

NADPH-oxidase complex in autoimmunity Arthritis Res Ther9(5)R98

Packer BR Yeager M Burdett L et al (13 co-authors) 2006SNP500Cancer a public resource for sequence validation assay de-velopment and frequency analysis for genetic variation in candidategenes Nucleic Acids Res 34(Database issue)D617ndashD621

Peter BM Huerta-Sanchez E Nielsen R 2012 Distinguishing betweenselective sweeps from standing variation and from a de novo mu-tation PLoS Genet 8(10)e1003011

Przeworski M Coop G Wall JD 2005 The signature of positive selectionon standing genetic variation Evolution 59(11)2312ndash2323

Ptak SE Przeworski M 2002 Evidence for population growth in humansis confounded by fine-scale population structure Trends Genet18(11)559ndash563

Ramensky V Bork P Sunyaev S 2002 Human non-synonymous SNPsserver and survey Nucleic Acids Res 30(17)3894ndash3900

Rioux JD Xavier RJ Taylor KD et al (24 co-authors) 2007 Genome-wideassociation study identifies new susceptibility loci for Crohn diseaseand implicates autophagy in disease pathogenesis Nat Genet39(5)596ndash604

Roberts RL Hollis-Moffatt JE Gearry RB Kennedy MA Barclay MLMerriman TR 2008 Confirmation of association of IRGM andNCF4 with ileal Crohnrsquos disease in a population-based cohortGenes Immun 9(6)561ndash565

Rozas J 2009 DNA sequence polymorphism analysis using DnaSPMethods Mol Biol 537337ndash350

San Jose G Fortuno A Beloqui O Dıez J Zalba G 2008 NADPH oxidaseCYBA polymorphisms oxidative stress and cardiovascular diseasesClin Sci (Lond) 114(3)173ndash182

Santiago HC Gonzalez Lombana CZ Macedo JP et al (12 co-authors)2012 NADPH phagocyte oxidase knockout mice controlTrypanosoma cruzi proliferation but develop circulatory collapseand succumb to infection PLoS Negl Trop Dis 6(2)e1492

Stephens M Scheet P 2005 Accounting for decay of linkage disequilib-rium in haplotype inference and missing-data imputation Am JHum Genet 76(3)449ndash462

Sumimoto H Miyano K Takeya R 2005 Molecular composition andregulation of the Nox family NAD(P)H oxidases Biochem BiophysRes Commun 338(1)677ndash686

Tajima F 1983 Evolutionary relationship of DNA sequences in finitepopulations Genetics 105(2)437ndash460

Tajima F 1989 Statistical method for testing the neutral mutationhypothesis by DNA polymorphism Genetics 123(3)585ndash595

Tarazona-Santos E Bernig T Burdett L Magalhaes WC Fabbri C Liao JRedondo RA Welch R Yeager M Chanock SJ 2008 CYBB anNADPH-oxidase gene restricted diversity in humans and evidencefor differential long-term purifying selection on transmembrane andcytosolic domains Hum Mutat 29(5)623ndash632

Taylor RM Burritt JB Baniulis D Foubert TR Lord CI Dinauer MCParkos CA Jesaitis AJ 2004 Site-specific inhibitors of NADPH oxi-dase activity and structural probes of flavocytochrome b character-ization of six monoclonal antibodies to the p22phox subunitJ Immunol 173(12)7349ndash7357

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A map of recentpositive selection in the human genome PLoS Biol 4(3)e72

Watterson GA 1975 On the number of segregating sites in geneticalmodels without recombination Theor Popul Biol 7(2) 256ndash276

Williamson SH Hernandez R Fledel-Alon A Zhu L Nielsen RBustamante CD 2005 Simultaneous inference of selection and pop-ulation growth from patterns of variation in the human genomeProc Natl Acad Sci U S A 102(22)7882ndash7887

Yang Z 2007a Adaptive molecular evolution In Balding DJ Bishop MCannings C editors Handbook of statistical genetics Vol 1 3rd edSusex (United Kingdom) John Wiley amp Sons p 377ndash406

Yang Z 2007b PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

2167

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

that may have contributed to shape the pattern of NCF2diversity in Asians 1) the observed trend is suggestive of aselective sweep on NCF2 standing variation a neutral orweakly deleterious existing variant becomes beneficial andrapidly increases in frequency (together with its associatedhaplotypes ie incomplete sweep) reducing the nucleotidediversity in the surrounding region and rendering other stand-ing substitutions rare During this process new rare substitu-tions appear in the expanding positively selected haplotype2) The pattern of diversity of NCF2 in Asia may result from anincomplete selective sweep acting on ARPC5 which is locatedapproximately 35 kb downstream of NCF2 In a genome-widescan for recent positive selection Voight et al (2006) identi-fied a strong signature of an incomplete sweep for ARPC5 inAsia (P = 0009) characterized by a higher than expected long-range linkage disequilibrium summarized by very high iHS

statistics (see Haplotter results for the HapMap II data athttphaplotteruchicagoedu last accessed July 16 2013)SNPs in NCF2 also presents high iHS statistics (P = 002) al-though values are lower than for ARPC5 3) The differentiatedpattern of diversity of NCF2 in Asia may also have been gen-erated without the action of natural selection during the firstcolonization of Asia by modern humans In a process of geo-graphic population expansion specifically in the front wave ofthe expansion some rare alleleshaplotypes (ie surfing al-leles) may become common by chance mimicking the pat-tern of diversity generated by a selective sweep (Excoffier andRay 2008) 4) The excess of rare variants may be an artifact ofpooling individuals from different populations (Ptak andPrzeworski 2002) Consistent with evolutionary scenarios 1ndash4 that produce similar patterns of diversity the haplotypenetwork of NCF2 for Eurasians (fig 4b) shows the following1) a large differentiation between Asians and Europeans thatis compatible with the high observed FST values and 2) a star-like shape associated with the haplotype NCF2-E10 that iscommon in Asia and rare elsewhere which is compatiblewith the excess of rare alleles in the Asian populations

DiscussionBy analyzing 29 mammalian genomes and four human pop-ulations we show in this study that natural selection hasacted in different ways over time to shape the pattern ofdiversity of the phagocyte NADPH oxidase genes At the tem-poral scale of the evolution of mammals we have inferredrecurrent episodes of positive selection acting on the extra-cellular portion of gp-91 that have been important to shapethe pattern of interspecific diversity of this gene Our in-terspecific analyses did not show a similar pattern of naturalselection in any of the other phagocyte NADPH oxidasegenes Even if current knowledge on the biology of NADPHdoes not allow us to interpret our results in terms of functionwe propose that the extracellular region of gp-91 is function-ally relevant Our results also imply that this region is highlydifferentiated among mammals at the protein level and thisvariability should be considered when mammals models areused to study the structure and function of phagocyteNADPH components

In the time scale of human evolution our analyses of theNADPH oxidase genes suggest that CYBA has been a target ofbalancing natural selection Because we do not have evidenceof population-specific variants that faced selective pressurethe inferred natural selection may have acted on a standingvariation in ancestral populations This implies that the selec-tive pressure began after the appearance of the variant andpossibly acted in a specific geographic region (Barret andSchluter 2008) The signatures of natural selection acting ona new mutation and on standing variation differ In the case ofselective sweeps episodes of natural selection on standingvariation are associated to a larger variance in the allelic spec-trum with respect to natural selection on a new mutationAlso selection on standing variation may produce an excess ofalleles at intermediate frequencies that is not associated withhigh nucleotide diversity (Przeworski et al 2005 Peter et al2012) This pattern contrasts with the effect of balancing

Table 2 Intrapopulation Diversity Indexes in the Studied Populationsfor the NADPH Oxidase Genes Obtained from Resequencing Dataa

African European Asian Hispanic

Number of chromosomes

CYBBb 42 52 34 36

CYBA 48 62 48 46

NCF2 48 62 48 46

NCF4 48 62 48 46

Segregating sitessingletons

CYBB 218 70 103 135

CYBA 6122 333 335 347

NCF2 4613 3311 2816 3714

NCF4 4512 267 191 308

Haplotype structureNumber of inferred haplotypesc

CYBB 14 5 7 12

CYBA 39 39 32 26

NCF2 38 32 18 31

NCF4 36 30 17 22

Haplotype diversity plusmn SD

CYBB 088 plusmn 003 034 plusmn 008 053 plusmn 010 070 plusmn 008

CYBA 098 plusmn 001 098 plusmn 001 097 plusmn 000 096 plusmn 002

NCF2 099 plusmn 001 096 plusmn 001 087 plusmn 004 097 plusmn 001

NCF4 098 plusmn 001 093 plusmn 002 093 plusmn 002 093 plusmn 002

Recombination parameter (q 103 per site)c

CYBB 008 lt001 lt001 lt002

CYBA 291 148 096 088

NCF2 052 038 010 058

NCF4 137 103 039 022

h estimatorsd

p 103 per site

CYBB 036 012 015 028

CYBA 190 163 151 157

NCF2 081 047 043 057

NCF4 101 076 064 084

hW 103 per site

CYBB 042 013 021 027

CYBA 231 118 125 13

NCF2 102 069 062 083

NCF4 101 070 054 087

aMost analyses were performed using software DnaSP (Rozas 2009)bData for CYBB (Xp211) are from Tarazona-Santos et al (2008)cHaplotypes and inferred using the method by Stephens and Scheet (2005) andthe software PHASEd Tajima (1983) W Watterson (1975)

2163

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

natural selection which produces an excess of common allelesassociated with high genetic diversity Thus the observed pat-tern of CYBA diversity in Europeans is not consistent with aselective sweep on a standing variation but it is consistent witha scenario of balancing selection acting on standing variation

If we consider for CYBA that heterozygote advantage maybe the mechanisms of balancing selection we can speculatethat the biological basis for this mechanism may be the fol-lowing considering that p22-phox is not exclusive of thephagocyte NADPH oxidase but it is also part of Nox com-plexes expressed in other tissues the dependence of ROSproduction on CYBA variants has to be finely regulated IfCYBA variants induce high levels of ROS these variants mayfavor a phagocyte-dependent efficient response to pathogensbut may damage other endothelial tissues Alternativelytissue oxidative damage does not occur if ROS productionis low but this response may be associated with a weaker

phagocyte respiratory burst against pathogens In this con-text heterozygote individuals with a CYBA-dependent inter-mediate level of ROS production may have been favored bynatural selection

Our results contribute to the discussion regarding the rel-evance of balancing selection in shaping the diversity ofinnate immunity genes (Ferrer-Admetlla et al 2008 Barreiroet al 2009) Ferrer-Admetlla et al (2008) have associated therecurrent signatures of balancing selection on inflammatorygenes with the need for fine regulation CYBA is the onlyphagocytic NADPH oxidase gene that also encodesnonphagocyte Nox components thus CYBA has a role inROS cell signaling a potentially dangerous process due toits capability to produce oxidative damage to tissues Geneswith these characteristics likely need even tighter regulationThe interplay between pathogen-driven selective pressureon innate immunity genes and their concomitantnonimmunological functions is complex In addition toCYBA other interesting examples of this interplay can befound among the 10 human toll-like receptors (TLRs) thatshow a variety of signatures of natural selection TLR8 whichshows the strongest signature of purifying selection is alsoinvolved in neuronal development (Barreiro et al 2009) and itis difficult to discriminate the role of each function in deter-mining the observed signature of natural selection

With few exceptions the pathogens responsible for naturalselection on immune genes are difficult to specify In the caseof NADPH oxidase we can infer based on the spectrum ofinfections in CGD patients that catalase-positive bacteria andfungi such as Staphylococci Salmonella Candida Aspergillusand M tuberculosis may be the selective pathogensInteractions between the host and pathogens also includemechanisms of the latter to impair the respiratory burst ofthe former For example Leishmania donovani blocks the as-sembly of NADPH oxidase at the phagosome membrane(Lodge et al 2006) These mechanisms may constitute selec-tive pressures imposed by pathogens

The associations reported in GWAS between rs4821544 inNCF4 and Crohnrsquos disease (an idiopathic inflammatory boweldisease that predominantly involves the ileum and colonRioux et al 2007) and between rs10911363 in NCF2 and

Table 3 Pairwise FST Genetic Distances between Populations

CYBBa CYBA

Africa Europe Asia Hispanic Africa Europe Asia Hispanic

Africa mdash 0316 0264 0092 mdash 0074 0083 0065

Europe 0257 mdash 0000 0107 mdash 0002 0056

Asia 0211 0000 mdash 0073 mdash 0054

Hispanic 0070 0082 0056 mdash mdash

NCF2 NCF4

Africa mdash 0048 0058 0037 mdash 0128 0136 0158

Europe mdash 0069 0000 mdash 0026 0026

Asia mdash 0059 mdash 0005

Hispanic mdash mdash

aFor CYBB FST estimators are above the diagonal Below the diagonal are the FST values corrected as if the effective population sizes of X chromosome genes were equal toautosomal ones

Table 4 Results of Neutrality Tests for the NADPH Oxidase Genesand Their Significancea

African European Asian Hispanic

Tajimarsquos D

CYBB 0473 0274 0813 0084

CYBA 0580 1242 0684 0707

NCF2 0412 0939 0883 0987

NCF4 0750 0243 0556 0102

Fu and Lirsquos D

CYBB 1050 1110 0395 0977

CYBA 1485 1734 1188 0138

NCF2 0407 1105 1904 1398

NCF4 0308 0443 1893 0266

Fu and Lirsquos F

CYBB 0980 0811 0114 0814

CYBA 1382 1893 1205 0405

NCF2 0512 1252 1893 1567

NCF4 0557 0231 1225 0248

NOTEmdashUnderlined values represent significant results under the demographic modelinferred by Laval et al (2010) for human populations See details in supplementarytable S4 Supplementary Material onlineaThe McDonaldndashKreitman test is nonsignificant in any of the cases

Significant under the WrightndashFisher model of constant population size

2164

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

systemic lupus erythematosus (Cunninghame Graham et al2011) confirm the involvement of NADPH genes in the path-ogenesis of inflammatory-related common diseases Ourclaim that natural selection acted on CYBA (and maybe inNCF2) is relevant for biomedical studies because combiningevidence of natural selection with association analyses inimmune genes increases the statistical power to detect dis-ease-associated variants (Ayodo et al 2007) As a new gener-ation of association studies focusing on rare variation isemerging the combination of genes deemed interestingfrom GWAS and populations with an excess of rare variantsin these genes such as NCF2 in Asians are particularly inter-esting as a source of rare variants with clinical relevanceFinally by determining through molecular evolution mappingthat the extracellular portion of gp91 (encoded by CYBB inthe X-chromosome) has been subject to recurrent episodes ofpositive selection at the scale of mammals evolution we positthe hypothesis that this portion of the NADPH oxidase isrelevant for currently unknown biological processes thatonce revealed by structural and functional investigationswill contribute to understanding the role of NADPH oxidasein infectious autoimmune and cardiovascular diseases

Materials and Methods

Molecular Evolution Analysis of NADPH Genes

We used the maximum likelihood framework developed byYang (2007a) to estimate for the NADPH oxidase genes

under a variety of evolutionary models as implemented in thePAML software (Yang 2007b) This approach allows infer-ences about the evolution of a coding region along aninter-specific phylogeny mapping the codons that haveevolved under strongweak purifying selection neutrality oradaptive positive selection Further details about these anal-yses are available as supplementary material SupplementaryMaterial online

Human Population Genetics of NADPH Genes

For human population genetics analyses we conducted bidi-rectional Sanger sequencing of CYBB CYBA NCF2 and NCF4for a total of 35242 bp for each of 102 healthy individuals aspart of the SNP500 Cancer project (Packer et al 2006 seesupplementary fig S1 Supplementary Material online for de-tails) Human population resequencing data for CYBB werepublished in Tarazona-Santos et al (2008) The resequencingexperiments were performed as in Packer et al (2006) These102 unrelated individuals include the following 24 of Africanancestry (15 African Americans from the United States and 9Pygmies) 23 admixed Latin Americans (from Mexico PuertoRico and South America) 31 Europeans (from the CEPHUTAH pedigree and the NIEHS Environmental GenomeProject) and 24 AsiansndashOceanians (from Melanesia PakistanChina Cambodia Japan and Taiwan)

After controlling for multiple tests we confirmed that allSNPs fit the HardyndashWeinberg equilibrium by the Guo and

CYBA - A16

CYBA - A3

Y72H - rs4673

V174A - rs1049254 CYBA - E18

(a)

NCF2 - E10 H389Q - rs17849502

NCF2 - B3

NCF2 - D11

NCF2 - C1K181R - rs2274064(b)

FIG 4 Phylogenetic networks of (a) CYBA in Europeans and (b) NCF2 in Europeans (black) and Asians (yellow) The lengths of the branches areproportional to the number of mutations Only nonsynonymous mutations are shown The haplotype names correspond to the table of inferredhaplotypes in the supplementary material Supplementary Material online We only show the names of haplotypes with a frequency gt5 Ancestralhaplotypes (inferred as being the human haplotype or median vector most similar to the chimpanzee sequence) are indicated by an arrow Medianvectors are in red

2165

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Thompson (1992) test which was implemented in the soft-ware Arlequin 30 (Excoffier et al 2007) Insertion-deletions(INDELs) were excluded from population genetics analysesTo assess intrapopulation diversity we used two statistics the per-site mean number of pairwise differences betweensequences (Tajima 1983) and w based on the number ofsegregating sites (S) (Watterson 1975) We measured pairwisebetween-populations diversity by using the FST statistics cal-culated using the software DnaSP (Rozas 2009)

Haplotypes and the recombination parameter were in-ferred using the PHASE software (Stephens and Scheet 2005)and diversity indexes calculations (tables 2 and 3) as well asneutrality statistics (table 4) were estimated using DnaSP soft-ware We applied two kinds of neutrality tests 1) tests basedon the allelic spectrum which is the distribution of polymor-phisms across different classes of frequencies namelyTajimarsquos DT (Tajima 1989) and FundashLirsquos DFL and FFL (Fu andLi 1993) and 2) tests based on comparisons between thenumber of polymorphisms in human populations and fixeddifferences with the chimpanzee (ie outgroup) namely theMcDonald and Kreitman (1991) test and the adaptedKolmogorovndashSmirnoff test by McDonald (1998) For thefirst set of tests we used as null hypotheses both the classicWrightndashFisher model of neutrality with a constant popula-tion size as well as the more realistic evolutionary scenario forhuman populations inferred by Laval et al (2010) In the caseof the scenario of Laval et al (2010) we ignored interconti-nental gene flow within the Old World because these raregene flow events likely does not affect the level of significanceof the neutrality tests given its very low inferred values(13 105) Null distributions used to test the significanceof the neutrality tests under these evolutionary scenarios weregenerated using coalescent simulations and a significancelevel of 005 (Hudson 2002) The KolmogorovndashSmirnoff testof neutrality adapted by McDonald (1998) was performedusing Slider software available at httpudeledu~mcdonaldaboutdnasliderhtml (last accessed July 16 2013) Weperformed coalescent simulations using ms software(Hudson 2002) Further methodological details are availableas supplementary material Supplementary Material onlineWe constructed the CYBA and NCF2 networks using allSNP variants and applying the Median joining algorithmand the maximum parsimony option calculations as imple-mented in the software Network 46 (Bandelt et al 1999)

Supplementary MaterialSupplementary tables S1ndashS5 and figure S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

SJC conceived the study ET-S MM FL RC AC CF LPand BS generated the dataanalyzed the sequences LB ACWCSM and MY provided tools for data generationand analyses ET-S MM FL WCSM and RR performedpopulation genetics analyses ET-S and MM prepared tablesand figures ET-S and SJC wrote the manuscript All the

authors contributed with discussions about the results Theauthors are grateful to Silvia Fuselli for discussions of ourresults to Fernando Levi Soares for technical assistance withthe figures and the Sequencing Group of the CoreGenotyping Facility (National Cancer Institute) for their tech-nical assistance This work was supported by the IntramuralResearch Program of the National Cancer Institute (NCI) andNational Institutes of Health (NIH) CF was supported by theUniversity of Bologna ET-S MM WCSM and FL weresupported by the Fogarty International Center and NationalCancer Institute (1R01TW007894) Conselho Nacional deDesenvolvimento Cientıfico e Tecnologico (Brazil Grant3075562012-3) Fundacao de Amparo a Pesquisa de MinasGerais (Brazil CBB PPM 0026512) and the Brazilian Ministryof Education (CAPES Agency grant PNPD 0256709-1)

ReferencesAyodo G Price AL Keinan A Ajwang A Otieno MF Orago AS

Patterson N Reich D 2007 Combining evidence of natural selectionwith association analysis increases power to detect malaria-resis-tance variants Am J Hum Genet 81(2)234ndash242

Bamshad M Wooding SP 2003 Signatures of natural selection in thehuman genome Nat Rev Genet 4(2)99ndash111

Bandelt H-J Forster P Rohl A 1999 Median-joining networks for infer-ring intraspecific phylogenies Mol Biol Evol 16(1)37ndash48

Barreiro LB Ben-Ali M Quach H et al (17 co-authors) 2009Evolutionary dynamics of human Toll-like receptors and their dif-ferent contributions to host defense PLoS Genet 5(7)e1000562

Barreiro LB Quintana-Murci L 2010 From evolutionary genetics tohuman immunology how selection shapes host defence genesNat Rev Genet 11(1)17ndash30

Barrett RD Schluter D 2008 Adaptation from standing genetic varia-tion Trends Ecol Evol 23(1)38ndash44

Bedard K Attar H Bonnefont J Jaquet V Borel C Plastre O Stasia MJAntonarakis SE Krause KH 2009 Three common polymorphisms inthe CYBA gene form a haplotype associated with decreased ROSgeneration Hum Mutat 30(7)1123ndash1133

Blekhman R Man O Herrmann L Boyko AR Indap A Kosiol CBustamante CD Teshima KM Przeworski M 2008 Natural selectionon genes that underlie human disease susceptibility Curr Biol18(12)883ndash889

Brandes RP Kreuzer J 2005 Vascular NADPH oxidases molecular mech-anisms of activation Cardiovasc Res 65(1)16ndash27

Buckley RH 2004 Pulmonary complications of primary immunodefi-ciencies Paediatr Respir Rev 5(Suppl A)S225ndashS233

Bustamante CD Fledel-Alon A Williamson S et al (13 co-authors)2005 Natural selection on protein-coding genes in the humangenome Nature 437(7062)1153ndash1157

Bustamante J Arias AA Vogt G et al (29 co-authors) 2011 GermlineCYBB mutations that selectively affect macrophages in kindredswith X-linked predisposition to tuberculous mycobacterial diseaseNat Immunol 12(3)213ndash221

Campbell MC Tishkoff SA 2008 African genetic diversity implicationsfor human demographic history modern human origins and com-plex disease mapping Annu Rev Genomics Hum Genet 9403ndash433

Chanock SJ Roesler J Zhan S Hopkins P Lee P Barrett DT ChristensenBL Curnutte JT Gorlach A 2000 Genomic structure of the humanp47-phox (NCF1) gene Blood Cells Mol Dis 26(1)37ndash46

Cunninghame Graham DS Morris DL Bhangale TR Criswell LASyvanen AC Ronnblom L Behrens TW Graham RR Vyse TJ2011 Association of NCF2 IKZF1 IRF8 IFIH1 and TYK2 with sys-temic lupus erythematosus PLoS Genet 7(10)e1002341

Excoffier L Laval G Schneider S 2007 Arlequin ver 30 an integratedsoftware package for population genetics data analysis EvolBioinform Online 147ndash50

2166

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Excoffier L Ray N 2008 Surfing during population expansions promotesgenetic revolutions and structuration Trends Ecol Evol23(7)347ndash351

Ferrer-Admetlla A Bosch E Sikora M Marques-Bonet T Ramırez-SorianoA Muntasell A Navarro A Lazarus R Calafell F Bertranpetit J Casals F2008 Balancing selection is the main force shaping the evolution ofinnate immunity genes J Immunol 181(2)1315ndash1322

Fu YX Li WH 1993 Statistical tests of neutrality of mutations Genetics133(3)693ndash709

The 1000 Genomes Project Consortium 2010 A map of humangenome variation from population-scale sequencing Nature467(7319)1061ndash1073

The 1000 Genomes Project Consortium Abecasis GR Auton A BrooksLD DePristo MA Durbin RM Handsaker RE Kang HM Marth GTMcVean GA 2012 An integrated map of genetic variation from1092 human genomes Nature 491(7422)56ndash65

Guo SW Thompson EA 1992 Performing the exact test of Hardy-Weinberg proportion for multiple alleles Biometrics 48(2)361ndash372

Heyworth PG Cross AR Curnutte JT 2003 Chronic granulomatousdisease Curr Opin Immunol 15(5)578ndash584

Hudson RR 2002 Generating samples under a Wright-Fisher neutralmodel of genetic variation Bioinformatics 18(2)337ndash338

Jacob CO Eisenstein M Dinauer MC et al (21 co-authors) 2012 Lupus-associated causal mutation in neutrophil cytosolic factor 2 (NCF2)brings unique insights to the structure and function of NADPHoxidase Proc Natl Acad Sci U S A 109(2)E59ndashE67

Kimura M 1974 The neutral theory of molecular evolution CambridgeCambridge University Press

Kosiol C Vinar T da Fonseca RR Hubisz MJ Bustamante CD Nielsen RSiepel A 2008 Patterns of positive selection in six mammalian ge-nomes PLoS Genet 4(8)e1000144

Lacy F Kailasam MT OrsquoConnor DT Schmid-Schonbein GW Parmer RJ2000 Plasma hydrogen peroxide production in human essentialhypertension role of heredity gender and ethnicity Hypertension36(5)878ndash884

Laval G Patin E Barreiro LB Quintana-Murci L 2010 Formulating a his-torical and demographic model of recent human evolution based onresequencing data from noncoding regions PLoS One 5(4)e10284

Lewontin R 1972 The apportionment of human diversity Evol Biol 6391ndash398

Lindblad-Toh K Garber M Zuk O et al (88 co-authors) 2011 A high-resolution map of human evolutionary constraint using 29 mam-mals Nature 478(7370)476ndash482

Lodge R Diallo TO Descoteaux A 2006 Leishmania donovani lipopho-sphoglycan blocks NADPH oxidase assembly at the phagosomemembrane Cell Microbiol 8(12)1922ndash1931

Magalhaes WC Rodrigues MR Silva D Soares-Souza G Iannini MLCerqueira GC Faria-Campos AC Tarazona-Santos E 2012DIVERGENOME a bioinformatics platform to assist population ge-netics and genetic epidemiology studies Genet Epidemiol36(4)360ndash367

Marth JD Grewal PK 2008 Mammalian glycosylation in immunity NatRev Immunol 8(11)874ndash887

McDonald JH 1998 Improved tests for heterogeneity across a region ofDNA sequence in the ratio of polymorphism to divergence Mol BiolEvol 15377ndash384

McDonald JH Kreitman M 1991 Adaptive protein evolution at the Adhlocus in Drosophila Nature 351(6328)652ndash654

Nielsen R Bustamante C Clark AG et al (12 co-authors) 2005 A scanfor positively selected genes in the genomes of humans and chim-panzees PLoS Biol 3(6)e170

Olsson LM Lindqvist AK Kallberg H Padyukov L Burkhardt HAlfredsson L Klareskog L Holmdahl R 2007 A case-control studyof rheumatoid arthritis identifies an associated single nucleotidepolymorphism in the NCF4 gene supporting a role for the

NADPH-oxidase complex in autoimmunity Arthritis Res Ther9(5)R98

Packer BR Yeager M Burdett L et al (13 co-authors) 2006SNP500Cancer a public resource for sequence validation assay de-velopment and frequency analysis for genetic variation in candidategenes Nucleic Acids Res 34(Database issue)D617ndashD621

Peter BM Huerta-Sanchez E Nielsen R 2012 Distinguishing betweenselective sweeps from standing variation and from a de novo mu-tation PLoS Genet 8(10)e1003011

Przeworski M Coop G Wall JD 2005 The signature of positive selectionon standing genetic variation Evolution 59(11)2312ndash2323

Ptak SE Przeworski M 2002 Evidence for population growth in humansis confounded by fine-scale population structure Trends Genet18(11)559ndash563

Ramensky V Bork P Sunyaev S 2002 Human non-synonymous SNPsserver and survey Nucleic Acids Res 30(17)3894ndash3900

Rioux JD Xavier RJ Taylor KD et al (24 co-authors) 2007 Genome-wideassociation study identifies new susceptibility loci for Crohn diseaseand implicates autophagy in disease pathogenesis Nat Genet39(5)596ndash604

Roberts RL Hollis-Moffatt JE Gearry RB Kennedy MA Barclay MLMerriman TR 2008 Confirmation of association of IRGM andNCF4 with ileal Crohnrsquos disease in a population-based cohortGenes Immun 9(6)561ndash565

Rozas J 2009 DNA sequence polymorphism analysis using DnaSPMethods Mol Biol 537337ndash350

San Jose G Fortuno A Beloqui O Dıez J Zalba G 2008 NADPH oxidaseCYBA polymorphisms oxidative stress and cardiovascular diseasesClin Sci (Lond) 114(3)173ndash182

Santiago HC Gonzalez Lombana CZ Macedo JP et al (12 co-authors)2012 NADPH phagocyte oxidase knockout mice controlTrypanosoma cruzi proliferation but develop circulatory collapseand succumb to infection PLoS Negl Trop Dis 6(2)e1492

Stephens M Scheet P 2005 Accounting for decay of linkage disequilib-rium in haplotype inference and missing-data imputation Am JHum Genet 76(3)449ndash462

Sumimoto H Miyano K Takeya R 2005 Molecular composition andregulation of the Nox family NAD(P)H oxidases Biochem BiophysRes Commun 338(1)677ndash686

Tajima F 1983 Evolutionary relationship of DNA sequences in finitepopulations Genetics 105(2)437ndash460

Tajima F 1989 Statistical method for testing the neutral mutationhypothesis by DNA polymorphism Genetics 123(3)585ndash595

Tarazona-Santos E Bernig T Burdett L Magalhaes WC Fabbri C Liao JRedondo RA Welch R Yeager M Chanock SJ 2008 CYBB anNADPH-oxidase gene restricted diversity in humans and evidencefor differential long-term purifying selection on transmembrane andcytosolic domains Hum Mutat 29(5)623ndash632

Taylor RM Burritt JB Baniulis D Foubert TR Lord CI Dinauer MCParkos CA Jesaitis AJ 2004 Site-specific inhibitors of NADPH oxi-dase activity and structural probes of flavocytochrome b character-ization of six monoclonal antibodies to the p22phox subunitJ Immunol 173(12)7349ndash7357

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A map of recentpositive selection in the human genome PLoS Biol 4(3)e72

Watterson GA 1975 On the number of segregating sites in geneticalmodels without recombination Theor Popul Biol 7(2) 256ndash276

Williamson SH Hernandez R Fledel-Alon A Zhu L Nielsen RBustamante CD 2005 Simultaneous inference of selection and pop-ulation growth from patterns of variation in the human genomeProc Natl Acad Sci U S A 102(22)7882ndash7887

Yang Z 2007a Adaptive molecular evolution In Balding DJ Bishop MCannings C editors Handbook of statistical genetics Vol 1 3rd edSusex (United Kingdom) John Wiley amp Sons p 377ndash406

Yang Z 2007b PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

2167

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

natural selection which produces an excess of common allelesassociated with high genetic diversity Thus the observed pat-tern of CYBA diversity in Europeans is not consistent with aselective sweep on a standing variation but it is consistent witha scenario of balancing selection acting on standing variation

If we consider for CYBA that heterozygote advantage maybe the mechanisms of balancing selection we can speculatethat the biological basis for this mechanism may be the fol-lowing considering that p22-phox is not exclusive of thephagocyte NADPH oxidase but it is also part of Nox com-plexes expressed in other tissues the dependence of ROSproduction on CYBA variants has to be finely regulated IfCYBA variants induce high levels of ROS these variants mayfavor a phagocyte-dependent efficient response to pathogensbut may damage other endothelial tissues Alternativelytissue oxidative damage does not occur if ROS productionis low but this response may be associated with a weaker

phagocyte respiratory burst against pathogens In this con-text heterozygote individuals with a CYBA-dependent inter-mediate level of ROS production may have been favored bynatural selection

Our results contribute to the discussion regarding the rel-evance of balancing selection in shaping the diversity ofinnate immunity genes (Ferrer-Admetlla et al 2008 Barreiroet al 2009) Ferrer-Admetlla et al (2008) have associated therecurrent signatures of balancing selection on inflammatorygenes with the need for fine regulation CYBA is the onlyphagocytic NADPH oxidase gene that also encodesnonphagocyte Nox components thus CYBA has a role inROS cell signaling a potentially dangerous process due toits capability to produce oxidative damage to tissues Geneswith these characteristics likely need even tighter regulationThe interplay between pathogen-driven selective pressureon innate immunity genes and their concomitantnonimmunological functions is complex In addition toCYBA other interesting examples of this interplay can befound among the 10 human toll-like receptors (TLRs) thatshow a variety of signatures of natural selection TLR8 whichshows the strongest signature of purifying selection is alsoinvolved in neuronal development (Barreiro et al 2009) and itis difficult to discriminate the role of each function in deter-mining the observed signature of natural selection

With few exceptions the pathogens responsible for naturalselection on immune genes are difficult to specify In the caseof NADPH oxidase we can infer based on the spectrum ofinfections in CGD patients that catalase-positive bacteria andfungi such as Staphylococci Salmonella Candida Aspergillusand M tuberculosis may be the selective pathogensInteractions between the host and pathogens also includemechanisms of the latter to impair the respiratory burst ofthe former For example Leishmania donovani blocks the as-sembly of NADPH oxidase at the phagosome membrane(Lodge et al 2006) These mechanisms may constitute selec-tive pressures imposed by pathogens

The associations reported in GWAS between rs4821544 inNCF4 and Crohnrsquos disease (an idiopathic inflammatory boweldisease that predominantly involves the ileum and colonRioux et al 2007) and between rs10911363 in NCF2 and

Table 3 Pairwise FST Genetic Distances between Populations

CYBBa CYBA

Africa Europe Asia Hispanic Africa Europe Asia Hispanic

Africa mdash 0316 0264 0092 mdash 0074 0083 0065

Europe 0257 mdash 0000 0107 mdash 0002 0056

Asia 0211 0000 mdash 0073 mdash 0054

Hispanic 0070 0082 0056 mdash mdash

NCF2 NCF4

Africa mdash 0048 0058 0037 mdash 0128 0136 0158

Europe mdash 0069 0000 mdash 0026 0026

Asia mdash 0059 mdash 0005

Hispanic mdash mdash

aFor CYBB FST estimators are above the diagonal Below the diagonal are the FST values corrected as if the effective population sizes of X chromosome genes were equal toautosomal ones

Table 4 Results of Neutrality Tests for the NADPH Oxidase Genesand Their Significancea

African European Asian Hispanic

Tajimarsquos D

CYBB 0473 0274 0813 0084

CYBA 0580 1242 0684 0707

NCF2 0412 0939 0883 0987

NCF4 0750 0243 0556 0102

Fu and Lirsquos D

CYBB 1050 1110 0395 0977

CYBA 1485 1734 1188 0138

NCF2 0407 1105 1904 1398

NCF4 0308 0443 1893 0266

Fu and Lirsquos F

CYBB 0980 0811 0114 0814

CYBA 1382 1893 1205 0405

NCF2 0512 1252 1893 1567

NCF4 0557 0231 1225 0248

NOTEmdashUnderlined values represent significant results under the demographic modelinferred by Laval et al (2010) for human populations See details in supplementarytable S4 Supplementary Material onlineaThe McDonaldndashKreitman test is nonsignificant in any of the cases

Significant under the WrightndashFisher model of constant population size

2164

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

systemic lupus erythematosus (Cunninghame Graham et al2011) confirm the involvement of NADPH genes in the path-ogenesis of inflammatory-related common diseases Ourclaim that natural selection acted on CYBA (and maybe inNCF2) is relevant for biomedical studies because combiningevidence of natural selection with association analyses inimmune genes increases the statistical power to detect dis-ease-associated variants (Ayodo et al 2007) As a new gener-ation of association studies focusing on rare variation isemerging the combination of genes deemed interestingfrom GWAS and populations with an excess of rare variantsin these genes such as NCF2 in Asians are particularly inter-esting as a source of rare variants with clinical relevanceFinally by determining through molecular evolution mappingthat the extracellular portion of gp91 (encoded by CYBB inthe X-chromosome) has been subject to recurrent episodes ofpositive selection at the scale of mammals evolution we positthe hypothesis that this portion of the NADPH oxidase isrelevant for currently unknown biological processes thatonce revealed by structural and functional investigationswill contribute to understanding the role of NADPH oxidasein infectious autoimmune and cardiovascular diseases

Materials and Methods

Molecular Evolution Analysis of NADPH Genes

We used the maximum likelihood framework developed byYang (2007a) to estimate for the NADPH oxidase genes

under a variety of evolutionary models as implemented in thePAML software (Yang 2007b) This approach allows infer-ences about the evolution of a coding region along aninter-specific phylogeny mapping the codons that haveevolved under strongweak purifying selection neutrality oradaptive positive selection Further details about these anal-yses are available as supplementary material SupplementaryMaterial online

Human Population Genetics of NADPH Genes

For human population genetics analyses we conducted bidi-rectional Sanger sequencing of CYBB CYBA NCF2 and NCF4for a total of 35242 bp for each of 102 healthy individuals aspart of the SNP500 Cancer project (Packer et al 2006 seesupplementary fig S1 Supplementary Material online for de-tails) Human population resequencing data for CYBB werepublished in Tarazona-Santos et al (2008) The resequencingexperiments were performed as in Packer et al (2006) These102 unrelated individuals include the following 24 of Africanancestry (15 African Americans from the United States and 9Pygmies) 23 admixed Latin Americans (from Mexico PuertoRico and South America) 31 Europeans (from the CEPHUTAH pedigree and the NIEHS Environmental GenomeProject) and 24 AsiansndashOceanians (from Melanesia PakistanChina Cambodia Japan and Taiwan)

After controlling for multiple tests we confirmed that allSNPs fit the HardyndashWeinberg equilibrium by the Guo and

CYBA - A16

CYBA - A3

Y72H - rs4673

V174A - rs1049254 CYBA - E18

(a)

NCF2 - E10 H389Q - rs17849502

NCF2 - B3

NCF2 - D11

NCF2 - C1K181R - rs2274064(b)

FIG 4 Phylogenetic networks of (a) CYBA in Europeans and (b) NCF2 in Europeans (black) and Asians (yellow) The lengths of the branches areproportional to the number of mutations Only nonsynonymous mutations are shown The haplotype names correspond to the table of inferredhaplotypes in the supplementary material Supplementary Material online We only show the names of haplotypes with a frequency gt5 Ancestralhaplotypes (inferred as being the human haplotype or median vector most similar to the chimpanzee sequence) are indicated by an arrow Medianvectors are in red

2165

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Thompson (1992) test which was implemented in the soft-ware Arlequin 30 (Excoffier et al 2007) Insertion-deletions(INDELs) were excluded from population genetics analysesTo assess intrapopulation diversity we used two statistics the per-site mean number of pairwise differences betweensequences (Tajima 1983) and w based on the number ofsegregating sites (S) (Watterson 1975) We measured pairwisebetween-populations diversity by using the FST statistics cal-culated using the software DnaSP (Rozas 2009)

Haplotypes and the recombination parameter were in-ferred using the PHASE software (Stephens and Scheet 2005)and diversity indexes calculations (tables 2 and 3) as well asneutrality statistics (table 4) were estimated using DnaSP soft-ware We applied two kinds of neutrality tests 1) tests basedon the allelic spectrum which is the distribution of polymor-phisms across different classes of frequencies namelyTajimarsquos DT (Tajima 1989) and FundashLirsquos DFL and FFL (Fu andLi 1993) and 2) tests based on comparisons between thenumber of polymorphisms in human populations and fixeddifferences with the chimpanzee (ie outgroup) namely theMcDonald and Kreitman (1991) test and the adaptedKolmogorovndashSmirnoff test by McDonald (1998) For thefirst set of tests we used as null hypotheses both the classicWrightndashFisher model of neutrality with a constant popula-tion size as well as the more realistic evolutionary scenario forhuman populations inferred by Laval et al (2010) In the caseof the scenario of Laval et al (2010) we ignored interconti-nental gene flow within the Old World because these raregene flow events likely does not affect the level of significanceof the neutrality tests given its very low inferred values(13 105) Null distributions used to test the significanceof the neutrality tests under these evolutionary scenarios weregenerated using coalescent simulations and a significancelevel of 005 (Hudson 2002) The KolmogorovndashSmirnoff testof neutrality adapted by McDonald (1998) was performedusing Slider software available at httpudeledu~mcdonaldaboutdnasliderhtml (last accessed July 16 2013) Weperformed coalescent simulations using ms software(Hudson 2002) Further methodological details are availableas supplementary material Supplementary Material onlineWe constructed the CYBA and NCF2 networks using allSNP variants and applying the Median joining algorithmand the maximum parsimony option calculations as imple-mented in the software Network 46 (Bandelt et al 1999)

Supplementary MaterialSupplementary tables S1ndashS5 and figure S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

SJC conceived the study ET-S MM FL RC AC CF LPand BS generated the dataanalyzed the sequences LB ACWCSM and MY provided tools for data generationand analyses ET-S MM FL WCSM and RR performedpopulation genetics analyses ET-S and MM prepared tablesand figures ET-S and SJC wrote the manuscript All the

authors contributed with discussions about the results Theauthors are grateful to Silvia Fuselli for discussions of ourresults to Fernando Levi Soares for technical assistance withthe figures and the Sequencing Group of the CoreGenotyping Facility (National Cancer Institute) for their tech-nical assistance This work was supported by the IntramuralResearch Program of the National Cancer Institute (NCI) andNational Institutes of Health (NIH) CF was supported by theUniversity of Bologna ET-S MM WCSM and FL weresupported by the Fogarty International Center and NationalCancer Institute (1R01TW007894) Conselho Nacional deDesenvolvimento Cientıfico e Tecnologico (Brazil Grant3075562012-3) Fundacao de Amparo a Pesquisa de MinasGerais (Brazil CBB PPM 0026512) and the Brazilian Ministryof Education (CAPES Agency grant PNPD 0256709-1)

ReferencesAyodo G Price AL Keinan A Ajwang A Otieno MF Orago AS

Patterson N Reich D 2007 Combining evidence of natural selectionwith association analysis increases power to detect malaria-resis-tance variants Am J Hum Genet 81(2)234ndash242

Bamshad M Wooding SP 2003 Signatures of natural selection in thehuman genome Nat Rev Genet 4(2)99ndash111

Bandelt H-J Forster P Rohl A 1999 Median-joining networks for infer-ring intraspecific phylogenies Mol Biol Evol 16(1)37ndash48

Barreiro LB Ben-Ali M Quach H et al (17 co-authors) 2009Evolutionary dynamics of human Toll-like receptors and their dif-ferent contributions to host defense PLoS Genet 5(7)e1000562

Barreiro LB Quintana-Murci L 2010 From evolutionary genetics tohuman immunology how selection shapes host defence genesNat Rev Genet 11(1)17ndash30

Barrett RD Schluter D 2008 Adaptation from standing genetic varia-tion Trends Ecol Evol 23(1)38ndash44

Bedard K Attar H Bonnefont J Jaquet V Borel C Plastre O Stasia MJAntonarakis SE Krause KH 2009 Three common polymorphisms inthe CYBA gene form a haplotype associated with decreased ROSgeneration Hum Mutat 30(7)1123ndash1133

Blekhman R Man O Herrmann L Boyko AR Indap A Kosiol CBustamante CD Teshima KM Przeworski M 2008 Natural selectionon genes that underlie human disease susceptibility Curr Biol18(12)883ndash889

Brandes RP Kreuzer J 2005 Vascular NADPH oxidases molecular mech-anisms of activation Cardiovasc Res 65(1)16ndash27

Buckley RH 2004 Pulmonary complications of primary immunodefi-ciencies Paediatr Respir Rev 5(Suppl A)S225ndashS233

Bustamante CD Fledel-Alon A Williamson S et al (13 co-authors)2005 Natural selection on protein-coding genes in the humangenome Nature 437(7062)1153ndash1157

Bustamante J Arias AA Vogt G et al (29 co-authors) 2011 GermlineCYBB mutations that selectively affect macrophages in kindredswith X-linked predisposition to tuberculous mycobacterial diseaseNat Immunol 12(3)213ndash221

Campbell MC Tishkoff SA 2008 African genetic diversity implicationsfor human demographic history modern human origins and com-plex disease mapping Annu Rev Genomics Hum Genet 9403ndash433

Chanock SJ Roesler J Zhan S Hopkins P Lee P Barrett DT ChristensenBL Curnutte JT Gorlach A 2000 Genomic structure of the humanp47-phox (NCF1) gene Blood Cells Mol Dis 26(1)37ndash46

Cunninghame Graham DS Morris DL Bhangale TR Criswell LASyvanen AC Ronnblom L Behrens TW Graham RR Vyse TJ2011 Association of NCF2 IKZF1 IRF8 IFIH1 and TYK2 with sys-temic lupus erythematosus PLoS Genet 7(10)e1002341

Excoffier L Laval G Schneider S 2007 Arlequin ver 30 an integratedsoftware package for population genetics data analysis EvolBioinform Online 147ndash50

2166

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Excoffier L Ray N 2008 Surfing during population expansions promotesgenetic revolutions and structuration Trends Ecol Evol23(7)347ndash351

Ferrer-Admetlla A Bosch E Sikora M Marques-Bonet T Ramırez-SorianoA Muntasell A Navarro A Lazarus R Calafell F Bertranpetit J Casals F2008 Balancing selection is the main force shaping the evolution ofinnate immunity genes J Immunol 181(2)1315ndash1322

Fu YX Li WH 1993 Statistical tests of neutrality of mutations Genetics133(3)693ndash709

The 1000 Genomes Project Consortium 2010 A map of humangenome variation from population-scale sequencing Nature467(7319)1061ndash1073

The 1000 Genomes Project Consortium Abecasis GR Auton A BrooksLD DePristo MA Durbin RM Handsaker RE Kang HM Marth GTMcVean GA 2012 An integrated map of genetic variation from1092 human genomes Nature 491(7422)56ndash65

Guo SW Thompson EA 1992 Performing the exact test of Hardy-Weinberg proportion for multiple alleles Biometrics 48(2)361ndash372

Heyworth PG Cross AR Curnutte JT 2003 Chronic granulomatousdisease Curr Opin Immunol 15(5)578ndash584

Hudson RR 2002 Generating samples under a Wright-Fisher neutralmodel of genetic variation Bioinformatics 18(2)337ndash338

Jacob CO Eisenstein M Dinauer MC et al (21 co-authors) 2012 Lupus-associated causal mutation in neutrophil cytosolic factor 2 (NCF2)brings unique insights to the structure and function of NADPHoxidase Proc Natl Acad Sci U S A 109(2)E59ndashE67

Kimura M 1974 The neutral theory of molecular evolution CambridgeCambridge University Press

Kosiol C Vinar T da Fonseca RR Hubisz MJ Bustamante CD Nielsen RSiepel A 2008 Patterns of positive selection in six mammalian ge-nomes PLoS Genet 4(8)e1000144

Lacy F Kailasam MT OrsquoConnor DT Schmid-Schonbein GW Parmer RJ2000 Plasma hydrogen peroxide production in human essentialhypertension role of heredity gender and ethnicity Hypertension36(5)878ndash884

Laval G Patin E Barreiro LB Quintana-Murci L 2010 Formulating a his-torical and demographic model of recent human evolution based onresequencing data from noncoding regions PLoS One 5(4)e10284

Lewontin R 1972 The apportionment of human diversity Evol Biol 6391ndash398

Lindblad-Toh K Garber M Zuk O et al (88 co-authors) 2011 A high-resolution map of human evolutionary constraint using 29 mam-mals Nature 478(7370)476ndash482

Lodge R Diallo TO Descoteaux A 2006 Leishmania donovani lipopho-sphoglycan blocks NADPH oxidase assembly at the phagosomemembrane Cell Microbiol 8(12)1922ndash1931

Magalhaes WC Rodrigues MR Silva D Soares-Souza G Iannini MLCerqueira GC Faria-Campos AC Tarazona-Santos E 2012DIVERGENOME a bioinformatics platform to assist population ge-netics and genetic epidemiology studies Genet Epidemiol36(4)360ndash367

Marth JD Grewal PK 2008 Mammalian glycosylation in immunity NatRev Immunol 8(11)874ndash887

McDonald JH 1998 Improved tests for heterogeneity across a region ofDNA sequence in the ratio of polymorphism to divergence Mol BiolEvol 15377ndash384

McDonald JH Kreitman M 1991 Adaptive protein evolution at the Adhlocus in Drosophila Nature 351(6328)652ndash654

Nielsen R Bustamante C Clark AG et al (12 co-authors) 2005 A scanfor positively selected genes in the genomes of humans and chim-panzees PLoS Biol 3(6)e170

Olsson LM Lindqvist AK Kallberg H Padyukov L Burkhardt HAlfredsson L Klareskog L Holmdahl R 2007 A case-control studyof rheumatoid arthritis identifies an associated single nucleotidepolymorphism in the NCF4 gene supporting a role for the

NADPH-oxidase complex in autoimmunity Arthritis Res Ther9(5)R98

Packer BR Yeager M Burdett L et al (13 co-authors) 2006SNP500Cancer a public resource for sequence validation assay de-velopment and frequency analysis for genetic variation in candidategenes Nucleic Acids Res 34(Database issue)D617ndashD621

Peter BM Huerta-Sanchez E Nielsen R 2012 Distinguishing betweenselective sweeps from standing variation and from a de novo mu-tation PLoS Genet 8(10)e1003011

Przeworski M Coop G Wall JD 2005 The signature of positive selectionon standing genetic variation Evolution 59(11)2312ndash2323

Ptak SE Przeworski M 2002 Evidence for population growth in humansis confounded by fine-scale population structure Trends Genet18(11)559ndash563

Ramensky V Bork P Sunyaev S 2002 Human non-synonymous SNPsserver and survey Nucleic Acids Res 30(17)3894ndash3900

Rioux JD Xavier RJ Taylor KD et al (24 co-authors) 2007 Genome-wideassociation study identifies new susceptibility loci for Crohn diseaseand implicates autophagy in disease pathogenesis Nat Genet39(5)596ndash604

Roberts RL Hollis-Moffatt JE Gearry RB Kennedy MA Barclay MLMerriman TR 2008 Confirmation of association of IRGM andNCF4 with ileal Crohnrsquos disease in a population-based cohortGenes Immun 9(6)561ndash565

Rozas J 2009 DNA sequence polymorphism analysis using DnaSPMethods Mol Biol 537337ndash350

San Jose G Fortuno A Beloqui O Dıez J Zalba G 2008 NADPH oxidaseCYBA polymorphisms oxidative stress and cardiovascular diseasesClin Sci (Lond) 114(3)173ndash182

Santiago HC Gonzalez Lombana CZ Macedo JP et al (12 co-authors)2012 NADPH phagocyte oxidase knockout mice controlTrypanosoma cruzi proliferation but develop circulatory collapseand succumb to infection PLoS Negl Trop Dis 6(2)e1492

Stephens M Scheet P 2005 Accounting for decay of linkage disequilib-rium in haplotype inference and missing-data imputation Am JHum Genet 76(3)449ndash462

Sumimoto H Miyano K Takeya R 2005 Molecular composition andregulation of the Nox family NAD(P)H oxidases Biochem BiophysRes Commun 338(1)677ndash686

Tajima F 1983 Evolutionary relationship of DNA sequences in finitepopulations Genetics 105(2)437ndash460

Tajima F 1989 Statistical method for testing the neutral mutationhypothesis by DNA polymorphism Genetics 123(3)585ndash595

Tarazona-Santos E Bernig T Burdett L Magalhaes WC Fabbri C Liao JRedondo RA Welch R Yeager M Chanock SJ 2008 CYBB anNADPH-oxidase gene restricted diversity in humans and evidencefor differential long-term purifying selection on transmembrane andcytosolic domains Hum Mutat 29(5)623ndash632

Taylor RM Burritt JB Baniulis D Foubert TR Lord CI Dinauer MCParkos CA Jesaitis AJ 2004 Site-specific inhibitors of NADPH oxi-dase activity and structural probes of flavocytochrome b character-ization of six monoclonal antibodies to the p22phox subunitJ Immunol 173(12)7349ndash7357

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A map of recentpositive selection in the human genome PLoS Biol 4(3)e72

Watterson GA 1975 On the number of segregating sites in geneticalmodels without recombination Theor Popul Biol 7(2) 256ndash276

Williamson SH Hernandez R Fledel-Alon A Zhu L Nielsen RBustamante CD 2005 Simultaneous inference of selection and pop-ulation growth from patterns of variation in the human genomeProc Natl Acad Sci U S A 102(22)7882ndash7887

Yang Z 2007a Adaptive molecular evolution In Balding DJ Bishop MCannings C editors Handbook of statistical genetics Vol 1 3rd edSusex (United Kingdom) John Wiley amp Sons p 377ndash406

Yang Z 2007b PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

2167

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

systemic lupus erythematosus (Cunninghame Graham et al2011) confirm the involvement of NADPH genes in the path-ogenesis of inflammatory-related common diseases Ourclaim that natural selection acted on CYBA (and maybe inNCF2) is relevant for biomedical studies because combiningevidence of natural selection with association analyses inimmune genes increases the statistical power to detect dis-ease-associated variants (Ayodo et al 2007) As a new gener-ation of association studies focusing on rare variation isemerging the combination of genes deemed interestingfrom GWAS and populations with an excess of rare variantsin these genes such as NCF2 in Asians are particularly inter-esting as a source of rare variants with clinical relevanceFinally by determining through molecular evolution mappingthat the extracellular portion of gp91 (encoded by CYBB inthe X-chromosome) has been subject to recurrent episodes ofpositive selection at the scale of mammals evolution we positthe hypothesis that this portion of the NADPH oxidase isrelevant for currently unknown biological processes thatonce revealed by structural and functional investigationswill contribute to understanding the role of NADPH oxidasein infectious autoimmune and cardiovascular diseases

Materials and Methods

Molecular Evolution Analysis of NADPH Genes

We used the maximum likelihood framework developed byYang (2007a) to estimate for the NADPH oxidase genes

under a variety of evolutionary models as implemented in thePAML software (Yang 2007b) This approach allows infer-ences about the evolution of a coding region along aninter-specific phylogeny mapping the codons that haveevolved under strongweak purifying selection neutrality oradaptive positive selection Further details about these anal-yses are available as supplementary material SupplementaryMaterial online

Human Population Genetics of NADPH Genes

For human population genetics analyses we conducted bidi-rectional Sanger sequencing of CYBB CYBA NCF2 and NCF4for a total of 35242 bp for each of 102 healthy individuals aspart of the SNP500 Cancer project (Packer et al 2006 seesupplementary fig S1 Supplementary Material online for de-tails) Human population resequencing data for CYBB werepublished in Tarazona-Santos et al (2008) The resequencingexperiments were performed as in Packer et al (2006) These102 unrelated individuals include the following 24 of Africanancestry (15 African Americans from the United States and 9Pygmies) 23 admixed Latin Americans (from Mexico PuertoRico and South America) 31 Europeans (from the CEPHUTAH pedigree and the NIEHS Environmental GenomeProject) and 24 AsiansndashOceanians (from Melanesia PakistanChina Cambodia Japan and Taiwan)

After controlling for multiple tests we confirmed that allSNPs fit the HardyndashWeinberg equilibrium by the Guo and

CYBA - A16

CYBA - A3

Y72H - rs4673

V174A - rs1049254 CYBA - E18

(a)

NCF2 - E10 H389Q - rs17849502

NCF2 - B3

NCF2 - D11

NCF2 - C1K181R - rs2274064(b)

FIG 4 Phylogenetic networks of (a) CYBA in Europeans and (b) NCF2 in Europeans (black) and Asians (yellow) The lengths of the branches areproportional to the number of mutations Only nonsynonymous mutations are shown The haplotype names correspond to the table of inferredhaplotypes in the supplementary material Supplementary Material online We only show the names of haplotypes with a frequency gt5 Ancestralhaplotypes (inferred as being the human haplotype or median vector most similar to the chimpanzee sequence) are indicated by an arrow Medianvectors are in red

2165

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Thompson (1992) test which was implemented in the soft-ware Arlequin 30 (Excoffier et al 2007) Insertion-deletions(INDELs) were excluded from population genetics analysesTo assess intrapopulation diversity we used two statistics the per-site mean number of pairwise differences betweensequences (Tajima 1983) and w based on the number ofsegregating sites (S) (Watterson 1975) We measured pairwisebetween-populations diversity by using the FST statistics cal-culated using the software DnaSP (Rozas 2009)

Haplotypes and the recombination parameter were in-ferred using the PHASE software (Stephens and Scheet 2005)and diversity indexes calculations (tables 2 and 3) as well asneutrality statistics (table 4) were estimated using DnaSP soft-ware We applied two kinds of neutrality tests 1) tests basedon the allelic spectrum which is the distribution of polymor-phisms across different classes of frequencies namelyTajimarsquos DT (Tajima 1989) and FundashLirsquos DFL and FFL (Fu andLi 1993) and 2) tests based on comparisons between thenumber of polymorphisms in human populations and fixeddifferences with the chimpanzee (ie outgroup) namely theMcDonald and Kreitman (1991) test and the adaptedKolmogorovndashSmirnoff test by McDonald (1998) For thefirst set of tests we used as null hypotheses both the classicWrightndashFisher model of neutrality with a constant popula-tion size as well as the more realistic evolutionary scenario forhuman populations inferred by Laval et al (2010) In the caseof the scenario of Laval et al (2010) we ignored interconti-nental gene flow within the Old World because these raregene flow events likely does not affect the level of significanceof the neutrality tests given its very low inferred values(13 105) Null distributions used to test the significanceof the neutrality tests under these evolutionary scenarios weregenerated using coalescent simulations and a significancelevel of 005 (Hudson 2002) The KolmogorovndashSmirnoff testof neutrality adapted by McDonald (1998) was performedusing Slider software available at httpudeledu~mcdonaldaboutdnasliderhtml (last accessed July 16 2013) Weperformed coalescent simulations using ms software(Hudson 2002) Further methodological details are availableas supplementary material Supplementary Material onlineWe constructed the CYBA and NCF2 networks using allSNP variants and applying the Median joining algorithmand the maximum parsimony option calculations as imple-mented in the software Network 46 (Bandelt et al 1999)

Supplementary MaterialSupplementary tables S1ndashS5 and figure S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

SJC conceived the study ET-S MM FL RC AC CF LPand BS generated the dataanalyzed the sequences LB ACWCSM and MY provided tools for data generationand analyses ET-S MM FL WCSM and RR performedpopulation genetics analyses ET-S and MM prepared tablesand figures ET-S and SJC wrote the manuscript All the

authors contributed with discussions about the results Theauthors are grateful to Silvia Fuselli for discussions of ourresults to Fernando Levi Soares for technical assistance withthe figures and the Sequencing Group of the CoreGenotyping Facility (National Cancer Institute) for their tech-nical assistance This work was supported by the IntramuralResearch Program of the National Cancer Institute (NCI) andNational Institutes of Health (NIH) CF was supported by theUniversity of Bologna ET-S MM WCSM and FL weresupported by the Fogarty International Center and NationalCancer Institute (1R01TW007894) Conselho Nacional deDesenvolvimento Cientıfico e Tecnologico (Brazil Grant3075562012-3) Fundacao de Amparo a Pesquisa de MinasGerais (Brazil CBB PPM 0026512) and the Brazilian Ministryof Education (CAPES Agency grant PNPD 0256709-1)

ReferencesAyodo G Price AL Keinan A Ajwang A Otieno MF Orago AS

Patterson N Reich D 2007 Combining evidence of natural selectionwith association analysis increases power to detect malaria-resis-tance variants Am J Hum Genet 81(2)234ndash242

Bamshad M Wooding SP 2003 Signatures of natural selection in thehuman genome Nat Rev Genet 4(2)99ndash111

Bandelt H-J Forster P Rohl A 1999 Median-joining networks for infer-ring intraspecific phylogenies Mol Biol Evol 16(1)37ndash48

Barreiro LB Ben-Ali M Quach H et al (17 co-authors) 2009Evolutionary dynamics of human Toll-like receptors and their dif-ferent contributions to host defense PLoS Genet 5(7)e1000562

Barreiro LB Quintana-Murci L 2010 From evolutionary genetics tohuman immunology how selection shapes host defence genesNat Rev Genet 11(1)17ndash30

Barrett RD Schluter D 2008 Adaptation from standing genetic varia-tion Trends Ecol Evol 23(1)38ndash44

Bedard K Attar H Bonnefont J Jaquet V Borel C Plastre O Stasia MJAntonarakis SE Krause KH 2009 Three common polymorphisms inthe CYBA gene form a haplotype associated with decreased ROSgeneration Hum Mutat 30(7)1123ndash1133

Blekhman R Man O Herrmann L Boyko AR Indap A Kosiol CBustamante CD Teshima KM Przeworski M 2008 Natural selectionon genes that underlie human disease susceptibility Curr Biol18(12)883ndash889

Brandes RP Kreuzer J 2005 Vascular NADPH oxidases molecular mech-anisms of activation Cardiovasc Res 65(1)16ndash27

Buckley RH 2004 Pulmonary complications of primary immunodefi-ciencies Paediatr Respir Rev 5(Suppl A)S225ndashS233

Bustamante CD Fledel-Alon A Williamson S et al (13 co-authors)2005 Natural selection on protein-coding genes in the humangenome Nature 437(7062)1153ndash1157

Bustamante J Arias AA Vogt G et al (29 co-authors) 2011 GermlineCYBB mutations that selectively affect macrophages in kindredswith X-linked predisposition to tuberculous mycobacterial diseaseNat Immunol 12(3)213ndash221

Campbell MC Tishkoff SA 2008 African genetic diversity implicationsfor human demographic history modern human origins and com-plex disease mapping Annu Rev Genomics Hum Genet 9403ndash433

Chanock SJ Roesler J Zhan S Hopkins P Lee P Barrett DT ChristensenBL Curnutte JT Gorlach A 2000 Genomic structure of the humanp47-phox (NCF1) gene Blood Cells Mol Dis 26(1)37ndash46

Cunninghame Graham DS Morris DL Bhangale TR Criswell LASyvanen AC Ronnblom L Behrens TW Graham RR Vyse TJ2011 Association of NCF2 IKZF1 IRF8 IFIH1 and TYK2 with sys-temic lupus erythematosus PLoS Genet 7(10)e1002341

Excoffier L Laval G Schneider S 2007 Arlequin ver 30 an integratedsoftware package for population genetics data analysis EvolBioinform Online 147ndash50

2166

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Excoffier L Ray N 2008 Surfing during population expansions promotesgenetic revolutions and structuration Trends Ecol Evol23(7)347ndash351

Ferrer-Admetlla A Bosch E Sikora M Marques-Bonet T Ramırez-SorianoA Muntasell A Navarro A Lazarus R Calafell F Bertranpetit J Casals F2008 Balancing selection is the main force shaping the evolution ofinnate immunity genes J Immunol 181(2)1315ndash1322

Fu YX Li WH 1993 Statistical tests of neutrality of mutations Genetics133(3)693ndash709

The 1000 Genomes Project Consortium 2010 A map of humangenome variation from population-scale sequencing Nature467(7319)1061ndash1073

The 1000 Genomes Project Consortium Abecasis GR Auton A BrooksLD DePristo MA Durbin RM Handsaker RE Kang HM Marth GTMcVean GA 2012 An integrated map of genetic variation from1092 human genomes Nature 491(7422)56ndash65

Guo SW Thompson EA 1992 Performing the exact test of Hardy-Weinberg proportion for multiple alleles Biometrics 48(2)361ndash372

Heyworth PG Cross AR Curnutte JT 2003 Chronic granulomatousdisease Curr Opin Immunol 15(5)578ndash584

Hudson RR 2002 Generating samples under a Wright-Fisher neutralmodel of genetic variation Bioinformatics 18(2)337ndash338

Jacob CO Eisenstein M Dinauer MC et al (21 co-authors) 2012 Lupus-associated causal mutation in neutrophil cytosolic factor 2 (NCF2)brings unique insights to the structure and function of NADPHoxidase Proc Natl Acad Sci U S A 109(2)E59ndashE67

Kimura M 1974 The neutral theory of molecular evolution CambridgeCambridge University Press

Kosiol C Vinar T da Fonseca RR Hubisz MJ Bustamante CD Nielsen RSiepel A 2008 Patterns of positive selection in six mammalian ge-nomes PLoS Genet 4(8)e1000144

Lacy F Kailasam MT OrsquoConnor DT Schmid-Schonbein GW Parmer RJ2000 Plasma hydrogen peroxide production in human essentialhypertension role of heredity gender and ethnicity Hypertension36(5)878ndash884

Laval G Patin E Barreiro LB Quintana-Murci L 2010 Formulating a his-torical and demographic model of recent human evolution based onresequencing data from noncoding regions PLoS One 5(4)e10284

Lewontin R 1972 The apportionment of human diversity Evol Biol 6391ndash398

Lindblad-Toh K Garber M Zuk O et al (88 co-authors) 2011 A high-resolution map of human evolutionary constraint using 29 mam-mals Nature 478(7370)476ndash482

Lodge R Diallo TO Descoteaux A 2006 Leishmania donovani lipopho-sphoglycan blocks NADPH oxidase assembly at the phagosomemembrane Cell Microbiol 8(12)1922ndash1931

Magalhaes WC Rodrigues MR Silva D Soares-Souza G Iannini MLCerqueira GC Faria-Campos AC Tarazona-Santos E 2012DIVERGENOME a bioinformatics platform to assist population ge-netics and genetic epidemiology studies Genet Epidemiol36(4)360ndash367

Marth JD Grewal PK 2008 Mammalian glycosylation in immunity NatRev Immunol 8(11)874ndash887

McDonald JH 1998 Improved tests for heterogeneity across a region ofDNA sequence in the ratio of polymorphism to divergence Mol BiolEvol 15377ndash384

McDonald JH Kreitman M 1991 Adaptive protein evolution at the Adhlocus in Drosophila Nature 351(6328)652ndash654

Nielsen R Bustamante C Clark AG et al (12 co-authors) 2005 A scanfor positively selected genes in the genomes of humans and chim-panzees PLoS Biol 3(6)e170

Olsson LM Lindqvist AK Kallberg H Padyukov L Burkhardt HAlfredsson L Klareskog L Holmdahl R 2007 A case-control studyof rheumatoid arthritis identifies an associated single nucleotidepolymorphism in the NCF4 gene supporting a role for the

NADPH-oxidase complex in autoimmunity Arthritis Res Ther9(5)R98

Packer BR Yeager M Burdett L et al (13 co-authors) 2006SNP500Cancer a public resource for sequence validation assay de-velopment and frequency analysis for genetic variation in candidategenes Nucleic Acids Res 34(Database issue)D617ndashD621

Peter BM Huerta-Sanchez E Nielsen R 2012 Distinguishing betweenselective sweeps from standing variation and from a de novo mu-tation PLoS Genet 8(10)e1003011

Przeworski M Coop G Wall JD 2005 The signature of positive selectionon standing genetic variation Evolution 59(11)2312ndash2323

Ptak SE Przeworski M 2002 Evidence for population growth in humansis confounded by fine-scale population structure Trends Genet18(11)559ndash563

Ramensky V Bork P Sunyaev S 2002 Human non-synonymous SNPsserver and survey Nucleic Acids Res 30(17)3894ndash3900

Rioux JD Xavier RJ Taylor KD et al (24 co-authors) 2007 Genome-wideassociation study identifies new susceptibility loci for Crohn diseaseand implicates autophagy in disease pathogenesis Nat Genet39(5)596ndash604

Roberts RL Hollis-Moffatt JE Gearry RB Kennedy MA Barclay MLMerriman TR 2008 Confirmation of association of IRGM andNCF4 with ileal Crohnrsquos disease in a population-based cohortGenes Immun 9(6)561ndash565

Rozas J 2009 DNA sequence polymorphism analysis using DnaSPMethods Mol Biol 537337ndash350

San Jose G Fortuno A Beloqui O Dıez J Zalba G 2008 NADPH oxidaseCYBA polymorphisms oxidative stress and cardiovascular diseasesClin Sci (Lond) 114(3)173ndash182

Santiago HC Gonzalez Lombana CZ Macedo JP et al (12 co-authors)2012 NADPH phagocyte oxidase knockout mice controlTrypanosoma cruzi proliferation but develop circulatory collapseand succumb to infection PLoS Negl Trop Dis 6(2)e1492

Stephens M Scheet P 2005 Accounting for decay of linkage disequilib-rium in haplotype inference and missing-data imputation Am JHum Genet 76(3)449ndash462

Sumimoto H Miyano K Takeya R 2005 Molecular composition andregulation of the Nox family NAD(P)H oxidases Biochem BiophysRes Commun 338(1)677ndash686

Tajima F 1983 Evolutionary relationship of DNA sequences in finitepopulations Genetics 105(2)437ndash460

Tajima F 1989 Statistical method for testing the neutral mutationhypothesis by DNA polymorphism Genetics 123(3)585ndash595

Tarazona-Santos E Bernig T Burdett L Magalhaes WC Fabbri C Liao JRedondo RA Welch R Yeager M Chanock SJ 2008 CYBB anNADPH-oxidase gene restricted diversity in humans and evidencefor differential long-term purifying selection on transmembrane andcytosolic domains Hum Mutat 29(5)623ndash632

Taylor RM Burritt JB Baniulis D Foubert TR Lord CI Dinauer MCParkos CA Jesaitis AJ 2004 Site-specific inhibitors of NADPH oxi-dase activity and structural probes of flavocytochrome b character-ization of six monoclonal antibodies to the p22phox subunitJ Immunol 173(12)7349ndash7357

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A map of recentpositive selection in the human genome PLoS Biol 4(3)e72

Watterson GA 1975 On the number of segregating sites in geneticalmodels without recombination Theor Popul Biol 7(2) 256ndash276

Williamson SH Hernandez R Fledel-Alon A Zhu L Nielsen RBustamante CD 2005 Simultaneous inference of selection and pop-ulation growth from patterns of variation in the human genomeProc Natl Acad Sci U S A 102(22)7882ndash7887

Yang Z 2007a Adaptive molecular evolution In Balding DJ Bishop MCannings C editors Handbook of statistical genetics Vol 1 3rd edSusex (United Kingdom) John Wiley amp Sons p 377ndash406

Yang Z 2007b PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

2167

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Thompson (1992) test which was implemented in the soft-ware Arlequin 30 (Excoffier et al 2007) Insertion-deletions(INDELs) were excluded from population genetics analysesTo assess intrapopulation diversity we used two statistics the per-site mean number of pairwise differences betweensequences (Tajima 1983) and w based on the number ofsegregating sites (S) (Watterson 1975) We measured pairwisebetween-populations diversity by using the FST statistics cal-culated using the software DnaSP (Rozas 2009)

Haplotypes and the recombination parameter were in-ferred using the PHASE software (Stephens and Scheet 2005)and diversity indexes calculations (tables 2 and 3) as well asneutrality statistics (table 4) were estimated using DnaSP soft-ware We applied two kinds of neutrality tests 1) tests basedon the allelic spectrum which is the distribution of polymor-phisms across different classes of frequencies namelyTajimarsquos DT (Tajima 1989) and FundashLirsquos DFL and FFL (Fu andLi 1993) and 2) tests based on comparisons between thenumber of polymorphisms in human populations and fixeddifferences with the chimpanzee (ie outgroup) namely theMcDonald and Kreitman (1991) test and the adaptedKolmogorovndashSmirnoff test by McDonald (1998) For thefirst set of tests we used as null hypotheses both the classicWrightndashFisher model of neutrality with a constant popula-tion size as well as the more realistic evolutionary scenario forhuman populations inferred by Laval et al (2010) In the caseof the scenario of Laval et al (2010) we ignored interconti-nental gene flow within the Old World because these raregene flow events likely does not affect the level of significanceof the neutrality tests given its very low inferred values(13 105) Null distributions used to test the significanceof the neutrality tests under these evolutionary scenarios weregenerated using coalescent simulations and a significancelevel of 005 (Hudson 2002) The KolmogorovndashSmirnoff testof neutrality adapted by McDonald (1998) was performedusing Slider software available at httpudeledu~mcdonaldaboutdnasliderhtml (last accessed July 16 2013) Weperformed coalescent simulations using ms software(Hudson 2002) Further methodological details are availableas supplementary material Supplementary Material onlineWe constructed the CYBA and NCF2 networks using allSNP variants and applying the Median joining algorithmand the maximum parsimony option calculations as imple-mented in the software Network 46 (Bandelt et al 1999)

Supplementary MaterialSupplementary tables S1ndashS5 and figure S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

SJC conceived the study ET-S MM FL RC AC CF LPand BS generated the dataanalyzed the sequences LB ACWCSM and MY provided tools for data generationand analyses ET-S MM FL WCSM and RR performedpopulation genetics analyses ET-S and MM prepared tablesand figures ET-S and SJC wrote the manuscript All the

authors contributed with discussions about the results Theauthors are grateful to Silvia Fuselli for discussions of ourresults to Fernando Levi Soares for technical assistance withthe figures and the Sequencing Group of the CoreGenotyping Facility (National Cancer Institute) for their tech-nical assistance This work was supported by the IntramuralResearch Program of the National Cancer Institute (NCI) andNational Institutes of Health (NIH) CF was supported by theUniversity of Bologna ET-S MM WCSM and FL weresupported by the Fogarty International Center and NationalCancer Institute (1R01TW007894) Conselho Nacional deDesenvolvimento Cientıfico e Tecnologico (Brazil Grant3075562012-3) Fundacao de Amparo a Pesquisa de MinasGerais (Brazil CBB PPM 0026512) and the Brazilian Ministryof Education (CAPES Agency grant PNPD 0256709-1)

ReferencesAyodo G Price AL Keinan A Ajwang A Otieno MF Orago AS

Patterson N Reich D 2007 Combining evidence of natural selectionwith association analysis increases power to detect malaria-resis-tance variants Am J Hum Genet 81(2)234ndash242

Bamshad M Wooding SP 2003 Signatures of natural selection in thehuman genome Nat Rev Genet 4(2)99ndash111

Bandelt H-J Forster P Rohl A 1999 Median-joining networks for infer-ring intraspecific phylogenies Mol Biol Evol 16(1)37ndash48

Barreiro LB Ben-Ali M Quach H et al (17 co-authors) 2009Evolutionary dynamics of human Toll-like receptors and their dif-ferent contributions to host defense PLoS Genet 5(7)e1000562

Barreiro LB Quintana-Murci L 2010 From evolutionary genetics tohuman immunology how selection shapes host defence genesNat Rev Genet 11(1)17ndash30

Barrett RD Schluter D 2008 Adaptation from standing genetic varia-tion Trends Ecol Evol 23(1)38ndash44

Bedard K Attar H Bonnefont J Jaquet V Borel C Plastre O Stasia MJAntonarakis SE Krause KH 2009 Three common polymorphisms inthe CYBA gene form a haplotype associated with decreased ROSgeneration Hum Mutat 30(7)1123ndash1133

Blekhman R Man O Herrmann L Boyko AR Indap A Kosiol CBustamante CD Teshima KM Przeworski M 2008 Natural selectionon genes that underlie human disease susceptibility Curr Biol18(12)883ndash889

Brandes RP Kreuzer J 2005 Vascular NADPH oxidases molecular mech-anisms of activation Cardiovasc Res 65(1)16ndash27

Buckley RH 2004 Pulmonary complications of primary immunodefi-ciencies Paediatr Respir Rev 5(Suppl A)S225ndashS233

Bustamante CD Fledel-Alon A Williamson S et al (13 co-authors)2005 Natural selection on protein-coding genes in the humangenome Nature 437(7062)1153ndash1157

Bustamante J Arias AA Vogt G et al (29 co-authors) 2011 GermlineCYBB mutations that selectively affect macrophages in kindredswith X-linked predisposition to tuberculous mycobacterial diseaseNat Immunol 12(3)213ndash221

Campbell MC Tishkoff SA 2008 African genetic diversity implicationsfor human demographic history modern human origins and com-plex disease mapping Annu Rev Genomics Hum Genet 9403ndash433

Chanock SJ Roesler J Zhan S Hopkins P Lee P Barrett DT ChristensenBL Curnutte JT Gorlach A 2000 Genomic structure of the humanp47-phox (NCF1) gene Blood Cells Mol Dis 26(1)37ndash46

Cunninghame Graham DS Morris DL Bhangale TR Criswell LASyvanen AC Ronnblom L Behrens TW Graham RR Vyse TJ2011 Association of NCF2 IKZF1 IRF8 IFIH1 and TYK2 with sys-temic lupus erythematosus PLoS Genet 7(10)e1002341

Excoffier L Laval G Schneider S 2007 Arlequin ver 30 an integratedsoftware package for population genetics data analysis EvolBioinform Online 147ndash50

2166

Tarazona-Santos et al doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Excoffier L Ray N 2008 Surfing during population expansions promotesgenetic revolutions and structuration Trends Ecol Evol23(7)347ndash351

Ferrer-Admetlla A Bosch E Sikora M Marques-Bonet T Ramırez-SorianoA Muntasell A Navarro A Lazarus R Calafell F Bertranpetit J Casals F2008 Balancing selection is the main force shaping the evolution ofinnate immunity genes J Immunol 181(2)1315ndash1322

Fu YX Li WH 1993 Statistical tests of neutrality of mutations Genetics133(3)693ndash709

The 1000 Genomes Project Consortium 2010 A map of humangenome variation from population-scale sequencing Nature467(7319)1061ndash1073

The 1000 Genomes Project Consortium Abecasis GR Auton A BrooksLD DePristo MA Durbin RM Handsaker RE Kang HM Marth GTMcVean GA 2012 An integrated map of genetic variation from1092 human genomes Nature 491(7422)56ndash65

Guo SW Thompson EA 1992 Performing the exact test of Hardy-Weinberg proportion for multiple alleles Biometrics 48(2)361ndash372

Heyworth PG Cross AR Curnutte JT 2003 Chronic granulomatousdisease Curr Opin Immunol 15(5)578ndash584

Hudson RR 2002 Generating samples under a Wright-Fisher neutralmodel of genetic variation Bioinformatics 18(2)337ndash338

Jacob CO Eisenstein M Dinauer MC et al (21 co-authors) 2012 Lupus-associated causal mutation in neutrophil cytosolic factor 2 (NCF2)brings unique insights to the structure and function of NADPHoxidase Proc Natl Acad Sci U S A 109(2)E59ndashE67

Kimura M 1974 The neutral theory of molecular evolution CambridgeCambridge University Press

Kosiol C Vinar T da Fonseca RR Hubisz MJ Bustamante CD Nielsen RSiepel A 2008 Patterns of positive selection in six mammalian ge-nomes PLoS Genet 4(8)e1000144

Lacy F Kailasam MT OrsquoConnor DT Schmid-Schonbein GW Parmer RJ2000 Plasma hydrogen peroxide production in human essentialhypertension role of heredity gender and ethnicity Hypertension36(5)878ndash884

Laval G Patin E Barreiro LB Quintana-Murci L 2010 Formulating a his-torical and demographic model of recent human evolution based onresequencing data from noncoding regions PLoS One 5(4)e10284

Lewontin R 1972 The apportionment of human diversity Evol Biol 6391ndash398

Lindblad-Toh K Garber M Zuk O et al (88 co-authors) 2011 A high-resolution map of human evolutionary constraint using 29 mam-mals Nature 478(7370)476ndash482

Lodge R Diallo TO Descoteaux A 2006 Leishmania donovani lipopho-sphoglycan blocks NADPH oxidase assembly at the phagosomemembrane Cell Microbiol 8(12)1922ndash1931

Magalhaes WC Rodrigues MR Silva D Soares-Souza G Iannini MLCerqueira GC Faria-Campos AC Tarazona-Santos E 2012DIVERGENOME a bioinformatics platform to assist population ge-netics and genetic epidemiology studies Genet Epidemiol36(4)360ndash367

Marth JD Grewal PK 2008 Mammalian glycosylation in immunity NatRev Immunol 8(11)874ndash887

McDonald JH 1998 Improved tests for heterogeneity across a region ofDNA sequence in the ratio of polymorphism to divergence Mol BiolEvol 15377ndash384

McDonald JH Kreitman M 1991 Adaptive protein evolution at the Adhlocus in Drosophila Nature 351(6328)652ndash654

Nielsen R Bustamante C Clark AG et al (12 co-authors) 2005 A scanfor positively selected genes in the genomes of humans and chim-panzees PLoS Biol 3(6)e170

Olsson LM Lindqvist AK Kallberg H Padyukov L Burkhardt HAlfredsson L Klareskog L Holmdahl R 2007 A case-control studyof rheumatoid arthritis identifies an associated single nucleotidepolymorphism in the NCF4 gene supporting a role for the

NADPH-oxidase complex in autoimmunity Arthritis Res Ther9(5)R98

Packer BR Yeager M Burdett L et al (13 co-authors) 2006SNP500Cancer a public resource for sequence validation assay de-velopment and frequency analysis for genetic variation in candidategenes Nucleic Acids Res 34(Database issue)D617ndashD621

Peter BM Huerta-Sanchez E Nielsen R 2012 Distinguishing betweenselective sweeps from standing variation and from a de novo mu-tation PLoS Genet 8(10)e1003011

Przeworski M Coop G Wall JD 2005 The signature of positive selectionon standing genetic variation Evolution 59(11)2312ndash2323

Ptak SE Przeworski M 2002 Evidence for population growth in humansis confounded by fine-scale population structure Trends Genet18(11)559ndash563

Ramensky V Bork P Sunyaev S 2002 Human non-synonymous SNPsserver and survey Nucleic Acids Res 30(17)3894ndash3900

Rioux JD Xavier RJ Taylor KD et al (24 co-authors) 2007 Genome-wideassociation study identifies new susceptibility loci for Crohn diseaseand implicates autophagy in disease pathogenesis Nat Genet39(5)596ndash604

Roberts RL Hollis-Moffatt JE Gearry RB Kennedy MA Barclay MLMerriman TR 2008 Confirmation of association of IRGM andNCF4 with ileal Crohnrsquos disease in a population-based cohortGenes Immun 9(6)561ndash565

Rozas J 2009 DNA sequence polymorphism analysis using DnaSPMethods Mol Biol 537337ndash350

San Jose G Fortuno A Beloqui O Dıez J Zalba G 2008 NADPH oxidaseCYBA polymorphisms oxidative stress and cardiovascular diseasesClin Sci (Lond) 114(3)173ndash182

Santiago HC Gonzalez Lombana CZ Macedo JP et al (12 co-authors)2012 NADPH phagocyte oxidase knockout mice controlTrypanosoma cruzi proliferation but develop circulatory collapseand succumb to infection PLoS Negl Trop Dis 6(2)e1492

Stephens M Scheet P 2005 Accounting for decay of linkage disequilib-rium in haplotype inference and missing-data imputation Am JHum Genet 76(3)449ndash462

Sumimoto H Miyano K Takeya R 2005 Molecular composition andregulation of the Nox family NAD(P)H oxidases Biochem BiophysRes Commun 338(1)677ndash686

Tajima F 1983 Evolutionary relationship of DNA sequences in finitepopulations Genetics 105(2)437ndash460

Tajima F 1989 Statistical method for testing the neutral mutationhypothesis by DNA polymorphism Genetics 123(3)585ndash595

Tarazona-Santos E Bernig T Burdett L Magalhaes WC Fabbri C Liao JRedondo RA Welch R Yeager M Chanock SJ 2008 CYBB anNADPH-oxidase gene restricted diversity in humans and evidencefor differential long-term purifying selection on transmembrane andcytosolic domains Hum Mutat 29(5)623ndash632

Taylor RM Burritt JB Baniulis D Foubert TR Lord CI Dinauer MCParkos CA Jesaitis AJ 2004 Site-specific inhibitors of NADPH oxi-dase activity and structural probes of flavocytochrome b character-ization of six monoclonal antibodies to the p22phox subunitJ Immunol 173(12)7349ndash7357

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A map of recentpositive selection in the human genome PLoS Biol 4(3)e72

Watterson GA 1975 On the number of segregating sites in geneticalmodels without recombination Theor Popul Biol 7(2) 256ndash276

Williamson SH Hernandez R Fledel-Alon A Zhu L Nielsen RBustamante CD 2005 Simultaneous inference of selection and pop-ulation growth from patterns of variation in the human genomeProc Natl Acad Sci U S A 102(22)7882ndash7887

Yang Z 2007a Adaptive molecular evolution In Balding DJ Bishop MCannings C editors Handbook of statistical genetics Vol 1 3rd edSusex (United Kingdom) John Wiley amp Sons p 377ndash406

Yang Z 2007b PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

2167

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from

Excoffier L Ray N 2008 Surfing during population expansions promotesgenetic revolutions and structuration Trends Ecol Evol23(7)347ndash351

Ferrer-Admetlla A Bosch E Sikora M Marques-Bonet T Ramırez-SorianoA Muntasell A Navarro A Lazarus R Calafell F Bertranpetit J Casals F2008 Balancing selection is the main force shaping the evolution ofinnate immunity genes J Immunol 181(2)1315ndash1322

Fu YX Li WH 1993 Statistical tests of neutrality of mutations Genetics133(3)693ndash709

The 1000 Genomes Project Consortium 2010 A map of humangenome variation from population-scale sequencing Nature467(7319)1061ndash1073

The 1000 Genomes Project Consortium Abecasis GR Auton A BrooksLD DePristo MA Durbin RM Handsaker RE Kang HM Marth GTMcVean GA 2012 An integrated map of genetic variation from1092 human genomes Nature 491(7422)56ndash65

Guo SW Thompson EA 1992 Performing the exact test of Hardy-Weinberg proportion for multiple alleles Biometrics 48(2)361ndash372

Heyworth PG Cross AR Curnutte JT 2003 Chronic granulomatousdisease Curr Opin Immunol 15(5)578ndash584

Hudson RR 2002 Generating samples under a Wright-Fisher neutralmodel of genetic variation Bioinformatics 18(2)337ndash338

Jacob CO Eisenstein M Dinauer MC et al (21 co-authors) 2012 Lupus-associated causal mutation in neutrophil cytosolic factor 2 (NCF2)brings unique insights to the structure and function of NADPHoxidase Proc Natl Acad Sci U S A 109(2)E59ndashE67

Kimura M 1974 The neutral theory of molecular evolution CambridgeCambridge University Press

Kosiol C Vinar T da Fonseca RR Hubisz MJ Bustamante CD Nielsen RSiepel A 2008 Patterns of positive selection in six mammalian ge-nomes PLoS Genet 4(8)e1000144

Lacy F Kailasam MT OrsquoConnor DT Schmid-Schonbein GW Parmer RJ2000 Plasma hydrogen peroxide production in human essentialhypertension role of heredity gender and ethnicity Hypertension36(5)878ndash884

Laval G Patin E Barreiro LB Quintana-Murci L 2010 Formulating a his-torical and demographic model of recent human evolution based onresequencing data from noncoding regions PLoS One 5(4)e10284

Lewontin R 1972 The apportionment of human diversity Evol Biol 6391ndash398

Lindblad-Toh K Garber M Zuk O et al (88 co-authors) 2011 A high-resolution map of human evolutionary constraint using 29 mam-mals Nature 478(7370)476ndash482

Lodge R Diallo TO Descoteaux A 2006 Leishmania donovani lipopho-sphoglycan blocks NADPH oxidase assembly at the phagosomemembrane Cell Microbiol 8(12)1922ndash1931

Magalhaes WC Rodrigues MR Silva D Soares-Souza G Iannini MLCerqueira GC Faria-Campos AC Tarazona-Santos E 2012DIVERGENOME a bioinformatics platform to assist population ge-netics and genetic epidemiology studies Genet Epidemiol36(4)360ndash367

Marth JD Grewal PK 2008 Mammalian glycosylation in immunity NatRev Immunol 8(11)874ndash887

McDonald JH 1998 Improved tests for heterogeneity across a region ofDNA sequence in the ratio of polymorphism to divergence Mol BiolEvol 15377ndash384

McDonald JH Kreitman M 1991 Adaptive protein evolution at the Adhlocus in Drosophila Nature 351(6328)652ndash654

Nielsen R Bustamante C Clark AG et al (12 co-authors) 2005 A scanfor positively selected genes in the genomes of humans and chim-panzees PLoS Biol 3(6)e170

Olsson LM Lindqvist AK Kallberg H Padyukov L Burkhardt HAlfredsson L Klareskog L Holmdahl R 2007 A case-control studyof rheumatoid arthritis identifies an associated single nucleotidepolymorphism in the NCF4 gene supporting a role for the

NADPH-oxidase complex in autoimmunity Arthritis Res Ther9(5)R98

Packer BR Yeager M Burdett L et al (13 co-authors) 2006SNP500Cancer a public resource for sequence validation assay de-velopment and frequency analysis for genetic variation in candidategenes Nucleic Acids Res 34(Database issue)D617ndashD621

Peter BM Huerta-Sanchez E Nielsen R 2012 Distinguishing betweenselective sweeps from standing variation and from a de novo mu-tation PLoS Genet 8(10)e1003011

Przeworski M Coop G Wall JD 2005 The signature of positive selectionon standing genetic variation Evolution 59(11)2312ndash2323

Ptak SE Przeworski M 2002 Evidence for population growth in humansis confounded by fine-scale population structure Trends Genet18(11)559ndash563

Ramensky V Bork P Sunyaev S 2002 Human non-synonymous SNPsserver and survey Nucleic Acids Res 30(17)3894ndash3900

Rioux JD Xavier RJ Taylor KD et al (24 co-authors) 2007 Genome-wideassociation study identifies new susceptibility loci for Crohn diseaseand implicates autophagy in disease pathogenesis Nat Genet39(5)596ndash604

Roberts RL Hollis-Moffatt JE Gearry RB Kennedy MA Barclay MLMerriman TR 2008 Confirmation of association of IRGM andNCF4 with ileal Crohnrsquos disease in a population-based cohortGenes Immun 9(6)561ndash565

Rozas J 2009 DNA sequence polymorphism analysis using DnaSPMethods Mol Biol 537337ndash350

San Jose G Fortuno A Beloqui O Dıez J Zalba G 2008 NADPH oxidaseCYBA polymorphisms oxidative stress and cardiovascular diseasesClin Sci (Lond) 114(3)173ndash182

Santiago HC Gonzalez Lombana CZ Macedo JP et al (12 co-authors)2012 NADPH phagocyte oxidase knockout mice controlTrypanosoma cruzi proliferation but develop circulatory collapseand succumb to infection PLoS Negl Trop Dis 6(2)e1492

Stephens M Scheet P 2005 Accounting for decay of linkage disequilib-rium in haplotype inference and missing-data imputation Am JHum Genet 76(3)449ndash462

Sumimoto H Miyano K Takeya R 2005 Molecular composition andregulation of the Nox family NAD(P)H oxidases Biochem BiophysRes Commun 338(1)677ndash686

Tajima F 1983 Evolutionary relationship of DNA sequences in finitepopulations Genetics 105(2)437ndash460

Tajima F 1989 Statistical method for testing the neutral mutationhypothesis by DNA polymorphism Genetics 123(3)585ndash595

Tarazona-Santos E Bernig T Burdett L Magalhaes WC Fabbri C Liao JRedondo RA Welch R Yeager M Chanock SJ 2008 CYBB anNADPH-oxidase gene restricted diversity in humans and evidencefor differential long-term purifying selection on transmembrane andcytosolic domains Hum Mutat 29(5)623ndash632

Taylor RM Burritt JB Baniulis D Foubert TR Lord CI Dinauer MCParkos CA Jesaitis AJ 2004 Site-specific inhibitors of NADPH oxi-dase activity and structural probes of flavocytochrome b character-ization of six monoclonal antibodies to the p22phox subunitJ Immunol 173(12)7349ndash7357

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A map of recentpositive selection in the human genome PLoS Biol 4(3)e72

Watterson GA 1975 On the number of segregating sites in geneticalmodels without recombination Theor Popul Biol 7(2) 256ndash276

Williamson SH Hernandez R Fledel-Alon A Zhu L Nielsen RBustamante CD 2005 Simultaneous inference of selection and pop-ulation growth from patterns of variation in the human genomeProc Natl Acad Sci U S A 102(22)7882ndash7887

Yang Z 2007a Adaptive molecular evolution In Balding DJ Bishop MCannings C editors Handbook of statistical genetics Vol 1 3rd edSusex (United Kingdom) John Wiley amp Sons p 377ndash406

Yang Z 2007b PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

2167

Evolutionary Dynamics of Human NADPH Oxidase Genes doi101093molbevmst119 MBE by guest on A

ugust 13 2016httpm

beoxfordjournalsorgD

ownloaded from