Supplementary Material - University of Arizona (flmendez/papers/mendez_2012a_suppl.pdf)

SUPPLEMENTARY MATERIAL

Supplementary Methods

DNA sequencing

DNA samples were PCR amplified and Sanger sequenced on an ABI 3730XL DNA analyzer.

Primers were designed using the genome reference sequence in Oligo Primer Analysis Software.

Amplification primers are listed in table S2. Sequences were finished using the

Phred/Phrap/Consed/PolyPhred suite of programs. The ancestral state in humans was inferred

using chimpanzee, bonobo and gorilla as outgroups (table S3).

In addition to the 5 individuals listed in the Materials and Methods in the Melanesian

panel, 11 individuals from Melanesian populations (6 from Papua New Guinea, 1 from New

Britain, and 4 from Bougainville) and 2 individuals from sub-Saharan Africa (1 San and 1 from

Ghana) were re-sequenced for the extended region spanning positions 111820038-111842221.

Four individuals from Bougainville and two Papua New Guineans are part of the HGDP panel,

three Papua New Guineans and the San individual are shared with the diversity panel, and the

remaining individuals were previously included in studies of Y chromosome diversity

(Scheinfeldt et al. 2006; Wilder and Hammer 2007).

Correcting Allele Frequencies

To correct for uneven sampling of SNPs on the backgrounds of the deep and shallow lineages

associated with the ancestral state at 111829579 (B1 and B2) and the derived states at 111839753

(A1 and A2), we carried out the following computation for the frequency of A1 and its variance.

The frequency of A1 can be calculated using the law of total probability.

From the conditional variance formula we obtain

Estimates for frequencies and variances were obtained by using sample proportions from the

data.
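
As a minimal sketch of how these two formulas can be combined, the Python function below estimates the frequency of a sub-lineage such as A1 as the product of two sample proportions: the proportion of chromosomes in the class defined by the genotyped SNP, and the proportion of that class assigned to the sub-lineage. It assumes that the sub-lineage occurs only within that class, that the two proportions come from independent binomial samples, and that the variance follows from the conditional variance formula. The function name, argument names, and example counts are hypothetical, and the expression is one standard form rather than necessarily the one used here.

```python
def lineage_frequency(n_total, n_class, n_typed, n_sub):
    """Estimate P(A1) = P(A) * P(A1 | A) and the variance of the estimate.

    n_total: chromosomes genotyped for the class-defining SNP
    n_class: of those, chromosomes that fall in class A
    n_typed: class-A chromosomes typed for the sub-lineage
    n_sub:   of those, chromosomes assigned to sub-lineage A1
    """
    p = n_class / n_total            # sample proportion for P(A)
    q = n_sub / n_typed              # sample proportion for P(A1 | A)
    var_p = p * (1 - p) / n_total    # binomial variance of p
    var_q = q * (1 - q) / n_typed    # binomial variance of q
    freq = p * q                     # law of total probability (A1 only occurs on A)
    # Conditional variance formula:
    # Var(pq) = E[Var(pq | p)] + Var(E[pq | p]) = E[p^2] Var(q) + q^2 Var(p)
    var = (p * p + var_p) * var_q + q * q * var_p
    return freq, var


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(lineage_frequency(100, 20, 10, 5))
```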

Coalescent simulations

Intra-allelic variation in the two newly described Melanesian lineages (deep and shallow) was

studied using coalescent simulations of the sequence spanning 111822001-111842221

using the program ms (Hudson 2002). Mutation rates were estimated using polymorphisms in the

human lineage and assuming a divergence time of 6 Mya (i.e., 240,000 generations ago) between

human and chimpanzee sequences. The parameters for the population size correspond to the

number of chromosomes in the population with a given allele. Five classes of demographic

model were simulated with 50,000 simulations for each of several parameter values: a) constant

population size, b) exponential growth from constant population size, c) population crash

followed by exponential growth, d) exponential growth from a single chromosome, and e)

initially as in b), but the population of chromosomes has reached a stable size (table S7a, fig.

S5). For each simulation with a frequency spectrum matching the empirical data, we kept the

coalescent time for the sample. The collection of coalescent times was used as a distribution of

estimates for the time to the most recent common ancestor (TMRCA) to calculate medians and

confidence intervals.
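
The rejection step described above can be sketched in Python as follows. This is a simplified illustration, not the ms-based procedure used here: it simulates only a constant-size population, and it accepts a simulation when the number of segregating sites equals an observed value, rather than matching the full frequency spectrum. All function names and parameter values are hypothetical.

```python
import random
import statistics


def poisson(rng, lam):
    """Draw a Poisson variate by counting unit-rate arrivals before time lam."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_tree(rng, n, pop_size):
    """Return (TMRCA, total branch length) in generations for n chromosomes."""
    tmrca, total = 0.0, 0.0
    for k in range(n, 1, -1):
        # With k lineages, the coalescence rate is C(k, 2) / pop_size per generation.
        wait = rng.expovariate(k * (k - 1) / (2.0 * pop_size))
        tmrca += wait
        total += k * wait
    return tmrca, total


def tmrca_summary(n, pop_size, mu, observed_s, n_sims, seed=1):
    """Median and central 95% interval of TMRCA over accepted simulations.

    mu is the mutation rate per generation for the whole sequence.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n_sims):
        tmrca, total = simulate_tree(rng, n, pop_size)
        # Keep the coalescent time only if the simulated data match the observation.
        if poisson(rng, mu * total) == observed_s:
            kept.append(tmrca)
    kept.sort()
    lo = kept[int(0.025 * (len(kept) - 1))]
    hi = kept[int(0.975 * (len(kept) - 1))]
    return statistics.median(kept), lo, hi, len(kept)


if __name__ == "__main__":
    print(tmrca_summary(n=10, pop_size=1000, mu=2e-4, observed_s=1, n_sims=2000))
```

Times are returned in generations; multiplying by the generation time implied by the calibration (6 My over 240,000 generations, i.e., 25 years) converts them to years.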

Phase, Network and Recombination Rate

We inferred phase using the program PHASE v2.1 in a run of 500,000 steps following 50,000

steps of burn-in. Recombination rates were inferred using the same program with a run of

100,000 steps following 10,000 steps of burn-in. The data were phased in sets of amplicons and

with all amplicons together. Phased haplotypes in amplicon set 2 were trimmed after site

111841590 to reduce the effects of recombination on the fifth amplicon. Recombination rates

were estimated using Yoruban trio genotypes from HapMap phase II. SNPs that resulted in

Mendelian violations were discarded.
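
A Mendelian-violation filter of the kind mentioned above can be sketched in a few lines of Python. The data layout (genotypes as pairs of alleles, trios grouped by SNP) and the function names are assumptions made for illustration, and missing genotypes are not handled.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can be formed from one allele of each parent."""
    return any(sorted(pair) == sorted(child) for pair in product(father, mother))


def drop_mendelian_violations(trios_by_snp):
    """Return the SNP ids for which every trio is Mendelian-consistent."""
    return [
        snp
        for snp, trios in trios_by_snp.items()
        if all(is_mendelian_consistent(f, m, c) for f, m, c in trios)
    ]


if __name__ == "__main__":
    # Hypothetical genotypes: (father, mother, child) per trio.
    data = {
        "snp1": [(("A", "A"), ("G", "G"), ("A", "G"))],
        "snp2": [(("A", "A"), ("A", "A"), ("G", "G"))],  # violation
    }
    print(drop_mendelian_violations(data))
```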

Divergence time

Under the infinite sites mutation model, the number of mutations in a branch of a genealogy

follows a Poisson distribution. The mean number of mutations is the product of the branch length

and the mutation rate. Therefore, given the number of uniformly ascertained mutations, the

number of mutations in each branch follows a multinomial distribution with parameters

proportional to the branch lengths. Maximum likelihood estimates for the branch lengths and

confidence intervals for individual relative branch lengths can be obtained from the distribution

of mutations in the genealogy (Mendez et al. 2011). Estimates of the relative branch lengths can

be scaled if a calibration point is known. In this work the calibration value was the divergence

time with the chimpanzee sequence, which was assumed to be 6 Mya.
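
To illustrate the scaling step, the Python sketch below takes mutation counts for consecutive segments of a single path from the human-chimpanzee ancestor to the present, uses the multinomial maximum likelihood estimates (counts divided by their total) as relative branch lengths, and scales them by the calibration time. The counts and function names are hypothetical, and confidence intervals are omitted.

```python
def relative_branch_lengths(mutation_counts):
    """Multinomial ML estimates: each branch's share of the mutations."""
    total = sum(mutation_counts)
    return [count / total for count in mutation_counts]


def node_ages(mutation_counts, calibration_time):
    """Ages of the nodes separating consecutive segments (oldest segment first).

    calibration_time is the age of the start of the first segment,
    e.g. 6.0 for a human-chimpanzee divergence of 6 Mya.
    """
    ages = []
    remaining = 1.0
    for fraction in relative_branch_lengths(mutation_counts)[:-1]:
        remaining -= fraction          # fraction of the path still below this node
        ages.append(remaining * calibration_time)
    return ages


if __name__ == "__main__":
    # Hypothetical example: 90 mutations before a split and 10 after it.
    print(node_ages([90, 10], calibration_time=6.0))
```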

Two different modeling schemes were used: 1) only mutations leading to a specific

lineage were considered, and the remaining lineages were used to divide those mutations into

temporally distinct groups; 2) the TMRCA of two lineages was evaluated by considering the

three branches (those of each lineage and the one spanning from the ancestor with chimpanzee to

their common ancestor) and constraining both lineages to have the same branch length. It should

be noted that the parameter a associated with the fraction of the mutations that are segregating is

different in the two schemes. In the first scheme, a is the ratio of the divergence time of human

lineages to the time of the human-chimp divergence. In the second scheme, measured in terms

of the human-chimp divergence time, the total tree length is 1+a (i.e., the second human lineage

adds an extra amount, a, to the total tree length). The total tree length associated with the two

human lineages is 2a (i.e., a for each individual). Thus, the fraction of the total tree length

associated with the human lineages is 2a/(1+a).
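
For the second scheme, the relation f = 2a/(1+a) can be inverted to give a = f/(2 - f), where f is the fraction of mutations that fall on the two human branches. The short Python sketch below applies this inversion as a point estimate; the function name and the example counts are hypothetical.

```python
def tmrca_from_fraction(segregating, total, calibration_time=6.0):
    """Solve f = 2a / (1 + a) for a, then scale by the calibration time.

    segregating: mutations on the two human branches
    total:       all mutations in the three-branch genealogy
    Returns (a, TMRCA in the units of calibration_time).
    """
    f = segregating / total
    a = f / (2.0 - f)
    return a, a * calibration_time


if __name__ == "__main__":
    # Hypothetical counts: 4 of 10 mutations are on the human branches.
    print(tmrca_from_fraction(4, 10))
```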

Supplementary Results

Analysis of the fourth and fifth amplicons showed that while the “deep” lineage still persists in

the 3’ section of the gene, the relationship among haplotypes in this region is somewhat more

complex than in the 5’ end. We focused our analysis in the main text on the 5’ region, which

encompasses the 1st to 3rd exons. Here we provide further analysis and discussion of the 3’

region of the OAS1 gene, as well as characterize putative recombinant and gene-converted

chromosomes.

Levels and Patterns of Diversity in the 3’ region of OAS1

High levels of polymorphism in Papuans extend to the region encompassing exons 4-6,

especially in the fourth amplicon (table 1, table S8). A median-joining network of phased

haplotypes for this sequence, trimmed to reduce the effect of recombination, defines two groups

of haplotypes. One group has haplotypes from all populations while the second group has

haplotypes only from San, Mandenka, and Papuans (fig. S2). Of the two Papuan haplotypes

observed in the second group, one corresponds to the deep lineage haplotype in the region

encompassing exons 1-3. The other Papuan haplotype shows affinities with a haplotype observed

in the San. These haplotypes are referred to as Papuan and African “shallow” lineages,

respectively.

To better characterize the relationship between the Papuan and African shallow lineages

and to assess the intra-allelic variation in the Papuan shallow lineage, we also sequenced the

extended region in five homozygous Papuans and two (i.e., one homozygote and one

heterozygote) Africans carrying the shallow lineage (table S4, fig. 1). The estimated divergence

time between the Papuan and African shallow lineages is 240 kya (65 – 613 kya, 95% CI). The

Papuan shallow lineage is restricted to populations of Melanesian ancestry (fig. S3a). While the

African shallow lineage is found at very low frequency in several sub-Saharan African

populations (fig. S3b, table S5b), it is found at moderate frequency only in the KhoeSan and

Mbuti Pygmies.

Analysis of the level of intra-allelic variation using data from the extended re-sequencing

of the Melanesian panel, and consideration of a variety of possible population histories for the

ancestry of these chromosomes, provide an estimated TMRCA for the deep lineage of 24 kya (8

– 66 kya, 95% CI). With the same set of demographic models used for the deep lineage, the

estimate of the TMRCA in Papuans for the shallow lineage is 19 kya (7 – 58 kya, 95% CI) (table

S7b). These two estimates show a broad overlap.

In sum, we note that toward the 3’ end of the gene Papuans still maintain high genetic

diversity, which is due in part to the presence of a second lineage that is not present in Eurasians.

Because its estimated TMRCA is similar to that of the deep lineage, the possibility arises that

both lineages diversified in Oceania as part of the same demographic process.

Analysis of high density SNP data

Using the Illumina 650Y genotype data of the individual HGDP00555, known to be homozygous

for the deep lineage (table S4), it was possible to extract a haplotype spanning between positions

111820114 and 111906360 that is exclusive to the deep lineage. A search of this haplotype

against phased genotypes of samples in the Human Genome Diversity Project (Li et al. 2008;

Pickrell et al. 2009) retrieved 5 chromosomes from Oceania, together with two chromosomes

from Pakistan. The inferred haplotype for the ancient Denisova sequence is also consistent with

this deep haplotype with the exception of the SNP at position 111820114. The two Pakistani

individuals are heterozygous for the SNP at position 111828579 and derived at 111831807, and

the retrieved haplotypes for the two Pakistanis HGDP00052 and HGDP00078 agree with a deep

lineage haplotype from Bougainville in HGDP00979 between positions 111757914 and

111931486.

Recombination and gene conversion

One individual from Papua New Guinea and one site were excluded from the first network due to

recombination involving the deep lineage and a suspected gene conversion, respectively.

Extended re-sequencing was performed on the excluded individual, a second individual from

New Britain also carrying a recombinant haplotype for the deep lineage, and four individuals

with a suspected independent event of gene conversion involving the same candidate site (table

S4). The two recombinant chromosomes were produced by independent events: they have

different recombination breakpoints (see position

111832263 in table S4) and the chromosomes with which they recombined are different (see

position 111833253 in table S3). The four individuals with the suspected gene conversion event

(HGDP00663, HGDP00788, HGDP00789 and HGDP00824) agree with the deep lineage at the

site excluded in the network of amplicon set 1 and at a neighboring in-del (table S4). Outside

those positions these chromosomes agree with non-converted chromosomes (as indicated by the

presence of the derived state at positions 111826545 and 111828555).

Functional variation

Non-synonymous mutations in the coding sequence for positions prior to 111839889 (core

sequence) affect the protein sequence in all known functional splicing variants of the gene OAS1.

We have analyzed all amino acid changes in the different lineages since the most recent common

ancestor with chimpanzee using PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/index.shtml)

in the order in which they are inferred to have occurred. Whenever multiple single amino acid

mutations connect two known sequences, we have considered all the possible paths. The

functional effects of frameshifts and polymorphic changes in splicing were excluded, because

they involve simultaneous changes in several amino acids. Likewise, recombinants changing

multiple amino acids simultaneously were excluded from our analysis.

All human amino acid sequences share two mutations in the core sequence since the

MRCA with chimpanzee: D127G and D166N. Both mutations are predicted as benign. Another

mutation, D350N, affecting only one splicing variant, is also predicted as benign.

With the exception of the deep lineage, modern human sequences share the mutation Y179D,

also predicted as benign in all splicing variants.

The mutation G111839925A in the lineage shared by HGDP01029, JR020 and JR354

results in an A359T mutation, predicted as benign, in the protein sequence of the p42 splicing

variant only (fig. 1). G to A mutations at rs1131476 and rs1051042 (positions 111841599 and

111841792, respectively) result in A352T and R361T for the p46 transcript. This amino acid

sequence is observed only in people of African descent and in the Melanesian shallow lineage.

The deep lineage and Denisova have evolved from the ancestral sequence of all humans by the

inclusion of three mutations in the core sequence: R104G, P129R and E183D, with the deep

lineage (but not Denisova) including also a two-base-pair deletion at position 111841650 that

produces a frameshift in the sequence of a p46 isoform, shortening the amino acid sequence from

400 residues to 377 residues (molecular weight goes from 46.0 kDa to 43.4 kDa). Predictions for

the mutations R104G, P129R and E183D depend on the specific isoform, but the qualitative

impact is independent of the order of the mutations. The mutation R104G, which occurs in the

third alpha helix of the protein (Hartmann et al. 2003), has the largest effect. For the isoform

p42, none of the mutations is considered benign. A mutation that is not benign may merely confer a

change in function without being deleterious. Considering the broad distribution of the deep lineage,

it is likely that at least some of the functional mutations are adaptive.

The isoform p48 originates from the (African) p46 by a transition G to A at position

111841576 (rs10774671). The derived A obliterates the splicing acceptor site, creating a weak

splicing acceptor site (sequence AAG) at 111841577 and enabling another weak splicing

acceptor site (sequence TAG) at position 111841674. The product of the transcript obtained from

the ancestral allele is usually called p46 (molecular weight 46.0 kDa), and the products of using

111841577 and 111841674 are called p52 (52.2 kDa) and p48 (molecular weight 47.5 kDa),

respectively. The last exons of these three transcripts have different reading frames, with p48

matching that of the deep lineage after the deletion; however, the stop codon in the deep lineage

occurs at position 111841674. Three mutations occurring in the background of the derived state

for 111841576 affect the p42, p48 and p52 isoforms: G162S, T69N and R242Q. The

polymorphism G162S has been associated with susceptibility to type 1 diabetes (Tessier et al.

2006). It has been shown that the A allele at 111841576 increases the relative expression of p52

and p42, the latter probably by increasing the retention of the last intron (Lalonde et al. 2010).

The relative importance of the mutations described in the previous section is thus likely to be

related to the expression levels for the transcripts. There are two independent events of

recombination documented in this work between the deep lineage and chromosomes with the

derived state at 111841576. The effect of these recombination events on the amino acid sequence

is equivalent to performing the mutations R104G and P129R simultaneously.

Variance in TMRCA

In this section we explore more complicated models to explain the ancient TMRCA of the

second section within the extended sequence in Figure 1. To evaluate whether the 3.3 Mya

estimate for TMRCA is compatible with the relatively recent divergence time of ~0.3 Mya that

was inferred for the ancestors of Denisova and AMH (Reich et al. 2010), we first consider a

model of balancing selection acting on a SNP in the second section. This form of selection would

not be expected to prevent decay in LD around a single selected site. Performing an analysis of

LD as in the main text, we estimate a genetic map length of 0.0168 cM for the 6 kb of the second

section. A contour plot of the probability of maintenance of this haplotype in a model of

panmixia illustrates that recombination would erode LD over the 6 kb (fig. S4). For example, with

the estimated values for recombination (0.0168 cM) and TMRCA (3.3 My), the probability of

observing the haplotype would be 2 x 10^-10. With conservative estimates of 0.0084 cM

for the recombination rate and 1.7 My for the time over which recombination is detectable, the

probability of maintenance of the haplotype is still only 0.003 (fig. S4).
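
The orders of magnitude quoted above can be checked with a short Python calculation. The sketch assumes that the probability that a haplotype of map length r (in Morgans) escapes recombination for g generations is exp(-r g), and that a generation lasts 25 years, as implied by the 6 My = 240,000 generations calibration. The model underlying fig. S4 may differ in detail, and the function name is hypothetical.

```python
import math


def prob_no_recombination(map_length_cM, time_years, years_per_generation=25.0):
    """Probability that a segment is not broken by recombination over a time span.

    Assumes a constant per-generation recombination probability equal to the
    map length in Morgans, giving exp(-r * g) over g generations.
    """
    morgans = map_length_cM / 100.0
    generations = time_years / years_per_generation
    return math.exp(-morgans * generations)


if __name__ == "__main__":
    print(prob_no_recombination(0.0168, 3.3e6))  # estimated values
    print(prob_no_recombination(0.0084, 1.7e6))  # conservative values
```

Under these assumptions the two calls give roughly 2 x 10^-10 and 0.003, in line with the values quoted above.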

References

Hartmann, R., J. Justesen, S. N. Sarkar, G. C. Sen, and V. C. Yee. 2003. Crystal structure of the 2'-specific and double-stranded RNA-activated interferon-induced antiviral protein 2'-5'-oligoadenylate synthetase. Mol Cell 12:1173-1185.

Hudson, R. R. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337-338.

Lalonde, E., K. C. Ha, Z. Wang, A. Bemmo, C. L. Kleinman, T. Kwan, T. Pastinen, and J. Majewski. 2010. RNA sequencing reveals the role of splicing polymorphisms in regulating human gene expression. Genome Res 21:545-554.

Li, J. Z., D. M. Absher, H. Tang, A. M. Southwick, A. M. Casto, S. Ramachandran, H. M. Cann, G. S. Barsh, M. Feldman, L. L. Cavalli-Sforza, and R. M. Myers. 2008. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319:1100-1104.

Mendez, F. L., T. M. Karafet, T. Krahn, H. Ostrer, H. Soodyall, and M. F. Hammer. 2011. Increased resolution of Y chromosome haplogroup T defines relationships among populations of the Near East, Europe, and Africa. Hum Biol 83:39-53.

Pickrell, J. K., G. Coop, J. Novembre, S. Kudaravalli, J. Z. Li, D. Absher, B. S. Srinivasan, G. S. Barsh, R. M. Myers, M. W. Feldman, and J. K. Pritchard. 2009. Signals of recent positive selection in a worldwide sample of human populations. Genome Res 19:826-837.

Reich, D., R. E. Green, M. Kircher, et al. (28 co-authors). 2010. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468:1053-1060.

Scheinfeldt, L., F. Friedlaender, J. Friedlaender, K. Latham, G. Koki, T. Karafet, M. Hammer, and J. Lorenz. 2006. Unexpected NRY chromosome variation in Northern Island Melanesia. Mol Biol Evol 23:1628-1641.

Tessier, M. C., H. Q. Qu, R. Frechette, F. Bacot, R. Grabs, S. P. Taback, M. L. Lawson, S. E. Kirsch, T. J. Hudson, and C. Polychronakos. 2006. Type 1 diabetes and the OAS gene cluster: association with splicing polymorphism or haplotype? J Med Genet 43:129-132.

Wilder, J. A., and M. F. Hammer. 2007. Extraordinary population structure among the Baining of New Britain. In J. S. Friedlaender, ed. Genes, Language, & Culture History in the Southwest Pacific. Oxford University Press, New York.

Table S1. Coordinates of amplicons sequenced for the Diversity Panel

Amplicon   Beginning    End          Length   Distance to the next amplicon
A1         111828677    111829834    1158     560
A2         111830395    111831525    1131     754
A3         111832280    111833723    1444     4536
A4         111838260    111840323    2064     647
A5         111840971    111842221    1251

Exons covered (a)   Beginning    End          Length
1                   111829122    111829407    286
2                   111830724    111831012    289
3                   111833239    111833423    185
4                   111838697    111838926    230
5a                  111839735    111839888    154
5b                  111839735    111840214    480
6a                  111841577    111842095    519
6b                  111841578    111842095    518
6c                  111841675    111842095    421

(a) Alternative splicing variants are indicated with letters following the exon number.

Table S2a. Primer information for amplicons re-sequenced in the Diversity Panel

Amplicon Upper Name Upper Seq Lower Name Lower Seq

OAS1A1 OAS1U28649 GAAAGGGAAAAAAGCATAGTATAATACC OAS1L29835 GAGGAAATTGGAACACAGAGTAGT

OAS1A2 OAS1U30370 GTAAGTGTGAACCACCCAGCATAAG OAS1L31526 TTTTTGAACACCTATTACTCATCAGAGC

OAS1A3 OAS1U32258 AAGACAAGAGGGAGAAGGCTGG OAS1L33724 GCGTGTGTGTATGTAGCATTGA

OAS1A4 OAS1U38235 GCATTTCTTAGGAACATTACAAGTC OAS1L40324 TTCACTATTTGGGCGACAGG

OAS1A5 OAS1U40899 TAAACAGCCTGCCTTGTCAC OAS1L42222 TATTCCCAGTGCCCAGAGC

Table S2b. Primer information for additional amplicons used in the extended resequencing

Amplicon Upper Name Upper Seq Lower Name Lower Seq

OAS1_20016 OAS1U20016 TGTGTAGATGCCCCATAGAGGA OAS1L21080 CAGAAACCAGAAAGGAAAACTGC

OAS1_21055 OAS1U21055 GAGCATCCAAGAAAACGAGTG OAS1L22562 ATCACAAGGCATCAACCAGG

OAS1_22181 OAS1U22181 GAGATTTCTTTCCCCACAGATTC OAS1L23724 ACCTCATCAAGCCAATGTCC

OAS1_23529 OAS1U23529 AAGTTGCTGAGGTCTGGTTTC OAS1L24717 CAAAAAGGTCTCGGTCTTCA

OAS1_24562 OAS1U24562 CTTTTGCTTGGCTCTTGTCC OAS1L26866 GTGGGGTGCTGTCTTTGC

OAS1_26789 OAS1U26789 TTTGCTTTATCATACTTGGC OAS1L27632 AACACTACTTTCACTACATCCC

OAS1_27602 OAS1U27602 AAAATGAAAAACAGCCTATCAAAAAG OAS1L29077 GCAAATCAGACACTCCCCTG

OAS1_29747 OAS1U29747 TAGGGGCTCACCATTTCTGC OAS1L30861 CTCTCTCTCTTTGACAGGCTTCC

OAS1_31409 OAS1U31409 CATTTGGACAGGAAGTGTAACC OAS1L32621 ATGGCTATCTATTGTTTCACCC

OAS1_33652 OAS1U33652 CACTGCTGTATCCCCAGAACT OAS1L35103 TGGCTATAAAACAATAATACTTCG

OAS1_34964 OAS1U34964 TTCTTTCTTGATGCTGTTCTCC OAS1L36082 CAGTGGTTTGAATGAGGACA

OAS1_35855 OAS1U35855 ATTTCTATTTCATATTTTTGTATCTGC OAS1L36264 TGGGGTGTGGCAAGGGT

OAS1_36075 OAS1U36075 CCTTTCCTGTCCTCATTCAAACC OAS1L37895 TATTGTGAAAATGACCATACTCCC

OAS1_37726 OAS1U37726 GAAGTCTGATAATGTAATGCCTC OAS1L38402 CGCTGGATTCTTATTGATGT

OAS1_40199 OAS1U40199 TAAATAGTCACAACAATCCCAT OAS1L40958 ACAACCCAAGTCACTCAGC

Table S2c. Primer information for polymorphisms genotyped by RFLP

Position (a) Upper Name Upper Seq Lower Name Lower Seq

111829579 OAS1dig1U GAGGGGTGGCTGAATGTG OAS1dig1L TCAAACAGTTACAGGGAGGAGAG

111831807 OAS1U31409 CATTTGGACAGGAAGTGTAACC OAS1L32621 ATGGCTATCTATTGTTTCACCC

111839753 OAS1U39625d TCTGAGTCCCAGTTCATCCC OAS1L39875d TCTCACCAGCAGAATCCAGG

(a) All positions of polymorphisms are based on the 2006 build of the human genome (hg18).

Table S3. Polymorphism Table for five amplicons sequenced in the Diversity Panel a,b,c

Positions (columns, in order from left to right):
111828752, 111828852, 111829011, 111829013, 111829473, 111829492, 111829550, 111829579,
111829639, 111829676, 111829698, 111829736, 111830504, 111830510, 111830749, 111830853,
111830929, 111831400, 111831465, 111832329, 111832673, 111832690, 111832718, 111832895,
111833010, 111833037, 111833044, 111833131, 111833232, 111833233, 111833253, 111833304,
111833318, 111833482, 111833544, 111838514, 111838673, 111838767, 111838813, 111838936,
111839151, 111839308, 111839658, 111839753, 111839925, 111839966, 111839984, 111840068,
111840237, 111840318, 111841149, 111841283, 111841305, 111841318, 111841363, 111841391,
111841413, 111841458, 111841576, 111841585, 111841592, 111841599, 111841620, 111841650,
111841792, 111841825, 111841877, 111841948, 111842046, 111842071, 111842091, 111842150,
111842166

Population sample name C T G T A GAGAGA C C C G G A T C C A C C G G G A T T G C C G C G G T G G G C - G G C G A G G G T C T A A G A C C C G G T G C G A G AT C G C C T G A G A
BIA HGDP00451 . . . C R ------ . T . . . C . . . . . T . . . . . . . . . C . . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
BIA HGDP00454 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . Y . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
BIA HGDP00455 . . . C . ------ . T . . . C . . . . . T A . . . . . . . . . T . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
BIA HGDP00457 . . . C . ------ . T S . . C . M . . . T R . . . . . . . . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A R C . Y A . Y A A . . .
BIA HGDP00458 S . . C . ------ . T . . . C . . . . . T . . . . . . . Y . . . . . G . A . . X . . . . . . . . K . . T . . . T . T . . . A . A . C . . A . . A A . S .
BIA HGDP00459 . . . C R ------ . T S . . C . . . . . T . . . . . . . Y . S . . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
BIA HGDP00460 . . . C . ------ . T S . . C . . . . . T R . . . . . . . . . Y . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
BIA HGDP00470 . . . C R ------ . T . . . C . . . . . T R . . . . . . . . S Y . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
BIA HGDP00479 . . . C . ------ . T G . . C . . . . . T . . . . . . . Y . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BIA HGDP00981 . . . C R ------ . T S . . C . M . . . T . . . . . . . . . S . . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
BIA HGDP00985 . . . C R ------ . T S . . C . M . . . T . . . . . . . . . S . . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
BIA HGDP01088 . . . C . ------ . T . . . C . . . . . T . . . . . . . . . S . . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
BIA HGDP01089 . . . C . ------ . T . . . C . . . . . T R . . . . . . . . . Y . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
BIA HGDP01091 . . . C . ------ . T . . . C . . . . . T A . . . . . . . . C . . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
BIA HGDP01094 . . . C . ------ . T . . . C . . . . . T R . . . . . . . . . Y . . G . R . . . . . . . . . . . K . . T . . . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP00904 . . . C . ------ . T . . . C . . . . . T A . . . . . . . . . T . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
MAN HGDP00905 . . . C R ------ . T . . . C . . . . . T . . . . . . . . . S . . . G . . . . . . R . . . . . . . S . W . . R Y Y Y . . . . . R . S . . R S . W R R . .
MAN HGDP00906 . . A C . ------ . T . . . C . . . . . T A . . . . . . . . . T . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
MAN HGDP00907 . . . C . ------ . T S . . C . M . . . T R . . . . . . Y . . Y . . G . R . . . . . . S . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP00908 . . . C . ------ . T . . . C . . . . . T A . . . . . . . . . T . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
MAN HGDP00911 . . . C . ------ . T . . . C . . . . . T A . . . . . . . . . Y . R G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
MAN HGDP00912 . . . C . ------ . T S . . C . . . . . T . . . . . Y . Y . . . . . G . A . . . . . . . . . . . K . . T . . . T . T . . . A . A . C . . A . . A A . . .
MAN HGDP00913 . . . C . ------ . T S . . C . . . . . T R R . . . . . Y . . Y . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP00914 . . R C . ------ . T . . . C . . . . . T A . . . . . . . . . Y . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP00915 . . . C R ------ . T S . . C . . . . . T . . . . . . . Y . S . . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP01199 . . . C . ------ . T S . . C . . . . . T R . . . . . . . . . Y . R G . . . . . . . . . . . . . . . . T . R . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP01200 . . . C . ------ . T S . . C . . . . . T R . . . . . . Y . . Y . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP01202 . . . C . ------ . T . . . C . . . . . T A . . . . . . . . . T . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
MAN HGDP01283 . . . C . ------ . T S . . C . . . . . T R . . . . . . . Y . Y . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP01284 . . . C . ------ . T G . . C . . . . . T . . . . . Y . Y . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
MAN HGDP01286 . . . C G ------ . T . . . C . . . . . T . . . . . . . . . C . . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
SAN GM3043 . . . C . ------ . T . . . C . . . . . T . . . . . . . . . S . . . G . R . . . . . . . . . . . . . . T . . . T . T . . Y R . A . C . . A . . A A . . .
SAN JR013 . . . C R ------ . T . . . C . . . . . T . . . . . . . . . S . . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
SAN JR020 . . . C R ------ . T . . . C . . . . . T . . . . . . . . . S . . . G . . . . . . R . . . . . R . . . T . . R Y Y Y . . . . . R . S . . R S . W R R . .
SAN JR054 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . . . G . A . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
SAN JR077 . Y . C R ------ . T . . . C . . . . . T . . . . . . . . . S . . . G . . . . . . R Y . . . R . . . . T . . . T . T . . . . . A R C . Y A . Y A A . . .
SAN JR301 . . . C . ------ . T S . . C . M . . . T . . . W . . . . . S . . . G . R . . X . . . . . . . . K . . T . . . T . T . . Y R . A . C . . A . . A A . . .
SAN JR305 . . . C . ------ . T . . . C . . . . . T . . . . . . . . . . . . R G . . . . . . R Y . . . R . . . . T R . . T . T . . . R . A R C . Y A . Y A A . . .
SAN JR321 . . . C . ------ . T S . . C . . . . . T . . . . . . . . . . . . . G . R . . . . R Y . . . R . . . . T . . . T . T . . . R . A R C . Y A . Y A A . . .
SAN JR323 . . . C . ------ . T S . . C . . . . . T . . . . . . . . . . . . . G . A . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
SAN JR354 . . . C . ------ . T S . . C . . . . . T . . . . . . . . . . . . . G . R . . . . R . . . . . R . . . T . . R Y Y Y . . . R . R . S . . R S . W R R . .
PNG NG004 . . . C . . . . . . . C . . . . . T . . . . . . . . . . . . . G . . . T . . A T . . . A . . . . . . . . . . . . . . . . A . C . . A . . A A . . .
PNG NG006 . . . Y . XXXXXX . Y S R S M Y . . R S Y . . R . . . R . . . . . . K S . . Y . . A Y . . K R . . . . . . . . . . . . . . . . A . C XX . A . . A A . . .
PNG NG013 . . . . . . . . . A C . C . . G G . . . . . . . A . . . . . . . C . . . . . A . . . T . . . . . . . . . . . . . . . . . A . C -- . A . . A A . . .
PNG NG014 . . . C . . . . . . . C . . . . . T . . . . . . . . . . . . . G . . . T . . A T . . . A . . . . . . . . . . . . . . . . A . C . . A . . A A . . .
PNG NG015 . . . C . XXXXXX . Y S . . C . . . . . T . . . . . . . . . . . . R G . . . Y . . R Y . . . R . . . . W . . . Y . Y . . . R . A . C . . A . . A A . . .
PNG NG017 . . . C . . . . . . . C . . . . . T . . . . . . . . . . . . . G . . . T . . A T . . . A . . . . . . . . . . . . . . . . A . C . . A . . A A . . .
PNG NG018 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . R
PNG NG020 . . . Y . XXXXXX . Y S R S M Y M . R S Y . . . . . . R . . . . . . K S R . . . . R . . . K . . . . . W . . . Y . Y . . . R . A . C XX . A . . A A . . .
PNG NG022 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . . . G . R . Y . R R Y . . . R . . . . W . . . Y . Y . . . R . A . C . . A . . A A . . .
PNG NG025 . . . C . XXXXXX . Y S . . C . M . . . T . . . . . . . . . . . . R G . . . Y . . R Y . . . R . . . . W . . . Y . Y . . . R . A . C . . A . . A A . . .
PNG NG026 . . . Y . XXXXXX . Y S R S M Y . . R S Y . . . . . . R . . . . . R K S . . . . . R . . . K . . . . . W . . . Y . Y . . . R . A . C XX . A . . A A . . .
PNG NG029 . . . C . XXXXXX . Y S . . C . . . . . T . . . . . . . . . . . . R G . . . Y . . R Y . . . R . . . . W . . . Y . Y . . . R . A . C . . A . . A A . . .
PNG NG030 . . . . . . . . . A C . C . . G G . . . . . . . A . . . . . . . C . . . . . A . . . T . . . . . . . . . . . . . . . . . A . C -- . A . . A A . . .


Table S3. Polymorphism Table for five amplicons sequenced in the Diversity Panel a,b,c

Positions (columns, left to right):
111828752, 111828852, 111829011, 111829013, 111829473, 111829492, 111829550, 111829579,
111829639, 111829676, 111829698, 111829736, 111830504, 111830510, 111830749, 111830853,
111830929, 111831400, 111831465, 111832329, 111832673, 111832690, 111832718, 111832895,
111833010, 111833037, 111833044, 111833131, 111833232, 111833233, 111833253, 111833304,
111833318, 111833482, 111833544, 111838514, 111838673, 111838767, 111838813, 111838936,
111839151, 111839308, 111839658, 111839753, 111839925, 111839966, 111839984, 111840068,
111840237, 111840318, 111841149, 111841283, 111841305, 111841318, 111841363, 111841391,
111841413, 111841458, 111841576, 111841585, 111841592, 111841599, 111841620, 111841650,
111841792, 111841825, 111841877, 111841948, 111842046, 111842071, 111842091, 111842150,
111842166

Population Sample name C T G T A GAGAGA C C C G G A T C C A C C G G G A T T G C C G C G G T G G G C - G G C G A G G G T C T A A G A C C C G G T G C G A G AT C G C C T G A G A
PNG NG034 . . . Y . XXXXXX . Y S R S M Y M M R S Y . . . . . . . . . . . . . G . A . . . R . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
PNG NG051 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . R . A . A . C . . A . . A A . . .
HAN HGDP00774 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . R . S . . R . . A A . . .
HAN HGDP00775 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
HAN HGDP00777 . . . C . ------ . T G . . C . A . . . T . . . . . . . . . . . . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . . . . . . . . . A A . . .
HAN HGDP00778 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R Y R . S . . R . . A A . . .
HAN HGDP00780 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . A G . . K . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
HAN HGDP00785 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
HAN HGDP00786 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . . . . . . . . . A A . . .
HAN HGDP00815 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . R R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
HAN HGDP00819 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . . A G . . . . . . . . . . . . . . . . T . . . T . T S . . A . A . C . . A . . A A . . .
HAN HGDP00977 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . A G . . . . . . . . . . . . . . . Y T . . . T . T . . . A . A . C . . A . . A A . . .
HAN HGDP01288 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . R . S . . R . . A A . . .
HAN HGDP01290 . . . C . ------ . T G . . C . . . . . T . . . . Y . . . . . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . A Y A . C . . A . . A A . . .
HAN HGDP01293 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . . . G . A . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
HAN HGDP01294 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A Y A . C . . A . . A A . . .
HAN HGDP01295 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . R . S . . R . . A A . . .
HAN HGDP01296 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . . G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01357 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . R . S . . R . . A A . . .
BAS HGDP01358 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01359 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . . . . . . . . . A A . . .
BAS HGDP01360 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . R . S . . R . . A A . . .
BAS HGDP01361 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01362 . . . C . ------ . T G . . C . . . . . T . . . . . . . . T . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01364 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . R . S . . R . . A A . . .
BAS HGDP01370 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01371 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01372 . . . C . ------ M T S . . C . . . . . T . . . . . . . . . . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . R . S . . R . . A A . . .
BAS HGDP01374 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01375 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01376 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01377 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01378 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01379 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . A G . . . . . . . . . M . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .

Chimpanzee C T G T A GAGAGA C C C G G A T C C A C C G G G A T T G C C G C G G T G G G C - A G C G A G G G T C T A A G A C C C G G T G C G A G AT C G C C T G A G A
Bonobo C T G T A ------ C C C G G A T C C A C C G G A A T T G C C G C G G T G G G C - G G C G A G G G T C T T A G A C C C G G T G C G A G AT C G C C T G A G A
Gorilla C T G T A GA---- C C C G G A T C C A C C G G G A T T G C C G C G G T G G G C - A G C G A G G G T C T A A G A C C C G G T G C G A G AT C G C C T G A G A

a: Positions in the 2006 build of the human genome are indicated above the inferred ancestral state. Five boxes distinguish the sites from the five amplicons.
b: Mutations are color coded according to their function: yellow (non-synonymous), brown (synonymous), green (3'-UTR), orange (splice acceptor polymorphism), red (frameshift deletion), gray (intronic and intergenic).
c: Derived alleles are written explicitly. IUPAC ambiguity codes are used for heterozygous sites. 'X' indicates a heterozygous in-del.
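The encoding described in footnote c can be read back into genotypes with a small lookup. The following Python sketch is illustrative only: the function name `genotype` is hypothetical, it assumes that "." means "same as the inferred ancestral state", and it handles single-base sites only (in-del columns such as GAGAGA, "------" or "XXXXXX" are not covered).

```python
# Two-allele IUPAC ambiguity codes, used in Table S3 for heterozygous sites.
IUPAC_HET = {
    "R": ("A", "G"),
    "Y": ("C", "T"),
    "S": ("C", "G"),
    "W": ("A", "T"),
    "K": ("G", "T"),
    "M": ("A", "C"),
}
BASES = {"A", "C", "G", "T"}


def genotype(code, ancestral):
    """Return the two alleles at a single-base site.

    Assumption: '.' denotes a site identical to the ancestral state.
    """
    if code == ".":
        return (ancestral, ancestral)   # homozygous ancestral
    if code in BASES:
        return (code, code)             # derived allele written explicitly
    if code in IUPAC_HET:
        return IUPAC_HET[code]          # heterozygous site
    raise ValueError(f"unsupported code: {code!r}")


print(genotype("R", "G"))  # ('A', 'G')
print(genotype(".", "C"))  # ('C', 'C')
```

A full reader for the table would also need a rule for the multi-base in-del columns, which this sketch deliberately leaves out.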


Table S4. Sequence variation in an extended re-sequencing for a selected group of individuals a

Positions (columns, left to right):
111820114, 111820638, 111821546, 111821722, 111822114, 111822139, 111822327, 111822609,
111823810, 111824538, 111824541, 111825497, 111826028, 111826031, 111826032, 111826038,
111826039, 111826302, 111826367, 111826545, 111826925, 111826987, 111827306, 111828457,
111828555, 111829013, 111829030, 111829491, 111829579, 111829639, 111829676, 111829698,
111829736, 111830082, 111830163, 111830354, 111830389, 111830504, 111830510, 111830749,
111830853, 111830929, 111831400, 111831807, 111832000

Population Individual T C A --- C T G CAC T AAA ----- G T A AATAT T AATTATTTTT G C A A A C A G T A GAGAGA C C G G A A T C C T C C A C C T G

Deep lineage
Vanuatu MF82 G T . . . G A --- . . . . A G ----- C ---------- . A . . . . . . . . . . . A C . . A . T C . . G G . G .
PNG HGDP00555 G . . . . G A --- . . . . A G ----- C ---------- . A . . R . . . . . . . . A C . . A . T C . . G G . G .
PNG NG013 G . . . . G A --- . . . . A G ----- C ---------- . A . . . . . . . . . . . A C . . A . T C . . G G . G .
PNG NG030 G . . . . G A --- . . . . A G ----- C ---------- . A . . R . . . . . . . . A C . . A . T C . . G G . G .
PNG NG062 G . . . . G A --- . . . . A G ----- C ---------- . A . . R . . . . . . . . A C . . A . T C . . G G . G .

Shallow lineage
Oceanian
PNG HGDP00556 G T . TT- . . . . . . AAAAA . . . . . . . . . . . . . . C . . . . . . C . . T . . . . . . T . A
PNG HGDP00550 G T . TT- . . . . . . AAAAA . . . . . . . . . . . . . . C . . . . . . C . . T . . . . . . T . A
PNG NG04 G T . TT- . . . . . . AAAAA . . . . . . . . . . . . . . C . . . . . . C . . T . . . . . . T . A
PNG NG14 G T . TT- . . . . . . AAAAA . . . . . . . . . . . . . . C . . . . . . C . . T . . . . . . T . A
PNG NG17 G Y . TT- . . . . . . AAAAA . . . . . . . . . . . . . . C . . . . . . C . . T . . . . . . T . A
African
Ghana Ghn32 n n n NNN n n n NNN n NNN NNNNN . . . . . . . . . T . . . . C G ------ T . . . C . . T . . . . . . T . A
San JR321 n n n NNN n n n NNN n NNN NNNNN . . . . . . R . R . . . . A C . ------ T S . . C C . T . . . . . . T . A

Recombinant
New Britain UV005 G T . . . G A --- . . . . A G ----- C ---------- . A . . . . . . . . . . . A C . . A . T C . . G G . G .
PNG NG034 K . R XXX Y K R XXX Y XXX . R W R XXXXX Y XXXXXXXXXX . M . . . . . R Y . XXXXXX Y S R S M M W Y Y Y M M R S Y K R

Gene Converted
Bougainville HGDP00824 n n n NNN n n n NNN n NNN NNNNN . W R XXXXX Y XXXXXXXXXX R M R . . . R R Y . . . . R S M . W Y Y Y . . R S Y K R
Bougainville HGDP00663 n n n NNN n n n NNN n NNN NNNNN . . . . . . R . R . . . . R C . . . . . . C . . T . . . . . . T . A
Bougainville HGDP00788 n n n NNN n n n NNN n NNN NNNNN . . . . . . A . G . . . R A C . . . . . . C . . T . . . . . . T . A
Bougainville HGDP00789 n n n NNN n n n NNN n NNN NNNNN . . . . . . R . R . . . R R C . . . . . . C . . T . . . . . . T . A

Outgroup
Chimpanzee T C A T-- C T G CAC T AAA AA--- G T A AATAT T AATTATTTTT G C A A A C A G T A GAGAGA C C G G A A T C C T C C A C C T G

a: A single polymorphism in a CpG site at position 111838767 was inferred to have mutated independently in some human lineages and in chimpanzee


Table S4 (continued). Sequence variation in an extended re-sequencing for a selected group of individuals a

Positions (columns, left to right):
111832263, 111833010, 111833253, 111833304, 111833318, 111833482, 111834488, 111834529,
111835903, 111836291, 111836440, 111836542, 111836790, 111836924, 111836934, 111837034,
111837091, 111837941, 111838514, 111838767, 111838813, 111838936, 111839658, 111839753,
111840237, 111840418, 111840607, 111840705, 111841305, 111841363, 111841576, 111841599,
111841650, 111841792, 111841948

Population Individual AA G G T G G A A T G G T C T --- C C A C G G C G G A A C T C C G A AT C C

Deep lineage
Vanuatu MF82 -- A . . C . . . . . A . T . . T . . . . A . T . . . . G . . . . -- . .
PNG HGDP00555 -- A . . C . . . . . A . T . . T . . . . A . T . . . . G . . . . -- . .
PNG NG013 -- A . . C . . . . . A . T . . T . . . . A . T . . . . G . . . . -- . .
PNG NG030 -- A . . C . . . . . A . T . . T . . . . A . T . . . . G . . . . -- . .
PNG NG062 -- A . . C . . . . . A . T . . T . . . . A . T . . . . G . . . . -- . .

Shallow lineage
Oceanian
PNG HGDP00556 . . . G . . . . . . . . . G TT- . . . T . A T . A . . A . . . . . . . .
PNG HGDP00550 . . . G . . . . . . . . . G TT- . . . T . A T . A . . A . . . . . . . .
PNG NG04 . . . G . . . . . . . . . G TT- . . . T . A T . A . . A . . . . . . . .
PNG NG14 . . . G . . . . . . . . . G TT- . . . T . A T . A . . A . . . . . . . .
PNG NG17 . . . G . . . . . . . . . G TT- . . . T . A T . A . . A . . . . . . . .
African
Ghana Ghn32 . . . G . . . . . . . . . G TT- . . T . . A T . A T . . . T T . G . T T
San JR321 . . . G . R . R K . . Y . K TTX . M W . . R Y . R T . . . T T R R . Y Y

Recombinant
New Britain UV005 XX R R K S . R . K . R Y Y . XXX Y . . . . R . K . W . . K Y Y R . XX . .
PNG NG034 XX . . G . A . . G R . C . . TTT . A . . R . . . . T . . . T T A . . . .

Gene Converted
Bougainville HGDP00824 XX R R K S . . . K . R Y Y . XXX Y . . . . R . K . W R . K Y Y R . XX . .
Bougainville HGDP00663 . . R G . . R . K . . Y . K TTX . . . Y . R Y . R W R M . Y Y R . . . .
Bougainville HGDP00788 . . A G . . R . G . . C . . TTT . . . . . . . . . T R . . T T A . . . .
Bougainville HGDP00789 . . R G . . . . K . . Y . K TTX . . . Y . R Y . R W R M . Y Y R . . . .

Outgroup
Chimpanzee AA G G T G G A A T C G T C T --- C C A C A G C G G A A C T C C G A AT C C

a: A single polymorphism in a CpG site at position 111838767 was inferred to have mutated independently in some human lineages and in chimpanzee


Table S5a. Derived frequency of the deep lineage

Population                 frequency   standard deviation / sample size a

Oceania c
  Australia (West)         0.056       0.038
  Melanesian (Nasioi)      0.063       0.035
  Vanuatu                  0.287       0.078
  New Britain              0.531       0.052
  Micronesia               0.126       0.045
  Papuan                   0.227       0.034
  Tonga                    0.091       0.063
  Samoas                   0.059       0.041
  Tahiti                   0.058       0.032

South East Asia b
  Eastern Indonesia        0.167       0.112
  Western Indonesia        0.000       7
  Laos                     0.000       3
  Malay                    0.000       4
  Philippines              0.000       25
  Vietnam                  0.000       17

East Asia b
  Han                      0.000       20
  Miao                     0.000       38
  Yao                      0.000       46
  Taiwan                   0.000       56

South Asia b
  India                    0.000       183
  Pakistan                 0.014       0.010
  Sri Lanka                0.014       0.010

Europe b
  Basque                   0.000       20

North East Africa b
  Ethiopia                 0.000       25

Sub-Saharan Africa b
  Baka (Pygmy)             0.000       13
  Biaka (Pygmy)            0.000       27
  Gambia                   0.000       32
  Ghana                    0.000       85
  Ivory Coast              0.000       22
  Kenya (Bantu)            0.000       62
  KhoeSan                  0.000       38
  Yoruba                   0.000       117
  Mandenka                 0.000       20
  South Africa (Bantu)     0.000       101
  Dinka                    0.000       40
  Uganda                   0.000       7
  Mbuti (Pygmy)            0.000       17
  Zimbabwe                 0.000       47

a: When no individuals carry the lineage, the number of individuals sampled is indicated instead.
b: The absence of the deep lineage was inferred for individuals carrying the derived state at 111829579.
c: Standard deviation in allele frequency estimated using the method described in the supplementary material.


Table S5b. Frequency of the shallow lineage

Population                 frequency   standard deviation / sample size a

Oceania
  Australia (West)         0.296       0.093
  Melanesian (Nasioi)      0.237       0.061
  Vanuatu                  0.211       0.047
  New Britain              0.077       0.028
  Micronesia               0.105       0.044
  Papuan                   0.383       0.041
  Tonga                    0.045       0.045
  Samoas                   0.034       0.034
  Tahiti                   0.097       0.040

South East Asia
  Eastern Indonesia        0.167       0.112
  Western Indonesia        0.000       7
  Malay                    0.000       4
  Philippines              0.000       12
  Vietnam                  0.000       6

East Asia
  Han                      0.000       20
  Yao                      0.000       8
  Taiwan                   0.000       11

South Asia
  India                    0.000       159
  Pakistan                 0.000       68
  Sri Lanka                0.000       64

Europe
  Basque                   0.000       20

North East Africa
  Ethiopia                 0.000       11

Sub-Saharan Africa
  Baka (Pygmy)             0.000       14
  Biaka (Pygmy)            0.019       0.019
  Gambia                   0.000       44
  Ghana                    0.023       0.013
  Ivory Coast              0.000       23
  Kenya (Bantu)            0.010       0.010
  KhoiSan                  0.147       0.043
  Yoruba                   0.011       0.006
  Mandenka                 0.000       18
  South Africa (Bantu)     0.028       0.014
  Dinka                    0.000       40
  Uganda                   0.000       6
  Mbuti (Pygmy)            0.107       0.034
  Zimbabwe                 0.042       0.021

a: When no individuals carry the lineage, the number of individuals sampled is indicated.
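Several entries in Table S5b have a standard deviation equal to the frequency itself (Tonga, Samoas, Biaka, Kenya). That is the value given by the unbiased binomial estimate sqrt(p(1-p)/(n-1)) when exactly one of n sampled chromosomes carries the lineage. The Python sketch below illustrates this arithmetic only. It is an assumption that the uncorrected entries were computed this way; the sample size of 22 is hypothetical, and the corrected Oceanian estimates follow the method in the Supplementary Methods, which is not reproduced here.

```python
import math


def freq_and_sd(carriers, n):
    """Sample frequency and its standard deviation.

    Uses the unbiased variance estimate p(1-p)/(n-1); this is an
    illustrative choice, not a statement of the authors' method.
    """
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    p = carriers / n
    return p, math.sqrt(p * (1 - p) / (n - 1))


# Hypothetical sample: one carrier among 22 chromosomes.
p, sd = freq_and_sd(1, 22)
print(round(p, 3), round(sd, 3))  # 0.045 0.045
```

With a single carrier, p(1-p)/(n-1) reduces to 1/n^2, so the standard deviation equals 1/n, the frequency itself.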


Table S6. Polymorphism information relevant for comparison of the reference sequence, deep, and shallow lineages in BED format.

descriptor chromosome beginning end comment a

track name=Human_der description="Sites derived in the ancestry of all humans" color=128,128,0

chr12 111820069 111820070 20070,A->G,COVERED

chr12 111820113 111820114 20114,T->G,COVERED

chr12 111820514 111820515 20515,C->T,COVERED

chr12 111820591 111820592 20592,C->T,COVERED

chr12 111821056 111821057 21057,A->G,COVERED

chr12 111821063 111821064 21064,G->A,COVERED

chr12 111821069 111821070 21070,T->C,COVERED

chr12 111821219 111821220 21220,C->A,NONCOVERED

chr12 111821361 111821362 21362,A->C,COVERED

chr12 111821412 111821413 21413,A->C,COVERED

chr12 111821843 111821844 21844,T->C,COVERED

chr12 111822135 111822136 22136,T->G,COVERED

chr12 111822356 111822357 22357,C->T,NONCOVERED

chr12 111822385 111822386 22386,G->A,NONCOVERED

chr12 111822410 111822411 22411,C->G,NONCOVERED

chr12 111822685 111822686 22686,G->T,COVERED

chr12 111822914 111822915 22915,T->A,COVERED

chr12 111822935 111822936 22936,T->C,COVERED

chr12 111823001 111823002 23002,C->T,COVERED

chr12 111823084 111823085 23085,A->G,COVERED

chr12 111823633 111823634 23634,C->T,COVERED

chr12 111823791 111823792 23792,C->T,COVERED

chr12 111823901 111823902 23902,C->G,NONCOVERED

chr12 111823922 111823923 23923,T->G,COVERED

chr12 111824003 111824004 24004,A->G,COVERED

chr12 111824136 111824137 24137,G->A,NONCOVERED

chr12 111824226 111824227 24227,C->T,NONCOVERED

chr12 111825034 111825035 25035,T->C,NONCOVERED

chr12 111825373 111825374 25374,T->A,COVERED

chr12 111825450 111825451 25451,A->G,COVERED

chr12 111825520 111825521 25521,C->G,NONCOVERED

chr12 111825899 111825900 25900,A->G,COVERED

chr12 111826139 111826140 26140,A->G,NONCOVERED

chr12 111826394 111826395 26395,A->T,COVERED

chr12 111826432 111826433 26433,A->G,COVERED

chr12 111826612 111826613 26613,C->T,COVERED

chr12 111826654 111826655 26655,T->C,COVERED

chr12 111826687 111826688 26688,A->G,COVERED

chr12 111826759 111826760 26760,G->A,COVERED

chr12 111827083 111827084 27084,C->T,COVERED

chr12 111827738 111827739 27739,G->A,COVERED

chr12 111827951 111827952 27952,G->C,COVERED

chr12 111829768 111829769 29769,C->T,COVERED

chr12 111829812 111829813 29813,G->A,COVERED

chr12 111830244 111830245 30245,C->T,COVERED

chr12 111831916 111831917 31917,G->A,NONCOVERED

chr12 111832010 111832011 32011,A->G,COVERED

chr12 111832700 111832701 32701,T->C,COVERED

chr12 111833489 111833490 33490,C->G,COVERED

chr12 111833734 111833735 33735,C->T,NONCOVERED

chr12 111834147 111834148 34148,C->T,NONCOVERED

chr12 111834492 111834493 34493,A->C,NONCOVERED

chr12 111834736 111834737 34737,C->T,COVERED

chr12 111835438 111835439 35439,A->G,NONCOVERED

chr12 111836290 111836291 36291,C->G,COVERED

chr12 111837077 111837078 37078,C->A,NONCOVERED

chr12 111837410 111837411 37411,G->A,NONCOVERED

chr12 111837733 111837734 37734,G->A,COVERED

chr12 111838279 111838280 38280,A->G,COVERED

chr12 111838292 111838293 38293,A->C,COVERED

chr12 111838334 111838335 38335,A->C,COVERED

chr12 111838766 111838767 38767,A->G,COVERED

chr12 111839317 111839318 39318,T->G,NONCOVERED

chr12 111839369 111839370 39370,C->T,NONCOVERED

chr12 111840145 111840146 40146,A->G,NONCOVERED

chr12 111840204 111840205 40205,T->G,NONCOVERED

chr12 111840236 111840237 40237,A->T,NONCOVERED

chr12 111840398 111840399 40399,T->G,COVERED

chr12 111840578 111840579 40579,T->C,NONCOVERED

chr12 111840622 111840623 40623,A->C,NONCOVERED

chr12 111840675 111840676 40676,T->A,NONCOVERED

chr12 111841218 111841219 41219,A->G,COVERED

chr12 111841452 111841453 41453,T->C,COVERED

chr12 111841585 111841586 41586,G->A,COVERED

chr12 111842045 111842046 42046,T->A,NONCOVERED

chr12 111842070 111842071 42071,G->A,NONCOVERED

track name=Deep_vs_ref description="Differences between the deep and reference sequences" color=128,0,128

chr12 111822138 111822139 22139,T->G

chr12 111822326 111822327 22327,G->A

chr12 111822608 111822611 22609,CAC_del

chr12 111825980 111825981 25981,C->T

chr12 111826027 111826028 26028,T->A

chr12 111826030 111826048 26031,large_asymmetric_substitution

chr12 111826366 111826367 26367,C->A

chr12 111826980 111826981 26981,G->T

chr12 111827757 111827758 27758,G->C

chr12 111827945 111827946 27946,G->A

chr12 111828528 111828529 28529,G->C

chr12 111828554 111828555 28555,G->A

chr12 111829012 111829013 29013,T->C

chr12 111829491 111829491 29491,GAGAGA_in-del

chr12 111829549 111829550 29550,C->A

chr12 111829578 111829579 29579,C->T

chr12 111829675 111829676 29676,G->A

chr12 111829697 111829698 29698,G->C

chr12 111829735 111829736 29736,A->C

chr12 111830162 111830163 30163,T->A

chr12 111830353 111830354 30354,C->T

chr12 111830388 111830389 30389,C->T

chr12 111830503 111830504 30504,T->C

chr12 111830852 111830853 30853,A->G

chr12 111830928 111830929 30929,C->G

chr12 111831399 111831400 31400,C->T

chr12 111831806 111831807 31807,T->G

chr12 111831999 111832000 32000,G->A

chr12 111832262 111832264 32263,deletion

chr12 111833009 111833010 33010,G->A

chr12 111833303 111833304 33304,T->G

chr12 111833317 111833318 33318,G->C

chr12 111835178 111835179 35179,T->G

chr12 111836439 111836440 36440,G->A

chr12 111836789 111836790 36790,C->T

chr12 111837033 111837034 37034,C->T,N

chr12 111838812 111838813 38813,G->A

chr12 111839657 111839658 39658,G->T

chr12 111840236 111840237 40237,uncertain_ancestral_state

chr12 111840704 111840705 40705,T->G

chr12 111841304 111841305 41305,C->T

chr12 111841362 111841363 41363,C->T

chr12 111841591 111841592 41592,G->A

chr12 111841619 111841620 41620,G->C

chr12 111841649 111841651 41650,AT_deletion_frameshift

chr12 111841824 111841825 41825,G->A

track name=Deep_anc description="Sites ancestral in the deep lineage and exclusive to Papuans" color=255,0,255

chr12 111828554 111828555 28555,G->A,shared_with_Afr,D

chr12 111829012 111829013 29013,T->C,D

chr12 111829491 111829491 29491,in-del_GAGAGA,proposed_gene_conv,D

chr12 111829578 111829579 29579,proposed_gene_conv,D

chr12 111829735 111829736 29736,A->C,D

chr12 111830353 111830354 30354,C->T,D

chr12 111831399 111831400 31400,C->T,D

chr12 111831999 111832000 32000,G->A,D

chr12 111833303 111833304 33304,T->G,D

chr12 111841304 111841305 41305,C->T,D

chr12 111841362 111841363 41363,C->T,D

track name=Deep_der description="Sites derived in deep lineage" color=0,0,255

chr12 111822138 111822139 22139,T->G,N

chr12 111822326 111822327 22327,G->A,N

chr12 111822608 111822611 22609,CAC_del,N

chr12 111826027 111826028 26028,T->A,NONCOVERED

chr12 111826030 111826048 26031,large_asymmetric_substitution,NONCOVERED

chr12 111826366 111826367 26367,C->A,N

chr12 111829675 111829676 29676,G->A,D

chr12 111829697 111829698 29698,G->C,D

chr12 111830162 111830163 30163,T->A,D

chr12 111830388 111830389 30389,C->T,D

chr12 111830503 111830504 30504,T->C,N

chr12 111830852 111830853 30853,A->G,D

chr12 111830928 111830929 30929,C->G,D

chr12 111831806 111831807 31807,T->G,D

chr12 111832262 111832264 32263,deletion,D

chr12 111833009 111833010 33010,G->A,NONCOVERED

chr12 111833317 111833318 33318,G->C,D

chr12 111836439 111836440 36440,G->A,N

chr12 111836789 111836790 36790,C->T,D

chr12 111837033 111837034 37034,C->T,N

chr12 111838812 111838813 38813,G->A,D

chr12 111839657 111839658 39658,G->T,D

chr12 111840236 111840237 40237,A<->T,D

chr12 111840704 111840705 40705,T->G,NONCOVERED

chr12 111841649 111841651 41650,AT_deletion_frameshift,N

track name=Shallow_der description="Polymorphisms derived in the shallow lineage (Papuan or African)" color=255,255,0

chr12 111826924 111826925 26925,A->T,AFRICAN

chr12 111829029 111829030 29030,A->G,AFRICAN

chr12 111836923 111836924 36924,T->G,BOTH

chr12 111836923 111836933 36933,homopolymer,BOTH

chr12 111837940 111837941 37941,A->T,AFRICAN

chr12 111838513 111838514 38514,C->T,PAPUAN

chr12 111838812 111838813 38813,G->A,D,BOTH

chr12 111838935 111838936 38936,C->T,BOTH

chr12 111839752 111839753 39753,G->A,BOTH

chr12 111840606 111840607 40607,C->A,PAPUAN

chr12 111841598 111841599 41599,A->G,AFRICAN

chr12 111841791 111841792 41792,C->T,AFRICAN

chr12 111841947 111841948 41948,C->T,AFRICAN

track name=Ref_anc description="Sites ancestral in the Reference sequence and derived in all Papuans" color=255,0,0

chr12 111841591 111841592 41592,G->A

chr12 111841619 111841620 41620,G->C

chr12 111841824 111841825 41825,G->A

track name=Ref_der description="Polymorphic sites derived in the Reference, but ancestral in Africans and Papuans of this study" color=255,0,0

chr12 111825980 111825981 25981,C->T

chr12 111826980 111826981 26981,G->T

chr12 111827757 111827758 27758,G->C

chr12 111827945 111827946 27946,G->A

chr12 111828528 111828529 28529,G->C

chr12 111829549 111829550 29550,C->A

chr12 111835178 111835179 35179,T->G

a: Polymorphisms with no coverage in Denisova are indicated as 'NONCOVERED'. D indicates sharing and N indicates non-sharing between the deep lineage and Denisova.
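The rows of Table S6 can be read programmatically. The Python sketch below is a minimal reader, not a full BED parser: the names `Site` and `parse_table_s6` are hypothetical, and it assumes whitespace-separated fields with the comment column split on commas into a short position label, a change description, and optional flags. BED intervals are 0-based and half-open, so for a single-base site the 1-based position equals the end coordinate.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    track: str
    chrom: str
    start: int          # 0-based, inclusive
    end: int            # exclusive; equals the 1-based position of a SNP
    label: str          # last five digits of the position, as in the table
    change: str         # e.g. "A->G" or "CAC_del"
    flags: list = field(default_factory=list)  # e.g. ["COVERED"], ["D"]


def parse_table_s6(text):
    """Parse track lines and site rows laid out as in Table S6."""
    sites, track = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("track"):
            # Keep only the track name; description and color are ignored.
            track = line.split("name=")[1].split()[0]
            continue
        chrom, start, end, comment = line.split(None, 3)
        label, change, *flags = comment.split(",")
        sites.append(Site(track, chrom, int(start), int(end), label, change, flags))
    return sites


example = """
track name=Human_der description="Sites derived in the ancestry of all humans" color=128,128,0
chr12 111820069 111820070 20070,A->G,COVERED
chr12 111821219 111821220 21220,C->A,NONCOVERED
"""
for site in parse_table_s6(example):
    print(site.track, site.end, site.change, site.flags)
```

Filtering on `flags` then gives, for example, the sites with no Denisova coverage.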


Table S7a. Parameters for the demographic models a

model   N0     t0     N1     t1     N2     M
a       250    200    250    100    250    2000
a       500    200    500    100    500    2000
a       1000   200    1000   100    1000   2000
a       2000   200    2000   100    2000   2000
a       4000   200    4000   100    4000   2000

b       250    0      50     1800   50     2000
b       250    0      100    1800   100    2000
b       250    0      200    1800   200    2000
b       500    0      50     1800   50     2000
b       500    0      100    1800   100    2000
b       500    0      200    1800   200    2000
b       1000   0      50     1800   50     2000
b       1000   0      100    1800   100    2000
b       1000   0      200    1800   200    2000
b       2000   0      50     1800   50     2000
b       2000   0      100    1800   100    2000
b       2000   0      200    1800   200    2000
b       4000   0      50     1800   50     2000
b       4000   0      100    1800   100    2000
b       4000   0      200    1800   200    2000

c       250    0      50     1800   1000   2000
c       250    0      100    1800   1000   2000
c       250    0      200    1800   1000   2000
c       500    0      50     1800   1000   2000
c       500    0      100    1800   1000   2000
c       500    0      200    1800   1000   2000
c       1000   0      50     1800   1000   2000
c       1000   0      100    1800   1000   2000
c       1000   0      200    1800   1000   2000
c       2000   0      50     1800   1000   2000
c       2000   0      100    1800   1000   2000
c       2000   0      200    1800   1000   2000
c       4000   0      50     1800   1000   2000
c       4000   0      100    1800   1000   2000
c       4000   0      200    1800   1000   2000

d       250    0      1      0      1      2000
d       500    0      1      0      1      2000
d       1000   0      1      0      1      2000
d       2000   0      1      0      1      2000
d       4000   0      1      0      1      2000

e       250    200    50     0      50     2000
e       250    200    100    0      100    2000
e       250    200    200    0      200    2000
e       500    200    50     0      50     2000
e       500    200    100    0      100    2000
e       500    200    200    0      200    2000
e       1000   200    50     0      50     2000
e       1000   200    100    0      100    2000
e       1000   200    200    0      200    2000
e       2000   200    50     0      50     2000
e       2000   200    100    0      100    2000
e       2000   200    200    0      200    2000
e       4000   200    50     0      50     2000
e       4000   200    100    0      100    2000
e       4000   200    200    0      200    2000
e       250    600    50     0      50     2000
e       250    600    100    0      100    2000
e       250    600    200    0      200    2000
e       500    600    50     0      50     2000
e       500    600    100    0      100    2000
e       500    600    200    0      200    2000
e       1000   600    50     0      50     2000
e       1000   600    100    0      100    2000
e       1000   600    200    0      200    2000
e       2000   600    50     0      50     2000
e       2000   600    100    0      100    2000
e       2000   600    200    0      200    2000
e       4000   600    50     0      50     2000
e       4000   600    100    0      100    2000
e       4000   600    200    0      200    2000
e       250    1000   50     0      50     2000
e       250    1000   100    0      100    2000
e       250    1000   200    0      200    2000
e       500    1000   50     0      50     2000
e       500    1000   100    0      100    2000
e       500    1000   200    0      200    2000
e       1000   1000   50     0      50     2000
e       1000   1000   100    0      100    2000
e       1000   1000   200    0      200    2000
e       2000   1000   50     0      50     2000
e       2000   1000   100    0      100    2000
e       2000   1000   200    0      200    2000
e       4000   1000   50     0      50     2000
e       4000   1000   100    0      100    2000
e       4000   1000   200    0      200    2000
e       250    1400   50     0      50     2000
e       250    1400   100    0      100    2000
e       250    1400   200    0      200    2000
e       500    1400   50     0      50     2000
e       500    1400   100    0      100    2000
e       500    1400   200    0      200    2000
e       1000   1400   50     0      50     2000
e       1000   1400   100    0      100    2000
e       1000   1400   200    0      200    2000
e       2000   1400   50     0      50     2000
e       2000   1400   100    0      100    2000
e       2000   1400   200    0      200    2000
e       4000   1400   50     0      50     2000
e       4000   1400   100    0      100    2000
e       4000   1400   200    0      200    2000

a: parameters as in Figure S5
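The rows of Table S7a form a regular grid, so they can be regenerated rather than typed by hand. The Python sketch below reproduces the 100 parameter combinations in the order listed; the function name `model_grid` is hypothetical, and the code only enumerates the tabulated values (the meaning of each parameter is given in Figure S5 and is not assumed here).

```python
from itertools import product

N0_VALUES = [250, 500, 1000, 2000, 4000]
N1_VALUES = [50, 100, 200]
T0_VALUES_E = [200, 600, 1000, 1400]   # t0 values used by model e
M = 2000                               # constant in every row


def model_grid():
    """Return rows (model, N0, t0, N1, t1, N2, M) in the order of Table S7a."""
    rows = []
    rows += [("a", n0, 200, n0, 100, n0, M) for n0 in N0_VALUES]
    rows += [("b", n0, 0, n1, 1800, n1, M)
             for n0, n1 in product(N0_VALUES, N1_VALUES)]
    rows += [("c", n0, 0, n1, 1800, 1000, M)
             for n0, n1 in product(N0_VALUES, N1_VALUES)]
    rows += [("d", n0, 0, 1, 0, 1, M) for n0 in N0_VALUES]
    rows += [("e", n0, t0, n1, 0, n1, M)
             for t0, n0, n1 in product(T0_VALUES_E, N0_VALUES, N1_VALUES)]
    return rows


grid = model_grid()
print(len(grid))   # 100
print(grid[0])     # ('a', 250, 200, 250, 100, 250, 2000)
```

Each tuple could then be passed to whatever simulation routine implements the models of Figure S5.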

Table S7b. Estimates of TMRCA for 10 chromosomes in the deep and shallow lineages^a,b

lineage  model  median  0.025 quantile  0.975 quantile
deep     a       28684            7991          118588
deep     b       18905            7632           43314
deep     c       18915            7703           42497
deep     d       19055            7200           33379
deep     e       32714            8605           68557
deep     all     23767            8055           65659

shallow  a       21962            6330           91013
shallow  b       17407            6600           38249
shallow  c       17433            6609           38247
shallow  d       16338            5932           31983
shallow  e       24823            6720           63367
shallow  all     19287            6571           58326

a: times are expressed in years
b: model parameters as in Table S7a and Figure S5

Table S8. Summary statistics for the five amplicons of OAS1

First amplicon
Population     n^a  S   θw (%)  πw (%)  D       private  private non-singletons
Biaka          30   3   0.066   0.068   0.08    1        0
Mandenka       32   3   0.065   0.071   0.21    1        1
San            20   3   0.074   0.070   -0.15   1        0
PNG            30   6   0.132   0.228   2.08    5        5
Han            32   0   0.000   0.000   N.D.^b  0        0
French Basque  32   2   0.043   0.011   -1.52   1        0

Second amplicon
Population     n^a  S   θw (%)  πw (%)  D       private  private non-singletons
Biaka          30   2   0.045   0.055   0.45    0        0
Mandenka       32   2   0.044   0.051   0.31    0        0
San            20   1   0.025   0.017   -0.62   0        0
PNG            30   6   0.135   0.174   0.84    5        4
Han            32   1   0.022   0.028   0.40    0        0
French Basque  32   1   0.022   0.006   -1.16   0        0

Third amplicon
Population     n^a  S   θw (%)  πw (%)  D       private  private non-singletons
Biaka          30   5   0.088   0.117   0.90    0        0
Mandenka       32   8   0.139   0.133   -0.12   2        1
San            20   4   0.079   0.077   -0.07   1        0
PNG            30   6   0.106   0.127   0.59    4        3
Han            32   5   0.087   0.064   -0.70   3        0
French Basque  32   3   0.052   0.077   1.12    0        0

Fourth amplicon
Population     n^a  S   θw (%)  πw (%)  D       private  private non-singletons
Biaka          30   1   0.012   0.006   -0.78   0        0
Mandenka       32   5   0.061   0.015   -2.03   4        0
San            20   6   0.083   0.064   -0.75   1        1
PNG            30   7   0.086   0.142   1.92    3        3
Han            32   1   0.012   0.003   -1.16   1        0
French Basque  32   1   0.012   0.003   -1.16   1        0

Fifth amplicon
Population     n^a  S   θw (%)  πw (%)  D       private  private non-singletons
Biaka          30   5   0.102   0.062   -1.08   1        0
Mandenka       32   13  0.260   0.097   -2.05   0        0
San            20   16  0.366   0.288   -0.80   2        1
PNG            30   5   0.102   0.130   0.76    3        0
Han            32   6   0.120   0.143   0.54    2        1
French Basque  32   4   0.080   0.101   0.65    0        0

a: number of chromosomes in the sample
b: not defined
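
The D column of Table S8 can be roughly checked against the other columns. The sketch below assumes that D is Tajima's D and that the two percentage columns are Watterson's estimator and nucleotide diversity; the source does not describe the computation, so this is a consistency check using the standard formula rather than the authors' procedure. The function name and argument names are illustrative. Only the ratio of the two diversity values is needed, so the amplicon length does not have to be known.

```python
import math


def tajimas_d(n, s, theta_w, pi):
    """Tajima's D from n chromosomes, s segregating sites, and per-site
    theta_w and pi given in the same units (e.g. both in %)."""
    if s == 0 or theta_w == 0:
        return None  # undefined without segregating sites ("N.D." in the table)

    # Standard constants of Tajima's test for a sample of n chromosomes.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    theta_total = s / a1                 # Watterson's estimate for the whole amplicon
    k = theta_total * (pi / theta_w)     # mean pairwise differences on the same scale
    return (k - theta_total) / math.sqrt(e1 * s + e2 * s * (s - 1))


# PNG, first amplicon: n = 30, S = 6, 0.132 % and 0.228 %
print(tajimas_d(30, 6, 0.132, 0.228))
```

For the PNG first-amplicon row the result should land close to the tabulated 2.08; small differences are expected because the tabulated diversity values are rounded to three decimals.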

Table S9. Genotypes^a for SNPs at positions 111829579, 111831807 and 111839753 (hg18) in samples of the HGDP

Population Sample 111829579 111831807^b 111839753^b

Nasioi HGDP00490 1 0 0
Nasioi HGDP00491 2 0 0
Nasioi HGDP00655 2 0 1
Nasioi HGDP00656 1 1 0
Nasioi HGDP00657 2 0 0
Nasioi HGDP00658 1 0 1
Nasioi HGDP00660 0 1 1
Nasioi HGDP00661 1 0 0
Nasioi HGDP00662 2 0 1
Nasioi HGDP00663 0 0 1
Nasioi HGDP00664 1 0 1
Nasioi HGDP00787 1 0 0
Nasioi HGDP00788 0 0 0
Nasioi HGDP00789 0 0 1
Nasioi HGDP00823 1 0 1
Nasioi HGDP00824 0 1 0
Nasioi HGDP00825 2 0 0
Nasioi HGDP00978 1 0 1
Nasioi HGDP01027 1 0 1

PNG HGDP01081 2 0 0
PNG HGDP00540 2 0 1
PNG HGDP00541 1 0 1
PNG HGDP00542 2 0 0
PNG HGDP00543 2 0 1
PNG HGDP00544 2 0 0
PNG HGDP00545 1 0 2
PNG HGDP00546 1 0 1
PNG HGDP00547 0 1 1
PNG HGDP00548 1 0 1
PNG HGDP00549 1 1 0
PNG HGDP00550 0 0 2
PNG HGDP00551 1 0 1
PNG HGDP00552 1 1 0
PNG HGDP00553 1 1 1
PNG HGDP00554 1 1 1
PNG HGDP00555 0 2 0
PNG HGDP00556 0 0 2

Biaka HGDP00470 2 0
Biaka HGDP00461 2
Biaka HGDP00464 2
Biaka HGDP00466 2
Biaka HGDP00477 2
Biaka HGDP00451 2 0
Biaka HGDP00452 2 0
Biaka HGDP00454 2 0
Biaka HGDP00455 2 0
Biaka HGDP00457 2 0
Biaka HGDP00458 2 0

Biaka HGDP00459 2 0
Biaka HGDP00460 2 0
Biaka HGDP00479 2 0
Biaka HGDP00981 2 0
Biaka HGDP00985 2 0
Biaka HGDP01088 2 0
Biaka HGDP01089 2 0
Biaka HGDP01091 2 0
Biaka HGDP01094 2 0

Mandenka HGDP00904 2 0
Mandenka HGDP00905 2 0
Mandenka HGDP00906 2 0
Mandenka HGDP00907 2 0
Mandenka HGDP00908 2 0
Mandenka HGDP00909 2 0 0
Mandenka HGDP00910 2 0 0
Mandenka HGDP00911 2 0
Mandenka HGDP00912 2 0
Mandenka HGDP00913 2 0
Mandenka HGDP00914 2 0
Mandenka HGDP00915 2 0
Mandenka HGDP01199 2 0
Mandenka HGDP01200 2 0
Mandenka HGDP01202 2 0
Mandenka HGDP01283 2 0
Mandenka HGDP01284 2 0
Mandenka HGDP01286 2 0
Mandenka HGDP00919 2
Mandenka HGDP01285 2

San HGDP00992 2 0 1
San HGDP00987 2 0 0
San HGDP00988 2 0 0
San HGDP00991 2 0 0
San HGDP01029 2 0 0
San HGDP01032 2 0 0
San HGDP01036 2 0 0

Mbuti HGDP00476 2 2
Mbuti HGDP00449 2 1
Mbuti HGDP00456 2 0
Mbuti HGDP00462 2 0
Mbuti HGDP00463 2 0
Mbuti HGDP00471 2 0
Mbuti HGDP00474 2 0
Mbuti HGDP00478 2 0
Mbuti HGDP00982 2 0
Mbuti HGDP00984 2 0

French Basque HGDP01357 2 0

French Basque HGDP01358 2 0
French Basque HGDP01359 2 0
French Basque HGDP01360 2 0
French Basque HGDP01361 2 0
French Basque HGDP01362 2 0
French Basque HGDP01363 2 0 0
French Basque HGDP01364 2 0
French Basque HGDP01365 2 0 0
French Basque HGDP01367 2 0 0
French Basque HGDP01369 2 0 0
French Basque HGDP01370 2 0
French Basque HGDP01371 2 0
French Basque HGDP01372 2 0
French Basque HGDP01374 2 0
French Basque HGDP01375 2 0
French Basque HGDP01376 2 0
French Basque HGDP01377 2 0
French Basque HGDP01378 2 0
French Basque HGDP01379 2 0

Han HGDP00775 2 0
Han HGDP00777 2 0
Han HGDP00778 2 0
Han HGDP00815 2 0
Han HGDP01288 2 0
Han HGDP01289 2 0 0
Han HGDP01290 2 0
Han HGDP01292 2 0 0
Han HGDP01293 2 0
Han HGDP01294 2 0
Han HGDP01295 2 0
Han HGDP01296 2 0
Han HGDP00774 2 0
Han HGDP00780 2 0
Han HGDP00785 2 0
Han HGDP00786 2 0
Han HGDP00819 2 0
Han HGDP00821 2 0 0
Han HGDP00822 2 0 0
Han HGDP00977 2 0

a: Homozygous ancestral genotypes are represented by 0, heterozygotes by 1 and homozygous derived genotypes by 2
b: Unknown genotypes were left blank
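
Because the genotypes in Table S9 are coded as the number of derived alleles carried (0, 1 or 2), the raw sample frequency of the derived allele follows directly from the column sums. The Python sketch below illustrates this for the Nasioi at 111829579. The function name is illustrative, blank cells are assumed to be represented as None, and the result is the uncorrected sample frequency, not the corrected frequency described in the Supplementary Methods.

```python
def derived_allele_frequency(genotypes):
    """Frequency of the derived allele from 0/1/2 genotype codes.

    None marks an unknown genotype (a blank cell) and is skipped.
    Returns None when no genotype is known.
    """
    known = [g for g in genotypes if g is not None]
    if not known:
        return None
    # Each individual carries two chromosomes.
    return sum(known) / (2 * len(known))


# Nasioi genotypes at 111829579, in the order listed in Table S9.
nasioi_111829579 = [1, 2, 2, 1, 2, 1, 0, 1, 2, 0, 1, 1, 0, 0, 1, 0, 2, 1, 1]

print(derived_allele_frequency(nasioi_111829579))  # 19 derived alleles / 38 chromosomes = 0.5
```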

Supplementary Figures

Figure S1. Levels of genetic variation in Papuans at the set of the first three amplicons and 61 intergenic loci. Values of per-base nucleotide diversity a) uncorrected and b) corrected by divergence with chimpanzee. Ratio between values of nucleotide diversity in Papuans and in the other sequenced populations: c) Biaka, d) Mandenka, e) San, f) Han, g) Basque.

Figure S2. Median-joining network of phased haplotypes for the fourth amplicon and the first 620 bases of the fifth amplicon. Branch lengths are proportional to the number of polymorphisms distinguishing the haplotypes, and the area of each circle is proportional to the observed occurrence of the haplotype.

Figure S3. Geographic distribution of the shallow lineage in a) Melanesia and b) Africa. The fraction of chromosomes carrying the allele is indicated in black.

Figure S4. Contour plot of P-values for the maintenance of linkage disequilibrium across the 6 kb from position 28001 to 34000, as a function of the time since recombination events are detectable and of the recombination rate. The level lines indicate the P-values. The larger dot corresponds to the point estimates of divergence time and recombination rate (3.3 Mya and 0.0168, respectively), and the smaller dot to conservative estimates of those parameters (1.7 Mya and 0.0084, respectively).

Figure S5. Models for the size of the population of chromosomes of either the deep or shallow lineages. In each case, the width represents the number of chromosomes in the lineage and time is represented in the vertical direction. The bottom corresponds to the present and the top to the past. The models include: a) constant population size, b) exponential growth from constant population size, c) population crash followed by exponential growth, d) exponential growth from a single chromosome, and e) initially as in b), but the population of chromosomes has reached a stable size. All the models are nested in f), which contains all the parameters used in Table S7a.

Figure S1

Figure S2

Figure S3

Figure S4

Figure S5