
GG16CH04-Xie ARI 21 July 2015 11:16

Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications

Lei Huang,1 Fei Ma,1 Alec Chapman,2,3 Sijia Lu,4 and Xiaoliang Sunney Xie1,2

1Biodynamic Optical Imaging Center (BIOPIC), School of Life Sciences, Peking University, Beijing 100871, China
2Department of Chemistry and Chemical Biology and 3Graduate Program in Biophysics, Harvard University, Cambridge, Massachusetts 01238; email: [email protected]
4Yikon Genomics Co. Ltd., Taizhou, Jiangsu 225300, China

Annu. Rev. Genomics Hum. Genet. 2015.16:79–102

First published online as a Review in Advance on June 11, 2015

The Annual Review of Genomics and Human Genetics is online at genom.annualreviews.org

This article’s doi: 10.1146/annurev-genom-090413-025352

Copyright © 2015 by Annual Reviews. All rights reserved

Keywords

single-cell genomics, cancer genomics, WGA, DOP-PCR, MDA, MALBAC, CTC, PGD, PGS, in vitro fertilization

Abstract

We present a survey of single-cell whole-genome amplification (WGA) methods, including degenerate oligonucleotide–primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping–based amplification cycles (MALBAC). The key parameters to characterize the performance of these methods are defined, including genome coverage, uniformity, reproducibility, unmappable rates, chimera rates, allele dropout rates, false positive rates for calling single-nucleotide variations, and ability to call copy-number variations. Using these parameters, we compare five commercial WGA kits by performing deep sequencing of multiple single cells. We also discuss several major applications of single-cell genomics, including studies of whole-genome de novo mutation rates, the early evolution of cancer genomes, circulating tumor cells (CTCs), meiotic recombination of germ cells, preimplantation genetic diagnosis (PGD), and preimplantation genomic screening (PGS) for in vitro–fertilized embryos.


Annu. Rev. Genom. Human Genet. 2015.16:79-102. Downloaded from www.annualreviews.org Access provided by Harvard University on 09/01/15. For personal use only.


INTRODUCTION

Individual cells are the fundamental units of life. DNA that carries genetic information exists as single molecules in individual cells. In biology and medicine, there is often a need to characterize genomes of individual cells for several reasons: (a) Some cells are precious and exist in low numbers [for example, human oocytes and circulating tumor cells (CTCs)]; (b) every cell is unique in its genome (for example, every sperm cell of an individual is different because of recombination); (c) the genomes of individual cells undergo stochastic changes with time, and hence single cells’ genomes at particular times can reveal their temporal evolution; and (d) the genomes of individual cells in the same sample are heterogeneous (as in the case of primary cancer tissues), and consequently one often needs to determine the distribution rather than the average of a large ensemble of cells.

Changes in the genome of a single cell include single-nucleotide variations (SNVs) and structural variations, which can result in copy-number variations (CNVs). SNVs are single-base insertions, deletions, or substitutions, the latter being either transitions (e.g., C→T, A→G) or transversions (e.g., A→T, C→G). Structural variations are genomic rearrangements, including insertions or deletions (indels), duplications, inversions, and translocations. Structural variations at large genome scales are linked to CNVs. Because of technical difficulties (chimeras; see below), structural variations are difficult to observe at the single-cell level, whereas CNVs ranging in size from hundreds of kilobases to megabases can be detected relatively easily.

The copy number of a particular gene in a human somatic cell is normally two because of the two alleles (one from each parent). Although it had long been known that gene copy numbers could deviate from two in humans, genome-wide measurements of CNVs (27, 56) prompted extensive investigations of the biology and clinical consequences of CNVs, which often result from double-strand breaks that cannot be repaired perfectly (21).

We note that epigenetics (e.g., the methylation of DNA) represents another type of genomic change. The single-cell methylome has been determined (19), but this topic is beyond the scope of our review.

Advances in next-generation DNA sequencing technologies have enabled individual human genomes to be sequenced at affordable costs (4, 43, 69, 72). With the development of single-cell whole-genome amplification (WGA) techniques (i.e., amplification of all DNA molecules after the cell has been lysed), single-cell genomics has emerged as an exciting field of its own. In this article, we review the principles and compare the performance of the existing methodologies for single-cell WGA. We note that, in parallel with single-cell genomics, single-cell transcriptomics has seen rapid developments as well; these techniques are beyond the scope of this article, and we refer readers to other recent reviews for more information (55, 57, 60). We provide examples to illustrate the utility of single-cell WGA in biomedical applications, including fundamental research on genome stability and the biology of meiosis, as well as clinical perspectives for cancer diagnosis and reproductive medicine.

SINGLE-CELL WHOLE-GENOME AMPLIFICATION METHODS

Although sequencing individual DNA molecules with lengths of thousands of bases has become possible (20), currently no method can collect and sequence all DNA fragments from a single cell. Therefore, sequencing the entire genome of a cell requires WGA. Given the trace amount of DNA in a single human cell (a few picograms), extreme care should be taken to avoid contamination. Indeed, a major difficulty in single-cell genomics is substantial contamination from the environment and operators. Our experience is that WGA should be conducted in a dedicated clean room with controlled air pressure and quality. Under such conditions, bacterial contamination can be kept below 0.1% of the amount of DNA in a human cell (78). Alternatively, microfluidic devices can be used to minimize contamination (14, 31, 68), which is particularly important for WGA of a bacterial cell.

Regardless of whether the WGA is done in a polymerase chain reaction (PCR) tube or a microfluidic device, the chemistry is the most important aspect. Several different chemistries are available for single-cell WGA, and here we review three major ones: the degenerate oligonucleotide–primed polymerase chain reaction (DOP-PCR) (61), multiple displacement amplification (MDA) (12), and multiple annealing and looping–based amplification cycles (MALBAC) (78).

Degenerate Oligonucleotide–Primed Polymerase Chain Reaction (DOP-PCR)

PCR has had a major impact on biology and medicine in the past 30 years (54). It offers single-copy sensitivity; that is, a single copy of DNA can generate a signal detectable by PCR. To detect a mutation in a particular gene, one can design PCR primers that amplify the gene locus. Attempts have been made to use PCR to amplify the entire genome using a set of primers. Primer extension preamplification PCR (PEP-PCR) was among the first single-cell WGA methods used (3, 45, 77), followed by the more widely adopted DOP-PCR (61) (Figure 1).

Figure 1. Schematic of the degenerate oligonucleotide–primed polymerase chain reaction (DOP-PCR), which uses degenerate oligonucleotide primers for whole-genome amplification (45). Additional abbreviation: ssDNA, single-stranded DNA.
[Schematic labels: degenerate primers (NNNNNN) with 5′ and 3′ ends; primer binding; primer extension; ssDNA template.]

www.annualreviews.org • Single-Cell Whole-Genome Amplification and Sequencing 81


The principle of DOP-PCR is to use degenerate primers containing a random six-base sequence at the 3′ end and a fixed sequence at the 5′ end. In the initial amplification cycles, the primers bind to the DNA template at a low annealing temperature, and strand extension is then achieved at a raised temperature. During the second stage of PCR amplification, the products of the first stage are amplified with a primer targeting the 5′ fixed sequence at a higher annealing temperature. The concentrations of the primers and polymerase directly affect the result of DOP-PCR. DOP-PCR has been used to amplify picogram quantities of human genomic DNA for genotypic analyses (7).
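To make the degeneracy of such a primer pool concrete, the Python sketch below (an illustration, not the protocol of any kit) builds primers from a fixed 5′ tail plus six random 3′ bases. The tail string `ACGTACGTAC` and all function names are hypothetical placeholders. Six fully degenerate positions give 4^6 = 4,096 distinct primer species, and in uniformly random sequence any one hexamer is expected about once every 4,096 bases on a given strand, which is why a degenerate pool can prime at many sites across the genome.

```python
import random

BASES = "ACGT"


def degenerate_species(n_random: int) -> int:
    """Number of distinct primers in a pool with n_random fully degenerate (N) positions."""
    return len(BASES) ** n_random


def sample_primer(fixed_5p: str, n_random: int, rng: random.Random) -> str:
    """Draw one primer: a fixed 5' tail followed by n_random random 3' bases."""
    return fixed_5p + "".join(rng.choice(BASES) for _ in range(n_random))


def expected_kmer_spacing(k: int) -> int:
    """Mean spacing (bases, one strand) between occurrences of a given k-mer
    in uniformly random sequence, i.e., 4**k."""
    return len(BASES) ** k


if __name__ == "__main__":
    rng = random.Random(0)
    tail = "ACGTACGTAC"  # hypothetical placeholder, not a real kit sequence
    print(degenerate_species(6))         # distinct primer species in the pool
    print(sample_primer(tail, 6, rng))   # one random member of the pool
    print(expected_kmer_spacing(6))      # average bases between matches
```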

DOP-PCR often yields low genome coverage, which stems from the exponential nature of PCR amplification. Any small differences in the amplification factors among different sequences are exponentially enlarged, causing overamplified and underamplified regions in the genome, and hence low coverage. Although it does not access the whole genome completely, DOP-PCR can be well suited for measuring CNVs on a large genomic scale with large bin sizes (1 million bases) (47).
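The exponential enlargement of small differences can be checked with simple arithmetic. In an idealized model (an assumption that ignores reagent depletion and plateau effects), each PCR cycle multiplies a sequence by 1 + e, where e is its per-cycle efficiency, so two sequences with efficiencies e_a and e_b end up in the ratio ((1 + e_a)/(1 + e_b))^n after n cycles. The efficiencies 0.95 and 0.90 below, and the function name, are hypothetical.

```python
def yield_ratio(eff_a: float, eff_b: float, cycles: int) -> float:
    """Ratio of copy numbers of two sequences after `cycles` rounds of idealized
    exponential amplification (each round multiplies a sequence by 1 + efficiency)."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return ((1.0 + eff_a) / (1.0 + eff_b)) ** cycles


if __name__ == "__main__":
    # Hypothetical efficiencies differing by 5 percentage points per cycle.
    for n in (1, 10, 20, 30):
        print(f"{n:>2} cycles: ratio = {yield_ratio(0.95, 0.90, n):.2f}")
```

Under these assumptions, a 5-point difference in per-cycle efficiency grows into roughly a 1.7-fold difference in yield after 20 cycles, and the gap keeps widening with further cycles.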

Multiple Displacement Amplification (MDA)

MDA was developed in 2001 by Lasken and coworkers (12) using random hexamers as primers and φ29 DNA polymerase, a highly processive DNA polymerase (5) with strong strand displacement activity. φ29 has high replication fidelity because of its 3′→5′ exonuclease (proofreading) activity (17, 51). Under isothermal conditions, MDA extends the random primers and produces branched structures, which are extended by other primers and eventually form multibranched structures (Figure 2). The DNA fragments are 50–100 kb long.

MDA offers much higher genome coverage than DOP-PCR. However, like DOP-PCR, MDA is an exponential amplification process. This results in sequence-dependent bias, causing overamplification in certain genomic regions and underamplification in other regions. Moreover, the sequence-dependent bias of MDA is not reproducible along the genome from cell to cell, rendering CNV measurements noisy and normalization ineffective. Nevertheless, MDA has been widely applied since its invention (37).

Figure 2. Schematic of multiple displacement amplification (MDA). Random primers are used for isothermal amplification with φ29 DNA polymerase, which has strong strand displacement activity (35).
[Schematic labels: genomic DNA; random primers; φ29 DNA polymerase; branched DNA.]


Figure 3. Schematic of multiple annealing and looping–based amplification cycles (MALBAC). Random primers with a fixed sequence are used in a temperature cycle in which only the original genomic DNA and semiamplicons are linearly amplified, and full amplicons are protected from further amplification by the formation of DNA loops owing to the complementarity of the fixed sequences at the 3′ and 5′ ends. The DNA loops are polymerase chain reaction (PCR) amplified at the final stage (78). Here, m is the number of temperature cycles (m = 0–10), and n is the number of primers bound; (m + 1) × n is the number of semiamplicons present at the mth cycle, and m × n² is the number of full amplicons generated in the mth cycle.
[Schematic labels: genomic DNA; MALBAC primer; annealing of primers; quenching at 0°C; synthesis (extension at 65°C); displaced strand; semiamplicon; denaturing (melting at 94°C); full amplicon; looping at 58°C; PCR amplification; MALBAC product to be sequenced.]

Multiple Annealing and Looping–Based Amplification Cycles (MALBAC)

MALBAC was first reported in 2012 by Zong et al. (78) for single-cell WGA; this method has the unique feature of quasi-linear amplification, which reduces the sequence-dependent bias exacerbated by exponential amplification. It has recently been applied to single-cell transcriptome measurement as well (6). The key to MALBAC is to avoid making copies of copies and instead to make copies only of the original genomic DNA by protecting the amplification products (Figure 3). The specially designed MALBAC primers have a common 27-nucleotide sequence at the 5′ end and 8 random nucleotides at the 3′ end, which can hybridize evenly to the template when the temperature is lowered (to 15–20°C). First, semiamplicons of variable lengths (0.5–1.5 kb) are made when the temperature is raised (to 70–75°C). The semiamplicons are melted off the templates (at 95°C), and full amplicons with complementary ends are then made from them; these full amplicons form hairpins when the temperature is lowered (to 58°C), which prevents their further amplification. This cycle is repeated 8–12 times. The quasi-linear amplification in these first few cycles is critical for avoiding the sequence-dependent bias exacerbated by exponential amplification. MALBAC uses a thermally stable DNA polymerase with strand displacement activity. This important preamplification stage is then followed by exponential amplification of the full amplicons by PCR, generating the amount of DNA required for next-generation sequencing.

MALBAC is not a mere combination of DOP-PCR and MDA; it is fundamentally different because its preamplification is quasi-linear rather than exponential. This results in two major advantages: accuracy for CNV detection and a low false negative rate for SNV detection.
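The advantage of quasi-linear preamplification can be illustrated with the same kind of idealized arithmetic. This sketch assumes strictly linear copying, in which only the original template is copied in each cycle, so a region with efficiency e yields 1 + m·e molecules after m cycles, versus (1 + e)^m for exponential copying. It deliberately ignores MALBAC's copying of semiamplicons and its final PCR stage; the efficiencies 0.9 and 0.8 and the function names are hypothetical.

```python
from typing import Callable


def linear_yield(eff: float, cycles: int) -> float:
    """Idealized linear amplification: only the original template is copied each cycle."""
    return 1.0 + eff * cycles


def exponential_yield(eff: float, cycles: int) -> float:
    """Idealized exponential amplification: every molecule is copied each cycle."""
    return (1.0 + eff) ** cycles


def bias_ratio(yield_fn: Callable[[float, int], float],
               eff_a: float, eff_b: float, cycles: int) -> float:
    """Fold difference between two regions amplified with different efficiencies."""
    return yield_fn(eff_a, cycles) / yield_fn(eff_b, cycles)


if __name__ == "__main__":
    m = 10  # within the 8–12 preamplification cycles described in the text
    print("linear:     ", round(bias_ratio(linear_yield, 0.9, 0.8, m), 3))
    print("exponential:", round(bias_ratio(exponential_yield, 0.9, 0.8, m), 3))
```

With these numbers the linear scheme keeps the two regions within about 11% of each other (10/9), whereas exponential copying produces roughly a 1.7-fold difference.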

MALBAC is not free from sequence-dependent bias. Unlike that of MDA, however, MALBAC’s sequence-dependent bias is reproducible along the genome from cell to cell. Therefore, signal normalization for CNV noise reduction can be carried out. As shown below, after signal normalization with a reference cell, MALBAC offers the best CNV accuracy.

MALBAC exhibits the lowest false negative rates for SNV calling. However, MALBAC has a higher false positive rate for SNV detection than MDA because the DNA polymerase currently used has a lower fidelity than the φ29 polymerase.

Characterization and Comparison of Whole-Genome Amplification Methods

It is necessary to provide a critical comparison of different WGA methods. Here we choose key features to compare, such as coverage, uniformity, reproducibility, allele dropout rate, false positive rate, chimera rate, and unmappable rate, as defined below. We compared the performance of five commercial single-cell WGA kits that use one of the three methods discussed above: the Sigma-Aldrich GenomePlex Single Cell Whole Genome Amplification Kit (DOP-PCR), the Qiagen REPLI-g Single Cell Kit (MDA), the General Electric (GE) illustra Single Cell GenomiPhi DNA Amplification Kit (MDA), the Yikon Genomics Single Cell Whole Genome Amplification Kit (MALBAC), and the Rubicon Genomics PicoPLEX WGA Kit (MALBAC-like).

De Bourcy et al. (11) recently compared several WGA methods for single-cell analyses of Escherichia coli. For mammalian cells, issues related to CNVs and heterozygosity of alleles are germane to genomic analyses, which prompted us to perform the comparison using human cells. We note that human cancer cell lines often exhibit aneuploidy, which seriously complicates analyses of genome coverage and allele dropout rate. Here, we chose a diploid human cell line, BJ primary human foreskin fibroblasts. Because this cell line lacks aneuploidy, it is an ideal system for characterizing amplification performance.

To perform this comparison, we amplified and sequenced several single cells and excluded cells in metaphase (in which the DNA has been replicated). The amplified DNA products from each cell were fragmented by sonication to ∼250 base pairs. Sequencing libraries were prepared using the NEBNext Ultra DNA Library Prep Kit and sequenced on an Illumina HiSeq X Ten platform with 150-base paired-end reads.

Table 1 summarizes our characterization of the key parameters for the five tested kits based on DOP-PCR, MDA, and MALBAC. Below, we define each parameter used in the comparison and present and discuss the results.

Coverage. No complete human genome sequence exists yet; even the reference genome has gaps. In comparing the genome coverage of single cells amplified with different WGA kits, we used unamplified bulk sequencing at 30× depth as the reference and assumed that it has 100% coverage.
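A minimal sketch of this bulk-referenced coverage, assuming per-position read depths have already been computed over the same positions for the single cell and the bulk sample. The text does not state the depth threshold for calling a position covered, so `min_depth = 1` is an assumption, and the function name and toy depths are hypothetical.

```python
from typing import Sequence


def coverage_relative_to_bulk(sc_depth: Sequence[int],
                              bulk_depth: Sequence[int],
                              min_depth: int = 1) -> float:
    """Fraction of positions covered in the bulk reference that are also covered
    in the single cell (the bulk is treated as 100% coverage)."""
    if len(sc_depth) != len(bulk_depth):
        raise ValueError("depth arrays must describe the same positions")
    ref_positions = [i for i, d in enumerate(bulk_depth) if d >= min_depth]
    if not ref_positions:
        raise ValueError("the bulk sample covers no positions")
    covered = sum(1 for i in ref_positions if sc_depth[i] >= min_depth)
    return covered / len(ref_positions)


if __name__ == "__main__":
    single_cell = [0, 3, 5, 0, 2, 0]
    bulk = [30, 28, 31, 29, 0, 30]  # position 4 is uncovered in the bulk as well
    print(coverage_relative_to_bulk(single_cell, bulk))
```

Position 4 is not covered in the bulk either, so it is excluded from the denominator; the single cell covers 2 of the remaining 5 positions, giving 0.4.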

Table 1 Comparison of key parameters of five single-cell whole-genome amplification kits

Kit            WGA method    Raw data  Coverage  CV    Reproducibility  ADO  FPR          CR   UMR
Sigma-Aldrich  DOP-PCR       79 Gb     39%       0.14  0.93             76%  9.6 × 10⁻⁴   15%  64%
Qiagen         MDA           78 Gb     84%       0.21  0.68             33%  1.3 × 10⁻⁴   2%   18%
GE             MDA           105 Gb    82%       0.17  0.31             38%  8.2 × 10⁻⁵   3%   44%
Yikon          MALBAC        96 Gb     72%       0.10  0.87             21%  3.8 × 10⁻⁴   5%   22%
Rubicon        MALBAC-like   80 Gb     52%       0.13  0.98             28%  2.4 × 10⁻⁴   13%  66%

Kits: Sigma-Aldrich, GenomePlex Single Cell Whole Genome Amplification Kit; Qiagen, REPLI-g Single Cell Kit; GE, illustra Single Cell GenomiPhi DNA Amplification Kit; Yikon, Single Cell Whole Genome Amplification Kit; Rubicon, PicoPLEX WGA Kit. Abbreviations: ADO, allele dropout; CR, chimera rate; CV, coefficient of variation; DOP-PCR, degenerate oligonucleotide–primed polymerase chain reaction; FPR, false positive rate; MALBAC, multiple annealing and looping–based amplification cycles; MDA, multiple displacement amplification; UMR, unmappable rate; WGA, whole-genome amplification.

The comparison was done using roughly 80–100 Gb (about 30×) of total raw data for each cell (Table 1). The single-cell sequencing data from certain kits have a large fraction of unmappable reads, which is undesirable. Sequencing every cell to an identical effective depth would be impractical because of the high cost. Because coverage depends on the actual sequencing depth, Figure 4 shows the genome coverage of the single cells, referenced to the bulk, as a function of the effective sequencing depth.
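As a point of reference that is not part of the comparison itself, the idealized Lander–Waterman (Poisson) model gives the coverage expected when reads land uniformly at random: at mean depth c, the fraction of positions with at least one read is 1 − e^(−c). Amplification bias concentrates reads in overamplified regions, so an amplified single cell would be expected to fall below this curve at a given effective depth. The sketch below simply evaluates the model; the function name is hypothetical.

```python
import math


def poisson_coverage(mean_depth: float, min_reads: int = 1) -> float:
    """Expected fraction of positions with at least `min_reads` reads when reads are
    placed uniformly at random (Poisson model) at the given mean depth."""
    if mean_depth < 0 or min_reads < 0:
        raise ValueError("depth and min_reads must be non-negative")
    # P(depth < min_reads) = sum over i < min_reads of e^-c * c^i / i!
    p_below = sum(math.exp(-mean_depth) * mean_depth ** i / math.factorial(i)
                  for i in range(min_reads))
    return 1.0 - p_below


if __name__ == "__main__":
    for c in (0.5, 1, 2, 5, 10, 30):
        print(f"depth {c:>4}x: expected coverage {100 * poisson_coverage(c):.2f}%")
```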

The low effective sequencing depth for the Sigma-Aldrich DOP-PCR kit was mainly due to the high number of universal adapter reads. For Rubicon’s MALBAC-like kit, in addition to the primer and adapter reads, the high number of unmappable reads includes large contributions from small DNA fragments (<30 base pairs) inserted into the sequencing adapters, which were too short to be mapped to the human reference genome. The coverage of Qiagen’s MDA kit is higher than that of Yikon’s MALBAC kit, which may result from MDA’s lack of reproducibility of the sequence-dependent bias for the two different alleles in the diploid cell (36): a region underamplified on one allele can still be covered through the other. Overall, MDA and MALBAC kits have comparable levels of coverage, both of which are significantly higher than the coverage of DOP-PCR.

Figure 4. Genome coverage versus effective sequencing depth for the five commercial single-cell whole-genome amplification kits.
[Plot: coverage (%), 0–100, versus effective sequencing depth, 0–40, for Sigma-Aldrich, Qiagen, GE, Yikon, Rubicon, and bulk.]

Uniformity. Uniform amplification is important for accurate measurements of CNV. Figure 5a shows the raw read density along all chromosomes, clearly illustrating the sequence-dependent bias along the genome. The variation is at the 1,000-kb scale.

If the variation along the genome is the same from cell to cell, then a normalization factor can be calculated by averaging the read density of several cells in each bin and using this to normalize the single-cell data. If the purpose of the investigation is to measure CNV, then reference cells without aneuploidy should be chosen. Figure 5b shows the data normalized by the average of three cells with no aneuploidy.
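A minimal sketch of this bin-wise normalization, assuming all cells share the same bins (for example, 1,000-kb bins) and that each cell's bin counts have already been scaled to the same total read number. The function name and the toy profile are hypothetical: the reference cells share a reproducible bias, and the test cell carries that bias plus a 1.5-fold gain (three copies instead of two) in one bin.

```python
from statistics import mean
from typing import List, Sequence


def normalize_by_reference(cell: Sequence[float],
                           references: Sequence[Sequence[float]]) -> List[float]:
    """Divide each bin of `cell` by the mean read density of that bin across
    reference cells; bins with no reference signal become NaN."""
    if not references:
        raise ValueError("at least one reference cell is required")
    n_bins = len(cell)
    if any(len(ref) != n_bins for ref in references):
        raise ValueError("all cells must use the same bins")
    normalized = []
    for i in range(n_bins):
        ref_mean = mean(ref[i] for ref in references)
        normalized.append(cell[i] / ref_mean if ref_mean > 0 else float("nan"))
    return normalized


if __name__ == "__main__":
    bias = [0.5, 1.0, 1.5, 1.0]             # hypothetical reproducible bias
    references = [bias, bias, bias]          # three diploid reference cells
    test_cell = [0.5, 1.0, 1.5 * 1.5, 1.0]   # same bias plus a gain in bin 2
    print(normalize_by_reference(test_cell, references))
```

After normalization the shared bias cancels and only the gain in bin 2 remains. If the bias instead varied randomly from cell to cell, as described above for MDA, dividing by the reference mean would not cancel it.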

We characterize the uniformity by the coefficient of variation (CV) of the read density, i.e., the genome-wide standard deviation divided by the mean. The normalization is done by the average of a few reference cells (as in Figure 5b) with a bin size of 1,000 kb. Figure 6a shows the uniformity comparison of the five kits.
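The uniformity metric reduces to a short computation once bin densities are available. This sketch assumes the input is the normalized per-bin read density (as in Figure 5b) and uses the population standard deviation; the text does not specify which variant of the standard deviation was used, and the function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(density: Sequence[float]) -> float:
    """Genome-wide standard deviation of per-bin read density divided by its mean."""
    if not density:
        raise ValueError("no bins supplied")
    mu = mean(density)
    if mu == 0:
        raise ValueError("mean density is zero")
    return pstdev(density) / mu


if __name__ == "__main__":
    flat = [1.0, 1.0, 1.0, 1.0]
    noisy = [0.9, 1.1, 0.9, 1.1]  # hypothetical normalized bins
    print(coefficient_of_variation(flat), coefficient_of_variation(noisy))
```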

It is evident from Figure 5a that, of the five methods, DOP-PCR gives the flattest CNV raw data without normalization. MDA creates variations along the genome that are not reproducible from cell to cell and cannot be smoothed via normalization. MALBAC’s sequence-dependent bias is reproducible from cell to cell, giving the flattest CNV after normalization.

Reproducibility. Reproducibility is important in single-cell measurements, and measurement noise needs to be minimized. We use Pearson’s cross-correlation coefficient of the read densities along the genome between two identical cells to characterize the reproducibility.
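The coefficient described here is the standard Pearson correlation computed over genomic bins. A dependency-free sketch follows (Python 3.10 and later also provide `statistics.correlation`); the function name and toy densities are hypothetical, and whether raw or normalized densities enter the calculation is left to the caller as an assumption.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two read-density profiles binned identically."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two profiles of equal length (at least 2 bins)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    cell_1 = [1.2, 0.8, 1.5, 0.6, 1.0]  # hypothetical bin densities
    cell_2 = [1.1, 0.9, 1.4, 0.7, 1.0]
    print(round(pearson(cell_1, cell_2), 3))
```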

Figure 6b shows our measurements of the cell-to-cell reproducibility of each WGA kit, expressed as these correlation coefficients. It is apparent that the reproducibility of the DOP-PCR and MALBAC kits is better than that of the MDA kits, which exhibit stochasticity in WGA.

Allele dropout rate. The allele dropout rate is one of the most important characteristics of WGA, particularly for medical applications. Allele dropout arises from uneven amplification, which current WGA methods still need to improve. If a diploid cell has a heterozygous mutation, the lack of amplification of one of the two alleles causes allele dropout (Figure 7a). Allele dropout is the primary cause of false negatives in SNV calling. The allele dropout rate is measured as the ratio of undetected to actual heterozygous SNVs in a single cell; the latter is often approximated by the bulk measurement of identical cells.
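A minimal sketch of the allele dropout calculation, following the definition given in the Figure 6c caption: among sites that are heterozygous in the bulk sample, the fraction that the single cell calls homozygous. Representing sites as (chromosome, position) tuples with genotype labels "het"/"hom" is a simplifying assumption, as is the choice to skip bulk heterozygous sites with no genotype call in the single cell (for example, because they were not covered). All names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Site = Tuple[str, int]  # (chromosome, position); hypothetical representation


def allele_dropout_rate(bulk_het_sites: Iterable[Site],
                        sc_genotypes: Dict[Site, str]) -> float:
    """Fraction of bulk heterozygous sites genotyped in the single cell that the
    single cell calls homozygous ('hom') rather than heterozygous ('het')."""
    assessed = [s for s in bulk_het_sites if s in sc_genotypes]
    if not assessed:
        raise ValueError("no bulk heterozygous site was genotyped in the single cell")
    dropped = sum(1 for s in assessed if sc_genotypes[s] == "hom")
    return dropped / len(assessed)


if __name__ == "__main__":
    bulk_het = [("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 10)]
    single_cell = {("chr1", 100): "het", ("chr1", 200): "hom", ("chr2", 50): "het"}
    print(allele_dropout_rate(bulk_het, single_cell))  # chr3:10 is not genotyped
```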

Figure 6c shows our measurements of the allele dropout rates of the five kits. It is evident that the MALBAC methods have lower allele dropout rates than the other WGA methods.

Figure 5. Read density of reads from single diploid human cells with no aneuploidy from the BJ primary human foreskin fibroblast cell line, using the five commercial single-cell whole-genome amplification kits. (a) Raw data of single cells at a sequencing depth of 2× with a bin size of 1,000 kb, mapped to the human reference genome. The chromosomes are shown in alternating red and blue colors. (b) The same single-cell data normalized by the mean of three identical cells in each bin, with corresponding coefficients of variation (CVs) reflecting uniformity.
[Panels a and b: read density (axis ticks 0–6) along chromosomes 1–22, X, and Y for each kit; CVs in panel b: Sigma-Aldrich 0.14, Qiagen 0.21, GE 0.17, Yikon 0.10, Rubicon 0.13.]

Figure 6. Comparison of key parameters of the five commercial single-cell whole-genome amplification kits, using data generated from three single cells from the BJ primary human foreskin fibroblast cell line. (a) Coefficient of variation, calculated as the ratio of the standard deviation of read density to its average in three individual cells. (b) Reproducibility, measured by Pearson’s cross-correlation coefficient of the read densities between two identical cells. (c) Allele dropout rate, defined as the percentage of homozygous sites in the single-cell samples where the bulk sample is heterozygous at the same nucleotide site. (d) False positive rate, defined as the number of heterozygous site calls in the single cell divided by the number of sites in the bulk sample that are homozygous or have a different allele at the same nucleotide site. (e) Chimera rate, defined as the number of reads that are improperly connected (including abnormal fragment size and interchromosomal connection) divided by the total number of mappable reads. (f) Fraction of reads that are unmappable to the reference genome.

False positive rate. A false positive in base calling can arise from either a sequencing error or an amplification error. Whereas random sequencing errors can be avoided with high sequencing depth, WGA errors often dominate: an error introduced during the first or second cycle of exponential amplification is inherited by a large fraction of the amplified copies, so deep sequencing cannot average it out. The rate of such errors depends on the error rate of the DNA polymerase (Figure 7b). Figure 6d compares the false positive rates of the WGA kits. The Qiagen and GE kits perform better than the MALBAC approaches, mainly because of the higher accuracy of the φ29 polymerase used in the former.
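A simplified sketch of the false positive rate that uses only part of the Figure 6d definition: among sites that the bulk sample calls homozygous and that are genotyped in the single cell, the fraction that the single cell calls heterozygous. The "different allele" case in the caption is not modeled here, and the site and genotype representation, like the function name, is a hypothetical choice.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, position); hypothetical representation


def false_positive_rate(bulk_genotypes: Dict[Site, str],
                        sc_genotypes: Dict[Site, str]) -> float:
    """Fraction of bulk-homozygous sites, genotyped in the single cell, that the
    single cell calls heterozygous ('het')."""
    assessed = [s for s, g in bulk_genotypes.items()
                if g == "hom" and s in sc_genotypes]
    if not assessed:
        raise ValueError("no bulk homozygous site was genotyped in the single cell")
    false_calls = sum(1 for s in assessed if sc_genotypes[s] == "het")
    return false_calls / len(assessed)


if __name__ == "__main__":
    bulk = {("chr1", 1): "hom", ("chr1", 2): "hom", ("chr1", 3): "het",
            ("chr2", 4): "hom", ("chr2", 5): "hom"}
    cell = {("chr1", 1): "hom", ("chr1", 2): "het", ("chr1", 3): "het",
            ("chr2", 4): "hom"}
    print(false_positive_rate(bulk, cell))  # ("chr2", 5) is not genotyped
```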


[Figure 7 schematic panels: (a) Allele dropout, two alleles in a single cell; (b) False positives, two alleles in a single cell; (c) Chimera formation, two separate DNA segments in a single cell]

Figure 7 Schematics for three types of errors that can arise in whole-genome amplification. (a) Allele dropout. The lack of amplification of only one of the two alleles makes a heterozygous point mutation appear to be a homozygous one, causing false negative errors. (b) False positive errors arising from point mutations generated by the whole-genome amplification. (c) Chimera formation, in which different parts of the original genome are artificially connected by whole-genome amplification.

Chimera rate. Chimera formation is an artifact of WGA that generates reads mapping to different parts of the genome that are not physically linked (Figure 7c). Chimeras obscure the identification of true structural variations; owing to this technical problem, single-cell structural variations remain challenging to detect. The chimera rate is defined as the percentage of reads that exhibit improper connections relative to the reference genome, i.e., those that show an abnormal fragment size or indicate interchromosomal connections. Figure 6e shows the chimera rates of the WGA kits. The chimera rates of the Sigma-Aldrich and Rubicon kits are much higher than those of the other kits.
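As a rough illustration of this definition, the chimera rate can be estimated from paired-end alignments by flagging pairs whose mates land on different chromosomes or imply an abnormal fragment size. This is a sketch under stated assumptions: the `ReadPair` record and the 0 to 1,000 bp acceptable fragment range are hypothetical, fragment size is approximated by the distance between mate start positions, and the rate is counted per mapped pair rather than per read.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ReadPair:
    """Hypothetical minimal record of a mapped read pair."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def is_chimeric(pair: ReadPair, min_fragment: int = 0, max_fragment: int = 1000) -> bool:
    """Flag interchromosomal pairs and pairs with an abnormal apparent fragment size.

    The default fragment limits are illustrative assumptions, not values from the text.
    """
    if pair.chrom1 != pair.chrom2:
        return True  # interchromosomal connection
    fragment = abs(pair.pos2 - pair.pos1)  # crude fragment-size proxy (assumption)
    return not (min_fragment <= fragment <= max_fragment)


def chimera_rate(pairs: Iterable[ReadPair], **limits: int) -> float:
    """Fraction of mapped pairs flagged as chimeric."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(is_chimeric(p, **limits) for p in pairs) / len(pairs)


example_pairs = [
    ReadPair("chr1", 10_000, "chr1", 10_300),  # normal fragment
    ReadPair("chr1", 10_000, "chr1", 60_000),  # abnormal fragment size
    ReadPair("chr1", 5_000, "chr5", 90_000),   # interchromosomal connection
    ReadPair("chr2", 7_000, "chr2", 7_800),    # normal fragment
]
print(chimera_rate(example_pairs))  # 0.5
```

A real implementation would take mate orientation and insert sizes from the aligner's output rather than raw start positions.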

Unmappable rate. The unmappable rate is the fraction of reads that cannot be mapped to the reference genome; it reflects the junk sequences generated in the WGA process, which can arise from the formation of primer dimers, short DNA fragments, and/or nonspecific incorporations. A large fraction of unmappable reads reduces the cost effectiveness of genome sequencing and the completeness of genome coverage. Figure 6f shows the unmappable rates of the WGA kits. The Qiagen and Yikon kits showed the lowest unmappable rates.
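To see how the unmappable rate erodes cost effectiveness, the back-of-envelope calculation below converts a raw read count into mean effective coverage. It is a hedged sketch: the ∼3.1-Gb haploid human genome size and the 100-bp read length are illustrative assumptions, and the calculation ignores duplicates, low-quality reads, and coverage nonuniformity.

```python
ASSUMED_GENOME_BP = 3.1e9  # assumption: approximate haploid human genome size


def effective_coverage(total_reads: float, read_length_bp: int,
                       unmappable_rate: float,
                       genome_size_bp: float = ASSUMED_GENOME_BP) -> float:
    """Mean depth contributed by mappable reads only."""
    if not 0.0 <= unmappable_rate < 1.0:
        raise ValueError("unmappable_rate must be in [0, 1)")
    mappable_reads = total_reads * (1.0 - unmappable_rate)
    return mappable_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: int,
                 unmappable_rate: float,
                 genome_size_bp: float = ASSUMED_GENOME_BP) -> float:
    """Raw reads required to reach a target mean depth after discarding unmappable reads."""
    return target_coverage * genome_size_bp / (read_length_bp * (1.0 - unmappable_rate))


# 100 million 100-bp reads: ~3.06x with 5% unmappable reads, ~1.94x with 40%.
for rate in (0.05, 0.40):
    print(rate, round(effective_coverage(1e8, 100, rate), 2))
```

Since sequencing cost scales with raw reads, the `reads_needed` form makes the penalty explicit: halving the mappable fraction doubles the reads required for the same depth.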

We would like to point out that all of the comparisons described above were done using the existing kits available to us and that each kit is likely to improve with time. As important as the specific results is how we performed the comparison. It is fair to say that each method or kit has its own advantages and disadvantages and that selection should be based on the specific application.

For accurate CNV determination, the DOP-PCR and MALBAC kits offer the highest reproducibility and lowest coefficients of variation. The MALBAC kits and Qiagen's MDA kit are well suited for simultaneous measurements of SNVs and CNVs. For calling de novo SNVs, both the false negative and false positive rates need to be considered: Yikon's MALBAC kit has the lowest false negative rate (the allele dropout rate), whereas Qiagen's MDA kit has the lowest false positive rate. When sequencing cost effectiveness is considered, a low unmappable rate is desirable for high effective coverage.
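The two CNV-related metrics in Figure 6a and b can be sketched from binned read counts. The version below is illustrative only: it assumes reads have already been counted in fixed-size genomic bins (the counts shown are made up), uses the population standard deviation for the coefficient of variation, and computes Pearson's correlation directly rather than through a statistics library.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(bin_counts: Sequence[float]) -> float:
    """Standard deviation of read density divided by its mean (lower = more uniform)."""
    return pstdev(bin_counts) / mean(bin_counts)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two cells' binned read densities (higher = more reproducible)."""
    if len(x) != len(y):
        raise ValueError("both cells must use the same bins")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up read counts in six genomic bins for two cells.
cell_a = [120, 95, 130, 80, 110, 105]
cell_b = [118, 90, 135, 85, 108, 100]
print(round(coefficient_of_variation(cell_a), 3))
print(round(pearson(cell_a, cell_b), 3))
```

Both metrics are unchanged when all counts in a cell are multiplied by a constant, so raw bin counts can be used in place of depth-normalized read densities.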


We performed our comparison using small sample tubes instead of microfluidic devices; the performance parameters might change under microfluidic conditions and might differ between bacterial (small) and human (large) genomes. Although expensive, a similar comprehensive comparison of the chemistries for human genomes would be worth doing under microfluidic conditions.

STUDY OF GERM CELLS

Phasing of Human Genomes with Single-Cell Genomics

One central goal of human genetics is to characterize the genetic variations of individual human beings and study their correlations with function-related traits (1, 44). Human genomes are diploid, and human genes often contain multiple coding and regulatory elements that vary among populations. Different phenotypic consequences can arise depending on whether genetic variations are located on the same chromosome (in cis) or on opposite homologous chromosomes (in trans) (2, 62). One example is compound heterozygosity (28, 66), in which two heterozygous nonsynonymous mutations in a gene can leave either one normal copy of the gene (in cis) or no normal copy (in trans). A complete understanding of genetic variations and their consequences in human diseases cannot be obtained without resolving the combination of genetic variations at different loci on the same chromosome, that is, the haplotypes (16, 29). Without haplotype information, the description of an individual human genome is incomplete, and its functional interpretation can be error prone.
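The cis/trans distinction for compound heterozygosity can be made concrete with a toy model. In this hypothetical sketch, each haplotype of a gene is represented as the set of loss-of-function mutations it carries, and a copy counts as normal only if it carries none; real variants, of course, differ in their functional effects.

```python
from typing import FrozenSet, Tuple

# Hypothetical model: a haplotype is the set of damaging mutations on one gene copy.
Haplotype = FrozenSet[str]


def place_mutations(m1: str, m2: str, phase: str) -> Tuple[Haplotype, Haplotype]:
    """Return the two haplotypes of a gene carrying heterozygous mutations m1 and m2."""
    if phase == "cis":  # both mutations on the same chromosome
        return frozenset({m1, m2}), frozenset()
    if phase == "trans":  # one mutation on each homologous chromosome
        return frozenset({m1}), frozenset({m2})
    raise ValueError("phase must be 'cis' or 'trans'")


def normal_copies(haplotypes: Tuple[Haplotype, Haplotype]) -> int:
    """Count gene copies that carry no damaging mutation."""
    return sum(1 for h in haplotypes if not h)


print(normal_copies(place_mutations("m1", "m2", "cis")))    # 1
print(normal_copies(place_mutations("m1", "m2", "trans")))  # 0
```

An unphased genotype is heterozygous at both sites in either case, which is exactly why haplotype information is needed to tell the two situations apart.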

However, owing to the limited read lengths of currently available sequencing techniques, it is often challenging to separate the two homologous chromosomes in genetic analyses and thus obtain the linkage information between genetic variations. Molecular cloning technologies have been developed for phasing individual genomes (33, 38, 59, 76). However, the cloning procedures are often lengthy and labor intensive and therefore may not scale with the increasing demand for individual genome sequencing.

Single-cell genome sequencing provides an alternative way to obtain phasing information for individual humans. Individual metaphase chromosomes have been isolated from actively dividing cells by techniques such as laser microdissection (41), microfluidics (14), or flow cytometry (74) and then whole-genome amplified and genotyped to obtain the linkage information for genetic variations on individual chromosomes. Peters et al. (50) further developed a low-cost DNA haplotyping process by diluting genomic DNA into 384 wells, each containing a subcellular amount of genomic DNA; the DNA molecules in each well were then separately whole-genome amplified and sequenced to obtain the phasing information. Hou et al. (24), Kirkness et al. (32), and Lu et al. (40) used whole-genome-amplified individual germ cells for haplotype analyses, further increasing the haplotype block size to the chromosome-spanning level (Figure 8). These two categories of haplotyping approaches do not require molecular cloning, cell culturing, or sophisticated instrumentation for chromosome isolation, potentially enabling haplotype-resolved comprehensive genetic studies and clinical applications in the near future.
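The idea behind sperm-based phasing (Figure 8) is that each haploid sperm directly reports which alleles travel together. Below is a minimal sketch, not the algorithm of References 24, 32, or 40: it links each heterozygous SNP to its predecessor by majority vote across sperm, which tolerates occasional amplification errors and crossovers because these rarely affect the same pair of adjacent SNPs in most cells. The input format (an allele pair per SNP, and one position-to-allele dictionary per sperm with uncovered SNPs omitted) is a hypothetical assumption; the allele pairs in the example mirror the five heterozygous SNPs sketched in Figure 8, while the sperm genotypes are invented.

```python
from typing import Dict, List, Tuple


def phase_from_sperm(het_sites: List[Tuple[str, str]],
                     sperm: List[Dict[int, str]]) -> Tuple[str, str]:
    """Phase heterozygous SNPs (in chromosome order) using haploid sperm genotypes.

    Hypothetical input format: het_sites[i] is the allele pair at SNP i, and
    sperm[k][i] is the allele seen at SNP i in sperm k (uncovered SNPs omitted).
    """
    hap1 = [het_sites[0][0]]  # anchor: arbitrarily place allele 1 of SNP 0 on haplotype 1
    for i in range(1, len(het_sites)):
        a, b = het_sites[i]
        votes_a = votes_b = 0
        for s in sperm:
            prev, cur = s.get(i - 1), s.get(i)
            # Skip sperm that miss either SNP or show an allele absent from the bulk genotype.
            if prev not in het_sites[i - 1] or cur not in (a, b):
                continue
            on_hap1 = prev == hap1[i - 1]
            # Allele a seen together with hap1's previous allele means a belongs to hap1.
            if (cur == a) == on_hap1:
                votes_a += 1
            else:
                votes_b += 1
        hap1.append(a if votes_a >= votes_b else b)  # ties fall back to allele a
    hap2 = [b if h == a else a for h, (a, b) in zip(hap1, het_sites)]
    return "".join(hap1), "".join(hap2)


sites = [("A", "T"), ("C", "G"), ("A", "T"), ("C", "G"), ("A", "C")]
sperm_calls = [
    {0: "A", 1: "C", 2: "A", 3: "G", 4: "C"},  # haplotype 1
    {0: "T", 1: "G", 2: "T", 3: "C", 4: "A"},  # haplotype 2
    {0: "A", 1: "C", 2: "A", 3: "C", 4: "A"},  # crossover between SNP 2 and SNP 3
    {0: "T", 2: "T", 3: "C", 4: "A"},          # haplotype 2, SNP 1 not covered
    {0: "A", 1: "C", 2: "A", 3: "G", 4: "T"},  # haplotype 1 with an error at SNP 4
    {0: "A", 1: "C", 2: "A", 3: "G", 4: "C"},  # haplotype 1
]
print(phase_from_sperm(sites, sperm_calls))  # ('ACAGC', 'TGTCA')
```

Chaining only adjacent SNPs keeps the sketch short, but a stretch with no informative sperm would leave the phase there arbitrary; practical methods use more global strategies.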

Study of Meiotic Recombination in Human Germ Cells

Meiotic recombination is essential to the proper segregation of homologous chromosomes; it results in the exchange of genetic information through crossover events and creates diversity for evolution (8). Abnormality in generating crossovers between homologous chromosomes is the leading cause of miscarriage and birth defects (13). Population analyses such as linkage disequilibrium and pedigree studies are widely used to study the highly uneven pattern of meiotic recombination across the human genome. However, population analyses yield results that are averaged among individuals and affected by evolutionary pressures, and therefore often mask the quickly evolving or individual-specific features of the recombination-active regions (23, 30, 34).

[Figure 8 schematic: an unphased personal genome with five heterozygous SNPs (A/T, C/G, A/T, C/G, A/C), the genotypes of single sperm 1–8, and the resulting phased genome]

Figure 8 Phasing the genome of an individual using the single-nucleotide variation (SNV) linkage information from his single sperm cells. The diploid genome was sequenced and found to contain five heterozygous single-nucleotide polymorphisms (SNPs), with unknown linkage information shown in purple. Individual sperm cells were then sequenced after whole-genome amplification using the multiple annealing and looping–based amplification cycles (MALBAC) method, and the SNP linkage information in each sperm was used to phase the individual's diploid genome. The light blue "T" is a whole-genome amplification or sequencing error; the black "x" represents a crossover in recombination, the switching point between the paternal and maternal DNA. Figure adapted from Reference 40.

Single-germ-cell genome sequencing offers a novel approach for studying meiotic recombination at the level of individual human beings. Each germ cell has a unique genome because of differences in recombination, and sequencing multiple single germ cells from an individual allows investigators to map individual recombination events and study germline genome instability. Single-cell studies have been published using donor sperm (40, 68) as well as oocytes (24) to map recombination in individual human beings. Wang et al. (68) utilized a microfluidic device to separate and sequence single sperm, thereby mapping the recombination distribution and studying the gene conversion and de novo mutation events of an individual. Lu et al. (40) sequenced ∼100 single sperm cells using the MALBAC method and mapped the recombination events of an individual at high resolution (Figure 9). This study revealed a correlation between decreased crossover frequency and an increased autosomal aneuploidy rate in human sperm, underscoring the importance of meiotic crossover in maintaining genome stability in germ cells.
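Once the SNPs in a sperm have been assigned to the paternal or maternal haplotype, crossovers appear as switches between long runs of one label and the other, whereas isolated switches (like the error in Figure 8) are more likely WGA or sequencing errors. The sketch below illustrates this with a simple run-length filter; the `min_run` threshold of three SNPs, the input format, and the error-handling strategy are illustrative assumptions rather than the procedure used in Reference 40.

```python
from itertools import groupby
from typing import List, Optional, Sequence, Tuple


def call_crossovers(positions: Sequence[int], labels: Sequence[Optional[str]],
                    min_run: int = 3) -> List[Tuple[int, int]]:
    """Return crossover intervals (last SNP of one haplotype run, first SNP of the next).

    labels[i] is "P" (paternal), "M" (maternal), or None (unresolved) for positions[i].
    min_run is an assumed threshold below which a run is treated as an error.
    """
    called = [(pos, lab) for pos, lab in zip(positions, labels) if lab is not None]
    # Each run is [label, first_position, last_position, n_snps].
    runs = []
    for lab, group in groupby(called, key=lambda item: item[1]):
        snps = list(group)
        runs.append([lab, snps[0][0], snps[-1][0], len(snps)])
    # Repeatedly drop the shortest run below min_run and re-join its neighbours.
    while len(runs) > 1:
        short = [k for k, run in enumerate(runs) if run[3] < min_run]
        if not short:
            break
        k = min(short, key=lambda idx: runs[idx][3])
        del runs[k]
        if 0 < k < len(runs) and runs[k - 1][0] == runs[k][0]:
            runs[k - 1][2] = runs[k][2]
            runs[k - 1][3] += runs[k][3]
            del runs[k]
    # Every remaining label switch is reported as a crossover interval.
    return [(left[2], right[1]) for left, right in zip(runs, runs[1:])]


positions = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200]
labels = ["P", "P", "P", "P", "M", "P", "P", "M", "M", None, "M", "M"]
print(call_crossovers(positions, labels))  # [(700, 800)]
```

The threshold trades robustness to errors against sensitivity to closely spaced double crossovers, which a strict run-length filter would also remove.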


[Figure 9: for each chromosome (1–22, X, Y), the percentage (0–100) of reads covering paternal SNPs and maternal SNPs along the chromosome; legend: paternal haplotype, maternal haplotype, unresolved haplotype, unassembled gap, centromere region, crossover]

Figure 9 Crossovers between paternal (green) and maternal (red) DNA in the 23 chromosomes of a sperm cell. Figure adapted from Reference 40.

Hou et al. (24) utilized the first and second polar bodies of individual oocytes to determine the crossover maps of female individuals and to study the chromatid interference effect in oocytes. This study also used the information from the two polar bodies to deduce the genomes of the oocyte pronuclei, enabling noninvasive preimplantation genetic diagnosis (PGD) and preimplantation genomic screening (PGS) for in vitro–fertilized embryos (Figure 10).
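The polar-body deduction in Figure 10b reduces to simple bookkeeping: if the four chromatids of each chromosome in the oocyte are conserved, the female pronucleus must contain whatever the two polar bodies do not. A minimal sketch of that arithmetic follows; the per-chromosome copy numbers are hypothetical inputs that would, in practice, be estimated by sequencing each polar body, and the sketch assumes integer copy numbers.

```python
from typing import Dict


def female_pronucleus_copies(pb1: Dict[str, int], pb2: Dict[str, int],
                             total_chromatids: int = 4) -> Dict[str, int]:
    """Infer chromatid copies per chromosome in the female pronucleus from both polar bodies."""
    inferred = {}
    for chrom in pb1:
        copies = total_chromatids - pb1[chrom] - pb2[chrom]
        if copies < 0:
            raise ValueError(f"inconsistent polar-body copy numbers for {chrom}")
        inferred[chrom] = copies
    return inferred


# Hypothetical polar-body copy numbers (normally 2 chromatids in the first, 1 in the second).
pb1 = {"chr1": 2, "chr16": 1, "chr22": 3}
pb2 = {"chr1": 1, "chr16": 1, "chr22": 1}
pronucleus = female_pronucleus_copies(pb1, pb2)
print(pronucleus)                                    # {'chr1': 1, 'chr16': 2, 'chr22': 0}
print([c for c, n in pronucleus.items() if n != 1])  # ['chr16', 'chr22']
```

A pronucleus value other than 1 flags a chromosome that would be aneuploid in the embryo, assuming the sperm contributes a normal copy.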

Preimplantation Genetic Diagnosis and Genomic Screening in In Vitro Fertilization

In vitro fertilization is an assisted reproductive technology in which an egg is fertilized by sperm outside the human body. The fertilized egg is cultured in vitro for 2–6 days before being implanted into the uterus with the intention of creating a successful pregnancy. Nucleic acid materials obtained from the in vitro–fertilized embryos can be used to perform PGD and PGS to avoid the inheritance of pathogenic mutations and chromosome abnormalities, respectively.

[Figure 10 panels: (a) human oocyte, sperm, fertilization, polar bodies 1 and 2, and the female and male pronuclei, with paternal and maternal haplotypes; outcomes labeled healthy baby, birth difficulty or baby with genetic disorder, and baby with dominant disease allele. (b) Copy number (0–4) across chromosomes 1–22 and X for the first polar body, the second polar body, the total chromatid copy numbers conserved in the whole oocyte, the predicted female pronucleus, and the confirmed female pronucleus]

Figure 10 Deduction of viability for preimplantation genomic screening for in vitro fertilization. (a) A human oocyte with homologous recombination between the paternal (blue) and maternal (red) DNA (only one chromosome is shown). Upon fertilization with a sperm cell, the first and second polar bodies, which are dispensable for embryo development, can be safely biopsied by a micropipette and subjected to whole-genome amplification with multiple annealing and looping–based amplification cycles (MALBAC). By sequencing the two polar bodies or a few cells of the embryo at the blastocyst stage, one can avoid chromosome abnormality (middle case) as well as undesirable point mutations from a disease-carrying parent (right case) and increase the success rate of in vitro fertilization for healthy babies (left case). (b) Deduction of the copy number of chromosomes of a female pronucleus based on its two polar bodies. The deduction is based on the fact that the total number of chromatids (four) is conserved, which was verified by sequencing the female pronucleus with the donor's consent. Panel adapted from Reference 24.

PGD/PGS can be performed on the two polar bodies of the oocyte, on one blastomere from a day-3 embryo, or on several trophectoderm cells from a day-5 blastocyst; in each case only one or a few cells can be obtained for PGD/PGS purposes. Performing WGA on these materials enables comprehensive chromosome analyses on various genome analytical platforms, such as comparative genomic hybridization arrays (52), single-nucleotide polymorphism (SNP) arrays (63), or multiplex quantitative PCR (65). The rapid development of high-throughput sequencing techniques has further reduced the cost and increased the precision and resolution of chromosome-level PGD/PGS (24, 26, 71). Treff et al. (64) utilized targeted high-throughput sequencing to perform PGD of monogenic disease, and efforts have been made to perform both point mutation PGD and chromosomal PGS on the same embryo (10, 58). Using MALBAC-based next-generation sequencing, we reported a fully integrated pipeline for combined monogenic PGD and chromosomal PGS and a live birth free of a specific pathogenic mutation carried by the child's parents (22). We envision that advances in WGA techniques will enable clinical trials with fully integrated pipelines for combined monogenic PGD and chromosomal PGS in the near future.
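Chromosome-level PGS from low-coverage sequencing essentially asks whether any chromosome's read depth departs from the diploid expectation. The sketch below shows that logic under strong simplifying assumptions: reads are already counted in equal-sized bins (the counts are invented), most of the genome is assumed diploid so that the genome-wide median bin represents two copies, and GC correction, mappability filtering, and segmentation, which real pipelines need, are omitted.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Optional, Set, Tuple


def chromosome_copy_numbers(bins: Iterable[Tuple[str, float]],
                            ploidy: int = 2) -> Dict[str, float]:
    """Estimate a copy number per chromosome from (chromosome, read_count) bins."""
    bins = list(bins)
    # Assumption: the genome-wide median bin represents `ploidy` copies.
    genome_median = median(count for _, count in bins)
    per_chrom: Dict[str, List[float]] = defaultdict(list)
    for chrom, count in bins:
        per_chrom[chrom].append(ploidy * count / genome_median)
    return {chrom: median(values) for chrom, values in per_chrom.items()}


def flag_aneuploidies(copy_numbers: Dict[str, float], ploidy: int = 2,
                      check: Optional[Set[str]] = None) -> Dict[str, int]:
    """Return chromosomes whose rounded copy number differs from the expected ploidy."""
    return {chrom: round(cn) for chrom, cn in copy_numbers.items()
            if round(cn) != ploidy and (check is None or chrom in check)}


# Invented bin counts: chr21 carries ~1.5x the typical depth.
example_bins = ([("chr1", c) for c in (100, 98, 102, 101)]
                + [("chr2", c) for c in (99, 100, 101, 100)]
                + [("chr21", c) for c in (150, 149, 151, 150)])
cn = chromosome_copy_numbers(example_bins)
print(flag_aneuploidies(cn))  # {'chr21': 3}
```

The `check` argument can restrict screening to autosomes, since a single X chromosome is expected in a male embryo and should not be flagged.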

STUDY OF GENOME EVOLUTION IN CANCER

Cancer is a genomic disease (67). Recent advances in single-cell WGA methods have allowed the determination of CNVs at a large scale (48, 70) and at the single-gene level (15), as well as of SNVs, in single cancer cells (18, 70, 75), whether in primary tissue or among CTCs in blood (9, 39, 49).

A pair of companion studies used single-cell WGA and exome sequencing to examine tumor heterogeneity (25, 73). These studies found hundreds of SNVs that differ from cell to cell, revealing the genetic complexity of these tumors, and applied population genetic analysis to study their clonal composition. However, our independent analysis of these data indicates that the vast majority of these SNVs are due to sequencing artifacts or possibly contamination, and that only a handful, if any, may be attributed to true heterogeneity of the tumor samples (A. Chapman, C. Zong & X.S. Xie, unpublished results). Of the 711 tumor-specific SNVs reported in one of the studies (25), we found that 42% were actually present at a lower level in the normal tissue as well. These are likely to be SNPs that were mistakenly not identified in the normal tissue (false negatives) rather than tumor-specific SNVs. Furthermore, 58% of the remaining SNVs were present in dbSNP, a database of germline variants known to occur in the human population. Because new mutations in a tumor arise randomly, it would be highly unlikely that a significant number of them happen to coincide with known variants present in the general population. Instead, an alternative explanation for these mutations could be contamination from the operators or other human sources. Indeed, we observed that of the remaining SNVs not found in dbSNP, 84% were found in unrelated samples, either normal control cells from the same study or cells from the companion study. In summary, more than 96% of the SNVs identified as tumor specific in that report (25) appear likely to be contaminants or sequencing artifacts.
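The "more than 96%" figure follows from chaining the three percentages above, each applied to the SNVs left unexplained by the previous step. The short calculation below reproduces that bookkeeping from the rounded percentages quoted in the text; it is a check of the arithmetic, not a reanalysis of the data.

```python
def fraction_explained(in_normal: float, in_dbsnp: float, in_unrelated: float) -> float:
    """Chain the filters: each fraction applies only to SNVs not explained by earlier steps."""
    remaining = 1.0 - in_normal             # SNVs not seen in the normal tissue
    dbsnp = remaining * in_dbsnp            # of those, known germline variants
    not_dbsnp = remaining * (1.0 - in_dbsnp)
    unrelated = not_dbsnp * in_unrelated    # of the rest, also seen in unrelated samples
    return in_normal + dbsnp + unrelated


explained = fraction_explained(0.42, 0.58, 0.84)
print(f"{explained:.1%}")               # 96.1%
print(round(711 * (1.0 - explained)))  # 28 of 711 SNVs left, given the rounded inputs
```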

We now give a few examples of new information available from single-cell genomics.

Measurement of Spontaneous Whole-Genome Mutation Rate

In calling SNVs from a single cell by WGA (24, 68, 73), one challenge is the high false negative rate caused by allele dropout, and another is false positives associated with amplification and sequencing errors (either random or systematic) (42). Figure 11 shows 35 unique SNVs newly acquired in a human cancer cell line (SW480) during 20 cell divisions. Adjusting for a 70% detection efficiency for heterozygous SNVs, Zong et al. (78) estimated that ∼49 mutations occurred in the 20 generations, yielding a mutation rate of ∼2.5 nucleotides per cell generation, which is consistent with the estimate based on the bulk data.

[Figure 11 panels: (a) experimental design: a single ancestor cell, ∼20 generations of culture, bulk DNA, kindred single cells C1–C6, MALBAC, and whole-genome sequencing; (b) chromosome map (1–22, X, Y) of the newly acquired SNVs; (c) reads for C1, C2, C3, and the bulk sample at chromosome 2: 57,736,349]

Figure 11 Detecting newly acquired single-nucleotide variations (SNVs) with no false positives and estimating the mutation rate of a human cancer cell line (SW480). (a) Experimental design. A single ancestor cell was chosen and cultured for ∼20 generations. The vast majority of cells were used to extract DNA for bulk sequencing to represent the ancestral cell's genome. A single cell from this culture was chosen for another expansion of four generations. The kindred cells were isolated for single-cell whole-genome amplification. Single-cell samples C1, C2, and C3 were used for high-throughput sequencing. (b) Locations of the 35 newly acquired SNVs on the chromosomes of a single cell. (c) Next-generation sequencing data of a newly acquired SNV. The SNV (C→G) existed in the high-throughput data of all three kindred cells but not in the bulk data. Additional abbreviation: MALBAC, multiple annealing and looping–based amplification cycles. Figure adapted from Reference 78.
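The rate estimate is a two-step correction that can be checked directly. The sketch below applies it to the numbers quoted above: dividing the 35 detected SNVs by the 70% detection efficiency gives 50, close to the ∼49 reported by Zong et al. (78), and dividing ∼49 mutations by 20 generations gives 2.45, i.e., ∼2.5 per cell generation.

```python
def corrected_count(observed: int, detection_efficiency: float) -> float:
    """Scale the detected SNVs up to account for heterozygous SNVs missed by detection."""
    return observed / detection_efficiency


def rate_per_generation(mutations: float, generations: int) -> float:
    """Average number of new point mutations per cell generation."""
    return mutations / generations


print(round(corrected_count(35, 0.70)))  # 50, close to the ~49 reported
print(rate_per_generation(49, 20))       # 2.45, i.e., ~2.5 per generation
```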

At this rather slow spontaneous mutation rate, cancer development would take a long time. What happens at the very beginning of cancer development has been a long-standing question, and single-cell WGA is well suited to help answer it.

Does Copy-Number Variation Precede Single-Nucleotide Variation?

In a bulk tissue sample at an early stage of cancer development, detection of abnormal CNVs is often difficult, especially when the number of cells with abnormal CNVs is small. Single-cell genomic analysis is therefore essential for evaluating the relationship, if any, between CNVs and SNVs.


[Figure 12 panels: (a) progression from hyperproliferation (abnormal cell growth) through adenomatous polyps, high-grade dysplasia, and adenocarcinoma to invasive cancer; (b) genome-wide copy number (0–4) across chromosomes 1–22, X, Y for cells 1–8; (c) copy number (0–4) along chromosome 5 for cells 1–8; (d) reads at a KCNB1 (RefSeq) site for cells 1–8, labeled 47,900,760 base pairs]


We have carried out single-cell genomic analyses of colonoscopy biopsies at different adenoma stages (L. Huang, S. Ding & X.S. Xie, unpublished results). Some single cells in stage II adenoma have CNVs in the tumor suppressor gene APC (a reduction in copy number from two to one) as well as SNVs in numerous cancer-related genes. Interestingly, we found that although some single cells exhibited the CNV reduction in APC without SNVs, all single cells carrying SNVs that have been reported as somatic mutations of colon and other cancers in the Catalogue of Somatic Mutations in Cancer (COSMIC) database showed the CNV reduction in APC. Moreover, we did not see any single cell in an adenoma that had colon cancer–related SNVs but no CNVs. Figure 12 shows such data for the COSMIC gene KCNB1, which encodes an ion channel. These data indicate that the CNV in APC precedes the SNV in colon cancer development, at least in the particular colony examined.

Furthermore, we found that the single cells from the same adenoma exhibited CNV patterns that were reproducible among all the cells, indicating that these cells might be derived from a single stem or progenitor cell in which the CNVs first arose. Thus, we have established a correlation between the SNVs and CNVs and propose that the SNVs are generated as a consequence of abnormal CNVs in the genome or arise after the abnormal CNVs are acquired in the original founding cell. If confirmed to be general for driver SNVs, this result could have significant implications for the genesis of cancer. Suffice it to say that this experiment underscores the importance of single-cell genomics in understanding cancer.

Circulating Tumor Cells

Originating from primary tumors, CTCs enter the peripheral blood and seed metastases, which account for 90% of cancer-related deaths. Genome sequencing of CTCs could offer noninvasive prognosis or even diagnosis but has been hampered by the low single-cell genome coverage obtained from scarce CTCs.

Ni et al. (49) applied MALBAC for WGA of single CTCs from lung cancer patients and observed characteristic cancer-associated SNVs and indels in the exomes of CTCs. These mutations provided information needed for individualized therapy (53), such as drug resistance and phenotypic transition, but were heterogeneous from cell to cell.

Ni et al. (49) also discovered that, unlike the highly heterogeneous point mutations, the CNV patterns of CTCs are reproducible from CTC to CTC within a patient, and even among different patients with the same cancer type, but are distinctly different among cancer types (Figure 13). Furthermore, the reproducible CNV patterns of CTCs are similar to those of the metastatic tumor. This result raised intriguing questions about the genesis of metastasis. It is evident that gains and losses in copy numbers of certain chromosome regions are selected for in metastases. The finding that the CNV patterns of CTCs are cancer or tissue dependent offers the potential for noninvasive cancer diagnosis based on CNV patterns.

Figure 12 Temporal sequence of copy-number variations (CNVs) and single-nucleotide variations (SNVs) in colon polyps. (a) Normal intestinal epithelium and adenoma. (b) CNVs detected in eight single cells (3,000-kb bin size). Some cells exhibit changes in chromosome 5 copy number from two to one. This is an ∼8-Mb region. (c) CNVs seen when zooming into chromosome 5 (1,000-kb bin size). A reduction of copy number from two to one is apparent in the 5q region containing the APC gene (chromosomal region 5q21.3–23.1). The same region is seen in five different single cells, indicating that they came from the same stem cell. (d) Four of eight single cells exhibit point mutations in the KCNB1 gene, which has been reported as a somatic mutation of colon cancer in the Catalogue of Somatic Mutations in Cancer (COSMIC) database. Serine (S), valine (V), and isoleucine (I) are the amino acids encoded by the triplet codons. The KCNB1 gene is a protein-coding gene that encodes an ion channel protein. Interestingly, SNVs occur only in cells with abnormal CNVs.

[Figure 13 panels: (a) copy number (0–6) across chromosomes 1–22, X, Y for the primary tumor, the metastasis, and CTCs 1–8 of patient P1; (b) CTC copy-number profiles for patients P1–P7; (c) dendrogram of individual CTCs labeled by patient and cell (e.g., P1_1, P7_2)]

Figure 13 Reproducible copy-number variation (CNV) patterns of seven lung cancer patients. Patient 1 (P1) experienced a phenotypic transition from adenocarcinoma (ADC) in the lung to small-cell lung cancer (SCLC) in the metastatic liver, patients 2–6 (P2–6) have ADC, and patient 7 (P7) has a mix of ADC and SCLC. (a) Reproducible CNV patterns of eight circulating tumor cells (CTCs) in P1. The patterns are different from that of the primary tissue (first row) but the same as that of the metastatic tissue (second row). (b) CNV patterns of CTCs from the seven patients. (c) Clustering analyses of CTCs based on the CNVs. CTCs from P1 and P7 (SCLC patients) were well separated from CTCs from P2–P6 (ADC patients). The y axis is the cluster distance constructed using Ward's method based on Euclidean distances between the patients' CNVs. Figure adapted from Reference 49.
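Figure 13c groups CTCs by hierarchical clustering of their CNV profiles with Ward's method on Euclidean distances. The dependency-free sketch below shows the mechanics under assumptions of our own: each cell is a hypothetical vector of segment copy numbers, cluster distances are updated with the Lance–Williams formula for Ward's method on squared Euclidean distances, and merging stops at a requested number of clusters. In practice one would more likely call an established routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"`.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def ward_clusters(profiles: Dict[str, Sequence[float]], n_clusters: int) -> List[List[str]]:
    """Agglomerative clustering of CNV profiles with Ward's criterion."""
    if not 1 <= n_clusters <= len(profiles):
        raise ValueError("n_clusters must be between 1 and the number of profiles")
    names = list(profiles)
    members = {idx: [name] for idx, name in enumerate(names)}

    def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(u, v))

    # Pairwise squared Euclidean distances between singleton clusters.
    dist = {frozenset((i, j)): sq_dist(profiles[names[i]], profiles[names[j]])
            for i, j in combinations(members, 2)}
    next_id = len(names)
    while len(members) > n_clusters:
        pair, d_ij = min(dist.items(), key=lambda kv: kv[1])  # cheapest merge
        i, j = tuple(pair)
        n_i, n_j = len(members[i]), len(members[j])
        merged = members.pop(i) + members.pop(j)
        del dist[pair]
        # Lance-Williams update: distance from every other cluster k to the merged cluster.
        for k, names_k in members.items():
            n_k = len(names_k)
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((k, next_id))] = (
                (n_i + n_k) * d_ik + (n_j + n_k) * d_jk - n_k * d_ij
            ) / (n_i + n_j + n_k)
        members[next_id] = merged
        next_id += 1
    return sorted(sorted(group) for group in members.values())


cnv_profiles = {  # hypothetical copy numbers over four genomic segments
    "P1_1": [2, 3, 1, 2], "P1_2": [2, 3, 1, 2], "P7_1": [2, 3, 1, 3],
    "P2_1": [3, 2, 2, 2], "P2_2": [3, 2, 2, 2], "P3_1": [3, 2, 2, 1],
}
print(ward_clusters(cnv_profiles, 2))
# [['P1_1', 'P1_2', 'P7_1'], ['P2_1', 'P2_2', 'P3_1']]
```

Working with squared distances keeps the Lance–Williams update exact for Ward's method; taking square roots of the merge costs would give dendrogram heights on a Euclidean scale.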

DISCLOSURE STATEMENT

S.L. and X.S.X. are coauthors on a patent applied for by Harvard University for MALBAC technology and are cofounders of Yikon Genomics.

ACKNOWLEDGMENTS

We thank Chenghang Zong for his contribution to the invention of MALBAC. We thank Yaqiong Tang, Dandan Cao, and Hongshan Guo for technical assistance in data collection and Guangyu Zhou for help with data analysis. We are indebted to our collaborators Fuchou Tang, Jie Qiao, Xiaohui Ni, Fan Bai, Wei Fan, Liya Xu, Jie Wang, Ning Zhang, Youyong Lu, Liying Yan, Yu Hou, Zhe Su, Yan Gao, Shigang Ding, Hejun Zhang, Ruiqiang Li, Mingyu Yang, Jinsen Li, Jessica Sang, and Yanyi Huang for their contributions to the work reviewed in this article. We are grateful for funding from the National Natural Science Foundation of China (21327808), US National Institutes of Health (NIH) National Human Genome Research Institute grants (HG005097-1 and HG005613-01), and an NIH Director's Pioneer Award (5DP1CA186693) to X.S.X. as well as a National Cancer Institute grant (5R33CA174560). The work at the Biodynamic Optical Imaging Center (BIOPIC) has been supported by Peking University 985 special funding for collaboration with hospitals and funding from Guangxi Wuzhou Zhongheng Group Co., Ltd.

LITERATURE CITED

1. 1000 Genomes Proj. Consort. 2010. A map of human genome variation from population-scale sequencing.Nature 467:1061–73

2. Bansal V, Tewhey R, Topol EJ, Schork NJ. 2011. The next phase in human genetics. Nat. Biotechnol.29:38–39

3. Barrett MT, Reid BJ, Joslyn G. 1995. Genotypic analysis of multiple loci in somatic cells by whole genomeamplification. Nucleic Acids Res. 23:3488–92

4. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, et al. 2008. Accurate whole humangenome sequencing using reversible terminator chemistry. Nature 456:53–59

5. Blanco L, Bernad A, Lazaro JM, Martin G, Garmendia C, Salas M. 1989. Highly efficient DNA synthesisby the phage φ29 DNA polymerase: symmetrical mode of DNA replication. J. Biol. Chem. 264:8935–40

6. Chapman AR, He Z, Lu S, Yong J, Tan L, et al. 2015. Single cell transcriptome amplification with MALBAC. PLOS ONE 10:e0120889

7. Cheung VG, Nelson SF. 1996. Whole genome amplification using a degenerate oligonucleotide primer allows hundreds of genotypes to be performed on less than one nanogram of genomic DNA. PNAS 93:14676–79

www.annualreviews.org • Single-Cell Whole-Genome Amplification and Sequencing 99

Annu. Rev. Genom. Human Genet. 2015.16:79-102. Downloaded from www.annualreviews.org Access provided by Harvard University on 09/01/15. For personal use only.


GG16CH04-Xie ARI 21 July 2015 11:16

8. Coop G, Przeworski M. 2007. An evolutionary view of human recombination. Nat. Rev. Genet. 8:23–34

9. Dago AE, Stepansky A, Carlsson A, Luttgen M, Kendall J, et al. 2014. Rapid phenotypic and genomic change in response to therapeutic pressure in prostate cancer inferred by high content analysis of single circulating tumor cells. PLOS ONE 9:e101777

10. Daina G, Ramos L, Obradors A, Rius M, Martinez-Pasarell O, et al. 2013. First successful double-factor PGD for Lynch syndrome: monogenic analysis and comprehensive aneuploidy screening. Clin. Genet. 84:70–73

11. de Bourcy CF, De Vlaminck I, Kanbar JN, Wang J, Gawad C, Quake SR. 2014. A quantitative comparison of single-cell whole genome amplification methods. PLOS ONE 9:e105585

12. Dean FB, Nelson JR, Giesler TL, Lasken RS. 2001. Rapid amplification of plasmid and phage DNA using phi29 DNA polymerase and multiply-primed rolling circle amplification. Genome Res. 11:1095–99

13. Epstein CJ. 2007. The Consequences of Chromosome Imbalance: Principles, Mechanisms, and Models. Cambridge, UK: Cambridge Univ. Press

14. Fan HC, Wang J, Potanina A, Quake SR. 2011. Whole-genome molecular haplotyping of single cells. Nat. Biotechnol. 29:51–57

15. Francis JM, Zhang CZ, Maire CL, Jung J, Manzo VE, et al. 2014. EGFR variant heterogeneity in glioblastoma resolved through single-nucleus sequencing. Cancer Discov. 4:956–71

16. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, et al. 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449:851–61

17. Garmendia C, Bernad A, Esteban JA, Blanco L, Salas M. 1992. The bacteriophage φ29 DNA polymerase, a proofreading enzyme. J. Biol. Chem. 267:2594–99

18. Gawad C, Koh W, Quake SR. 2014. Dissecting the clonal origins of childhood acute lymphoblastic leukemia by single-cell genomics. PNAS 111:17947–52

19. Guo H, Zhu P, Wu X, Li X, Wen L, Tang F. 2013. Single-cell methylome landscapes of mouse embryonic stem cells and early embryos analyzed using reduced representation bisulfite sequencing. Genome Res. 23:2126–35

20. Harris TD, Buzby PR, Babcock H, Beer E, Bowers J, et al. 2008. Single-molecule DNA sequencing of a viral genome. Science 320:106–9

21. Hastings PJ, Lupski JR, Rosenberg SM, Ira G. 2009. Mechanisms of change in gene copy number. Nat. Rev. Genet. 10:551–64

22. Heger M. 2014. Peking University reports first success in trial of single-cell sequencing for PGD. GenomeWeb, Oct. 1. http://www.genomeweb.com/sequencing/peking-university-reports-first-success-trial-single-cell-sequencing-pgd

23. Hinch AG, Tandon A, Patterson N, Song Y, Rohland N, et al. 2011. The landscape of recombination in African Americans. Nature 476:170–75

24. Hou Y, Fan W, Yan L, Li R, Lian Y, et al. 2013. Genome analyses of single human oocytes. Cell 155:1492–506

25. Hou Y, Song L, Zhu P, Zhang B, Tao Y, et al. 2012. Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm. Cell 148:873–85

26. Huang J, Yan L, Fan W, Zhao N, Zhang Y, et al. 2014. Validation of multiple annealing and looping-based amplification cycle sequencing for 24-chromosome aneuploidy screening of cleavage-stage embryos. Fertil. Steril. 102:1685–91

27. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, et al. 2004. Detection of large-scale variation in the human genome. Nat. Genet. 36:949–51

28. Ingles J, Doolan A, Chiu C, Seidman J, Seidman C, Semsarian C. 2005. Compound and double mutations in patients with hypertrophic cardiomyopathy: implications for genetic testing and counselling. J. Med. Genet. 42:e59

29. Int. HapMap Consort. 2005. A haplotype map of the human genome. Nature 437:1299–320

30. Jeffreys AJ, Neumann R. 2002. Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat. Genet. 31:267–71

31. Kalisky T, Quake SR. 2011. Single-cell genomics. Nat. Methods 8:311–14

32. Kirkness EF, Grindberg RV, Yee-Greenbaum J, Marshall CR, Scherer SW, et al. 2013. Sequencing of isolated sperm cells for direct haplotyping of a human genome. Genome Res. 23:826–32


33. Kitzman JO, MacKenzie AP, Adey A, Hiatt JB, Patwardhan RP, et al. 2011. Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nat. Biotechnol. 29:59–63

34. Kong A, Thorleifsson G, Gudbjartsson DF, Masson G, Sigurdsson A, et al. 2010. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467:1099–103

35. Lasken RS. 2012. Genomic sequencing of uncultured microorganisms from single cells. Nat. Rev. Microbiol. 10:631–40

36. Lasken RS. 2013. Single-cell sequencing in its prime. Nat. Biotechnol. 31:211–12

37. Lasken RS, Egholm M. 2003. Whole genome amplification: abundant supplies of DNA from precious samples or clinical specimens. Trends Biotechnol. 21:531–35

38. Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, et al. 2007. The diploid genome sequence of an individual human. PLOS Biol. 5:e254

39. Lohr JG, Adalsteinsson VA, Cibulskis K, Choudhury AD, Rosenberg M, et al. 2014. Whole-exome sequencing of circulating tumor cells provides a window into metastatic prostate cancer. Nat. Biotechnol. 32:479–84

40. Lu S, Zong C, Fan W, Yang M, Li J, et al. 2012. Probing meiotic recombination and aneuploidy of single sperm cells by whole-genome sequencing. Science 338:1627–30

41. Ma L, Xiao Y, Huang H, Wang Q, Rao W, et al. 2010. Direct determination of molecular haplotypes by chromosome microdissection. Nat. Methods 7:299

42. MacArthur D. 2012. Methods: face up to false positives. Nature 487:427–28

43. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, et al. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376–80

44. Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, et al. 2004. Genetic analysis of genome-wide variation in human gene expression. Nature 430:743–47

45. Mueller E, Brueck C. 2015. Whole genome amplification for single cell biology. Tech. Doc., Sigma-Aldrich, St. Louis, MO. http://www.sigmaaldrich.com/technical-documents/articles/life-science-innovations/whole-genome-amplification.html

46. Nat. Methods Eds. 2014. Method of the Year 2013. Nat. Methods 11:1

47. Navin NE. 2014. Cancer genomics: one cell at a time. Genome Biol. 15:452

48. Navin NE, Kendall J, Troge J, Andrews P, Rodgers L, et al. 2011. Tumour evolution inferred by single-cell sequencing. Nature 472:90–94

49. Ni X, Zhuo M, Su Z, Duan J, Gao Y, et al. 2013. Reproducible copy number variation patterns among single circulating tumor cells of lung cancer patients. PNAS 110:21083–88

50. Peters BA, Kermani BG, Sparks AB, Alferov O, Hong P, et al. 2012. Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells. Nature 487:190–95

51. Pugh TJ, Delaney AD, Farnoud N, Flibotte S, Griffith M, et al. 2008. Impact of whole genome amplification on analysis of copy number variants. Nucleic Acids Res. 36:e80

52. Rubio C, Rodrigo L, Mir P, Mateu E, Peinado V, et al. 2013. Use of array comparative genomic hybridization (array-CGH) for embryo assessment: clinical results. Fertil. Steril. 99:1044–48

53. Ruiz C, Li J, Luttgen MS, Kolatkar A, Kendall JT, et al. 2015. Limited genomic heterogeneity of circulating melanoma cells in advanced stage patients. Phys. Biol. 12:016008

54. Saiki RK, Gelfand DH, Stoffel S, Scharf SJ, Higuchi R, et al. 1988. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239:487–91

55. Sandberg R. 2014. Entering the era of single-cell transcriptomics in biology and medicine. Nat. Methods 11:22–24

56. Sebat J, Lakshmi B, Troge J, Alexander J, Young J, et al. 2004. Large-scale copy number polymorphism in the human genome. Science 305:525–28

57. Shapiro E, Biezuner T, Linnarsson S. 2013. Single-cell sequencing-based technologies will revolutionize whole-organism science. Nat. Rev. Genet. 14:619–30

58. Shen J, Cram DS, Wu W, Cai L, Yang X, et al. 2013. Successful PGD for late infantile neuronal ceroid lipofuscinosis achieved by combined chromosome and TPP1 gene analysis. Reprod. BioMed. Online 27:176–83

59. Suk EK, McEwen GK, Duitama J, Nowick K, Schulz S, et al. 2011. A comprehensively molecular haplotype-resolved genome of a European individual. Genome Res. 21:1672–85


60. Tang F, Lao K, Surani MA. 2011. Development and applications of single-cell transcriptome analysis. Nat. Methods 8(Suppl.):S6–11

61. Telenius H, Carter NP, Bebb CE, Nordenskjöld M, Ponder BA, Tunnacliffe A. 1992. Degenerate oligonucleotide-primed PCR: general amplification of target DNA by a single degenerate primer. Genomics 13:718–25

62. Tewhey R, Bansal V, Torkamani A, Topol EJ, Schork NJ. 2011. The importance of phase information for human genomics. Nat. Rev. Genet. 12:215–23

63. Tobler KJ, Brezina PR, Benner AT, Du L, Xu X, Kearns WG. 2014. Two different microarray technologies for preimplantation genetic diagnosis and screening, due to reciprocal translocation imbalances, demonstrate equivalent euploidy and clinical pregnancy rates. J. Assist. Reprod. Genet. 31:843–50

64. Treff NR, Fedick A, Tao X, Devkota B, Taylor D, Scott RT. 2013. Evaluation of targeted next-generation sequencing–based preimplantation genetic diagnosis of monogenic disease. Fertil. Steril. 99:1377–84

65. Treff NR, Tao X, Ferry KM, Su J, Taylor D, Scott RT. 2012. Development and validation of an accurate quantitative real-time polymerase chain reaction–based assay for human blastocyst comprehensive chromosomal aneuploidy screening. Fertil. Steril. 97:819–24

66. Van Driest SL, Vasile VC, Ommen SR, Will ML, Tajik AJ, et al. 2004. Myosin binding protein C mutations and compound heterozygosity in hypertrophic cardiomyopathy. J. Am. Coll. Cardiol. 44:1903–10

67. Vogelstein B, Kinzler KW. 2004. Cancer genes and the pathways they control. Nat. Med. 10:789–99

68. Wang J, Fan HC, Behr B, Quake SR. 2012. Genome-wide single-cell analysis of recombination activity and de novo mutation rates in human sperm. Cell 150:402–12

69. Wang J, Wang W, Li R, Li Y, Tian G, et al. 2008. The diploid genome sequence of an Asian individual. Nature 456:60–65

70. Wang Y, Waters J, Leung ML, Unruh A, Roh W, et al. 2014. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature 512:155–60

71. Wells D, Kaur K, Grifo J, Glassner M, Taylor JC, et al. 2014. Clinical utilisation of a rapid low-pass whole genome sequencing technique for the diagnosis of aneuploidy in human embryos prior to implantation. J. Med. Genet. 51:553–62

72. Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, et al. 2008. The complete genome of an individual by massively parallel DNA sequencing. Nature 452:872–76

73. Xu X, Hou Y, Yin X, Bao L, Tang A, et al. 2012. Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor. Cell 148:886–95

74. Yang H, Chen X, Wong WH. 2011. Completely phased genome sequencing through chromosome sorting. PNAS 108:12–17

75. Yu C, Yu J, Yao X, Wu WK, Lu Y, et al. 2014. Discovery of biclonal origin and a novel oncogene SLC12A5 in colon cancer by single-cell sequencing. Cell Res. 24:701–12

76. Zhang K, Zhu J, Shendure J, Porreca GJ, Aach JD, et al. 2006. Long-range polony haplotyping of individual human chromosome molecules. Nat. Genet. 38:382–87

77. Zhang L, Cui X, Schmitt K, Hubert R, Navidi W, Arnheim N. 1992. Whole genome amplification from a single cell: implications for genetic analysis. PNAS 89:5847–51

78. Zong C, Lu S, Chapman AR, Xie XS. 2012. Genome-wide detection of single-nucleotide and copy-number variations of a single human cell. Science 338:1622–26


GG16-FrontMatter ARI 22 June 2015 17:20

Annual Review of Genomics and Human Genetics

Volume 16, 2015

Contents

A Mathematician’s Odyssey. Walter Bodmer, p. 1

Lessons from modENCODE. James B. Brown and Susan E. Celniker, p. 31

Non-CG Methylation in the Human Genome. Yupeng He and Joseph R. Ecker, p. 55

Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications. Lei Huang, Fei Ma, Alec Chapman, Sijia Lu, and Xiaoliang Sunney Xie, p. 79

Unraveling the Tangled Skein: The Evolution of Transcriptional Regulatory Networks in Development. Mark Rebeiz, Nipam H. Patel, and Veronica F. Hinman, p. 103

Alignment of Next-Generation Sequencing Reads. Knut Reinert, Ben Langmead, David Weese, and Dirk J. Evers, p. 133

The Theory and Practice of Genome Sequence Assembly. Jared T. Simpson and Mihai Pop, p. 153

Addressing the Genetics of Human Mental Health Disorders in Model Organisms. Jasmine M. McCammon and Hazel Sive, p. 173

Advances in Skeletal Dysplasia Genetics. Krista A. Geister and Sally A. Camper, p. 199

The Genetics of Soft Connective Tissue Disorders. Olivier Vanakker, Bert Callewaert, Fransiska Malfait, and Paul Coucke, p. 229

Neurodegeneration with Brain Iron Accumulation: Genetic Diversity and Pathophysiological Mechanisms. Esther Meyer, Manju A. Kurian, and Susan J. Hayflick, p. 257

The Pathogenesis and Therapy of Muscular Dystrophies. Simon Guiraud, Annemieke Aartsma-Rus, Natassia M. Vieira, Kay E. Davies, Gert-Jan B. van Ommen, and Louis M. Kunkel, p. 281


Detection of Chromosomal Aberrations in Clinical Practice: From Karyotype to Genome Sequence. Christa Lese Martin and Dorothy Warburton, p. 309

Mendelian Randomization: New Applications in the Coming Age of Hypothesis-Free Causality. David M. Evans and George Davey Smith, p. 327

Eugenics and Involuntary Sterilization: 1907–2015. Philip R. Reilly, p. 351

Noninvasive Prenatal Genetic Testing: Current and Emerging Ethical, Legal, and Social Issues. Mollie A. Minear, Stephanie Alessi, Megan Allyse, Marsha Michie, and Subhashini Chandrasekharan, p. 369

Errata

An online log of corrections to Annual Review of Genomics and Human Genetics articles may be found at http://www.annualreviews.org/errata/genom
