

This journal is © The Royal Society of Chemistry 2016. Mol. BioSyst., 2016, 12, 2965–2979

Cite this: Mol. BioSyst., 2016, 12, 2965

Cellular identity at the single-cell level

Ahmet F. Coskun,*a Umut Eser b and Saiful Islam c

A single cell creates surprising heterogeneity in a multicellular organism. While every organismal cell shares an almost identical genome, molecular interactions in cells alter the use of DNA sequences to modulate the gene of interest for specialization of cellular functions. Each cell gains a unique identity through molecular coding across the DNA, RNA, and protein conversions. On the other hand, loss of cellular identity leads to critical diseases such as cancer. Most cell identity dissection studies are based on bulk molecular assays that mask differences in individual cells. To probe cell-to-cell variability in a population, we discuss single cell approaches to decode the genetic, epigenetic, transcriptional, and translational mechanisms for cell identity formation. In combination with molecular instructions, the physical principles behind cell identity determination are examined. Deciphering and reprogramming cellular types impact biology and medicine.

a Division of Chemistry and Chemical Engineering, California Institute of Technology, California, USA. E-mail: [email protected]
b Department of Genetics, Harvard Medical School, Massachusetts, USA. E-mail: [email protected]
c Department of Genetics, Stanford Genome Technology Center, Stanford University, Stanford, USA. E-mail: [email protected]

Received 18th May 2016, Accepted 18th July 2016

DOI: 10.1039/c6mb00388e

www.rsc.org/molecularbiosystems

Molecular BioSystems

REVIEW

Published on 19 July 2016. Downloaded by California Institute of Technology on 11/11/2016 15:10:45.

Ahmet F. Coskun

Ahmet F. Coskun is a Research Fellow at the Division of Chemistry and Chemical Engineering at the California Institute of Technology. He is a recipient of the Burroughs Wellcome Fund Career Award at the Scientific Interface. He researches systems biology, precision medicine, and biophotonics. His current research centers on single cell analysis in biology and medicine using interdisciplinary quantitative tools. He holds a PhD degree from the University of California, Los Angeles, and a bachelor's degree from Koc University, Turkey.

Umut Eser

Umut Eser was awarded a gold medal in the 33rd International Physics Olympiad. He earned his BS degree in Physics from the Middle East Technical University in 2006 and received a PhD in Applied Physics at Stanford University in 2013. His doctoral thesis focuses on understanding the design principles of cellular decision making, particularly cell cycle commitment. Currently, he continues his postdoctoral studies in the Department of Genetics at the Harvard Medical School. His research interests include evolution of transcription regulation, deep learning applications on genomics and epigenomics, and statistical representation of complex data.

Introduction

Multicellular organismal life starts from a single cell and experiences significant physiological and molecular changes during development under dynamic environmental stimuli. A human body develops from a single zygote that, after extensive cellular division, gives rise to more than 10^13 specialized cells with unique subcellular structures and complex functions.1 Although individual cells in an organism share almost indistinguishable genomic material, cellular proliferation and differentiation lead to remarkable diversity that underlies processes such as blood formation, neural development, and organ growth (Fig. 1). Regulatory mechanisms are considered to facilitate cellular specialization in development across the central dogma covering deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and protein conversions.2 Genomic DNA content, messenger RNA expression, and protein abundance have been found to be only weakly correlated within biological processes in cells.3,4 Besides, the molecular patterns that a cell expresses are very dynamic and subject to change over time. Thus, genetic information transfer from DNA to protein exhibits significant heterogeneity and stochasticity, resulting in a variety of cell fates and phenotypes that define specific cell types.

Genome-wide DNA, RNA, and protein profiling methods have elucidated potential regulatory mechanisms for cellular differences.5,6 However, these approaches have been limited to ensemble average data that combine information from a group of cells. The heterogeneity in cellular populations can be resolved by analyzing the regulatory elements in individual cells. Thus, single cell technologies have received considerable attention as a way to explore population architecture in organisms consisting of various cellular subtypes. For example, emerging single-cell RNA sequencing (RNA-seq) and single-cell quantitative polymerase chain reaction (q-PCR) techniques have mapped out the molecular states of individual cells in both healthy and diseased model systems. While cells have been classified based on gene expression data in certain organisms, understanding the global regulation that gives rise to cellular identities requires sophisticated analysis of epigenome, genome, and proteome landscapes together in spatial and temporal domains.

Here, we discuss molecular coding mechanisms for cellular identity formation, especially covering genomic, epigenomic, transcriptional, and translational regulation in cellular development. We describe single cell approaches that have shed light on some of these codes in development. In particular, we review the observed cellular states using available single cell profiling techniques. We then highlight the need for further advances in single cell techniques. In addition to molecular programming, we discuss the physical mechanisms leading to cell type formation. Finally, we overview some of the applications of cell identity studies in biology and medicine.

Molecular coding

Recent advances in whole genome sequencing have revealed that humans share 18–95% of a common genome with other organisms such as worms, fruit flies, zebrafish, plants, dogs, and mice.7–10 Similarities in their DNA sequences help these organisms perform conserved cellular functions such as growth and movement, among others. However, the DNA sequence differences are primarily responsible for making each organism unique, with a special molecular makeup and physical structure. Any two human beings are identical in 99.9% of their DNA content.11 Despite this high degree of similarity, humans exhibit significant differences between individuals in their physical appearances and capabilities. Person-to-person variations may arise from small sections of our DNA that make up only 0.1% of the entire 6 billion base pairs, which still corresponds to a significant 6 million base pairs per cell.

Fig. 1 (a) A single zygote cell produces cellular diversity in a multicellular organism in development. (b) One cell creates up to trillions of functional cells. (c) A series of differentiation events creates highly specialized cells. (d) Microarray data show different cell types from gene expression analysis of 80 tissue samples. Colours represent distinct tissue types and each grey dot denotes a gene expression data point for each tissue sample.

Saiful Islam

Saiful Islam has contributed significantly during his PhD at Karolinska Institutet to the development of single cell transcriptomics methods including Single-cell Tagged Reverse Transcription (STRT) for single cell transcriptomics and a Unique Molecular Identifier (UMI) based molecule-counting method at the single-cell level. He is currently working as a postdoctoral research scholar in the Genetics Department at Stanford University and applies single cell analysis to further understand the biology of clonal T-cell expansion in response to pathogens. In addition, he is working on developing a new single cell method to deal with very small cells containing cell walls.

In addition, mutations in DNA contribute to this diversity. In humans, the error rate set by DNA polymerase fidelity is estimated to be about 10^-8 per genomic site per generation. For a 6.6 × 10^9 base-pair diploid genome, on average about 60 mutations occur per cell division.12–14 Everyone naturally carries hundreds of mutations and most of them persist throughout the life span. Such a significant mutation rate has an impact on human conditions, especially diseases. In the human body, the DNA is almost identical in every cell, yet many specialized cell types are produced within different tissues and organs (Fig. 1). To shed light on cell identity formation, the following section will discuss molecular coding mechanisms (Fig. 2) across the central dogma of biology that lead to cellular specialization.
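The two order-of-magnitude estimates quoted above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation using the numbers given in the text; the function and constant names are illustrative. Note that the product of the quoted rate and genome size is about 66, which the text rounds to about 60.

```python
# Back-of-the-envelope check of the figures quoted in the text.
# All names are illustrative; the numbers are the ones given above.

GENOME_BP_ROUNDED = 6_000_000_000   # "6 billion base pairs" per diploid cell
DIFFERING_FRACTION = 0.001          # 0.1% person-to-person difference

ERROR_RATE_PER_SITE = 1e-8          # per genomic site per generation
GENOME_BP_DIPLOID = 6.6e9           # diploid genome size used for mutations


def differing_base_pairs(genome_bp, fraction):
    """Number of base pairs that differ between two individuals."""
    return genome_bp * fraction


def expected_new_mutations(error_rate, genome_bp):
    """Expected number of new mutations for one round of genome copying."""
    return error_rate * genome_bp


if __name__ == "__main__":
    diff = differing_base_pairs(GENOME_BP_ROUNDED, DIFFERING_FRACTION)
    muts = expected_new_mutations(ERROR_RATE_PER_SITE, GENOME_BP_DIPLOID)
    print(f"differing base pairs: about {diff:,.0f}")
    print(f"expected new mutations: about {muts:.0f}")
```

The calculation treats every site as equally likely to mutate, which is a simplification; it is meant only to show where the quoted orders of magnitude come from.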

Genetic code

Cellular DNA contains genetic instructions for human traits and is transmitted to the next generation through inheritance. All of the cells inherit the same DNA from the germline cells, and thus, the cells within an organism have almost the same DNA sequences (exceptions are discussed in the next section). These cells make use of an instruction manual, known as the genetic code, to read the nucleotide sequence of a gene and construct a corresponding protein structure. In particular, the code comprises a direct mapping of every triplet of nucleotides in a nucleic acid sequence to a single amino acid.15 While the majority of genes are encoded with exactly the canonical or standard genetic code, there are many other variant or non-canonical codes. Therefore, the genetic code is not universal and has continued to expand its coding capacity throughout evolution.16,17 Such a naturally evolved genetic code in cells may play an important role in the proteome balance, which might be critical for survival functions. Despite recent evidence of genetic code modifications, cell identity formation is tightly linked to the activation of subsets of genes through a presumed universal genetic code. Simply put, the DNA sequences are conserved, but selective and specific use of DNA gives rise to different cell types in development.
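The triplet-to-amino-acid mapping described above can be illustrated with a minimal sketch. The table below is a small hand-picked subset of the standard code, not the full 64-codon table, and the example sequence is made up. As one instance of a variant code, the vertebrate mitochondrial code reads TGA as tryptophan rather than as a stop signal. The names `STANDARD_SUBSET`, `VERTEBRATE_MITO_SUBSET`, and `translate` are illustrative.

```python
# Minimal sketch of codon-by-codon translation.
# STANDARD_SUBSET is a small subset of the standard genetic code
# (DNA alphabet, "*" marks a stop codon); it is NOT the full table.
STANDARD_SUBSET = {
    "ATG": "M", "TTT": "F", "GGC": "G", "AAA": "K", "TGG": "W",
    "TAA": "*", "TAG": "*", "TGA": "*",
}

# Variant code: in vertebrate mitochondria TGA encodes tryptophan (W).
VERTEBRATE_MITO_SUBSET = {**STANDARD_SUBSET, "TGA": "W"}


def translate(dna, table):
    """Translate a DNA coding sequence triplet by triplet until a stop codon."""
    protein = []
    usable_length = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    for start in range(0, usable_length, 3):
        codon = dna[start:start + 3].upper()
        amino_acid = table[codon]  # KeyError if the codon is not in the subset
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


EXAMPLE_SEQUENCE = "ATGTTTTGAGGCTAA"  # made-up sequence for illustration

if __name__ == "__main__":
    print(translate(EXAMPLE_SEQUENCE, STANDARD_SUBSET))
    print(translate(EXAMPLE_SEQUENCE, VERTEBRATE_MITO_SUBSET))
```

With the standard subset, translation stops at the internal TGA codon; with the mitochondrial variant the same sequence is read through to the final TAA, so one DNA sequence yields two different products depending on the code in use.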

Genomic code

Different cells in an organism have minor differences in DNA sequences due to genomic aberrations including somatic mutations, single-nucleotide polymorphisms (SNPs), deletions, structural variants, copy number variations (CNVs), and genetic mosaicism.18 Additional genomic differences are prevalent in immune cell types such as B-cells, which modify a small part of their DNA to produce antibodies, and in some organisms with programmed DNA elimination.19 While normal development does not phenotypically change with these small genomic variations, cancer formation is partially attributed to these mutations.20 Whole genome amplification and sequencing methods have been developed to access these genomic variations in diseases such as neurological disorders.21

Chromosomal conformation code

The human genome is hierarchically organized in three dimensions (3D), leading to a fractal globular shape in the nucleus.22 Different juxtapositions of transcription factories, enhancers, and promoters interact with each other in physical space, giving rise to differential expression patterns.23,24 Such variation in chromosomal conformation is not only found among different organisms, but is also observed among different tissues of the same organism. Distinct cell types exhibit unique 3D chromatin structures and genome topology. Specifically, topologically associated domains (TADs) control physical interactions of genomic elements (enhancers and promoters) to regulate genomic activity. TADs determine distinct 3D chromosome structures based on the cell type origin.25 Altering the structure of TADs and boundaries can lead to diseases.24 Notably, 3D genome structure modifications play a central role in cell identity determination for other mechanisms such as transcriptional and epigenetic control, the details of which will be expanded in the following sections.

Transcriptional code

In the quest for determining their fates, cells combine cascaded messages coming from signaling networks and transcriptional regulators to decide which set of genes to express. Specifically, cell identity is formed through the interplay between transcription factors and environmental stimulation such as Notch signaling in immune cell development or Shh signaling in central nervous system development.26–29 Transcription factors play a role as activators or repressors of gene expression in developing organs. Each cell type expresses a specific set of transcription factors that acts as a barcode identifier for the cell. Thus, combinatorial transcription regulation lies at the heart of developmental processes in organisms. For instance, an appropriate combination of transcription factors (Brn2, Ascl1, Myt1l, and Neurod1) is sufficient to reprogram fibroblasts to other cell types such as neurons.30

In DNA–protein association data obtained by chromatin immunoprecipitation followed by sequencing (ChIP-seq), transcription factors typically bind to cis-acting or trans-acting regulatory elements known as enhancers to regulate gene expression.31,32 Enhancers can be shared by multiple transcription factors and exhibit cell type specific distribution. Therefore, transcriptional enhancers contribute to the selection of cell fates. Master transcription factors form clusters of enhancers, which are referred to as super-enhancers.33,34 Local 3D chromosome structures control super-enhancer driven cell identity formation. Specifically, local chromosome loops contribute to the insulation of neighbouring genes. In addition, cancer cells form super-enhancers for oncogenes.35 While the concept of super-enhancers is relatively new and requires substantial validation, their emerging role may be important in cell identity both in health and disease.36

Fig. 2 Cells are coded at different levels of the central dogma to create cellular diversity.

Epigenetic code

Epigenetics is a mechanism complementary to the interplay of local regulatory DNA sequences and transcription factors: it orchestrates gene expression through the changing state of the chromatin.37 The epigenetic code primarily concerns the roles of DNA packaging histone proteins and chemical modifications of the nucleotides such as DNA methylation.38 Compared to the universal genetic code, epigenetic codes exhibit cell type specific regulation, providing a critical component of cell identity studies.

The histone code is an epigenetic factor that partly changes transcription through chemical modifications to histone proteins.39–42 Combinations of covalent histone modifications such as methylation, acetylation, phosphorylation, and ubiquitination recruit other specific proteins to modify the chromatin structure, causing activation or repression of a gene. Enhancers in the human genome are significantly marked by histone modifications in distinct cells to enable cell type specific gene expression.43 To maintain the cell identity against variations of the stochastic environment, histone modifications such as H3K4me3 consistently mark the key cell identity genes in specific tissues.44 Polycomb-group (PcG) protein complexes catalyse histone modifications to silence transcription in embryonic development.45–48 Epigenetic PcG repression plays an important role for stem cells in health and in diseases such as cancer.

DNA methylation adds to the modularity of epigenetics to affect cell type specific gene expression.49 Covalent addition of a methyl group to cytosines across the genome causes differential regulation in the adjacent genes.50 For instance, pancreatic β cell identity is controlled by the repression of an aristaless-related homeobox (Arx) gene through the DNA methylation mechanism.51

Non-coding DNA sequences such as transposable elements have an impact on cell type specific transcriptional activity. These mobile DNA pieces move around the genome. In particular, L1 retrotransposition, a mechanism in which reverse transcribed elements are inserted into the genome, exhibits differential regulation in germ cells, stem cells, and neuronal cells.52–57

Another epigenetic modulator is non-coding RNAs in cells. Long non-coding RNAs (lncRNAs) play an important role in reprogramming mammalian cells.58–60 Human tissues exhibited cell specific expression of lncRNAs in β cells. A mesoderm specific lncRNA, Fendrr, showed tissue specific expression in heart and body wall development. lncRNAs appear to be crucial gene expression regulators by affecting nuclear organization and sequestering microRNAs (miRNAs), behaving as miRNA sponges.61–65 In addition, miRNAs are short non-coding RNAs that can manipulate the expression of genes to modulate the differentiation status of cell identity.66 miRNAs cooperate to silence target mRNAs.67 Another class of small RNAs includes Piwi interacting RNAs (piRNAs) that repress the activity of retrotransposons, especially in somatic and germline stem cells.68,69 piRNAs control lineage determination in multiple cell types within the Drosophila ovary.70

While the term "code" for epigenetic modifications has been widely used in cell biology, it is still at the level of a hypothesis and awaits concrete evidence. Current observations have been limited to correlations. For example, H3K36me3 is associated with gene expression at the 3′ end of genes. However, there are cases where a gene is not expressed and H3K36me3 is still pronounced, or alternatively, the gene is active but H3K36me3 is not detected. Besides, the majority of these histone marks correlate highly with each other, raising the question of whether they are determinants or just marks of the DNA state, and, if some are determinants, which ones.

RNA code

RNA processing of the nucleotide sequence affects cell identity. Both co-transcriptional and post-transcriptional regulation mechanisms of gene expression are controlled by alternative splicing events. Splicing of a transcript exhibits cell type specific regulation differences to perform specialized functions in differentiated cells.71–74 Moreover, an alternatively spliced transcript can produce diverse proteins, which interact with a different set of proteins as if they originated from different genes.75 Cells acquire a splicing code based on the developmental context (sex, age, function, and organism) to obtain the diverse protein landscape necessary for cellular functions.

RNA editing is another post-transcriptional modification of RNA molecules that changes the nucleotides on the transcripts. It is a rare alteration that has been observed in neuronal cell identity development.76 In particular, adenosine-to-inosine editing in RNAs creates protein isoforms that give rise to differences in neural excitability.77 Fourteen human cell lines exhibited differences in their RNA editing patterns.78 Tissue specific RNA editing was mapped out by a computational tool (GIREMI).79 The APOBEC3A gene facilitates RNA editing events in macrophages and monocytes.80

Translational code

The ribosomal machinery has classically been considered to act constitutively in the translation of mRNAs to proteins. However, there is increasing evidence to support a specialized ribosome theory, in which ribosomes exhibit transcript specific regulation in development.81 Thus, a ribosome-mediated regulation mechanism82 leads to cell identity formation. The ribosomal code suggests that unique ribosomes with structural and functional differences may prefer translation of subsets of mRNAs in different cell types. Variations in ribosomal composition (a ribosome comprises 4 ribosomal RNAs and 80 proteins) have recently been measured by mass spectrometry to reveal differential stoichiometry among ribosomal proteins.83

Protein code

While transcriptional and translational mechanisms encode proteins, the dynamic regulation of proteins to perform specific cellular functions is guided by post-translational modifications such as phosphorylation, among more than 200 others. For instance, cells utilize such modifications of proteins to maintain the body's circadian clock and basic cell division tasks, as well as to avoid cancer formation.84–86 Thus, protein modifications are critical for regulatory specificity in differentiated and specialized cells.

Proteins must be folded properly and pass quality control in a cell.87 Aggregation of misfolded proteins causes a loss of cell identity, increasing the prevalence of critical diseases such as cancer. In addition, protein–protein interactions show cell type specificity, which contributes to distinct cellular identities. For instance, the LIM factor Lhx3 binds to the LIM cofactor NLI to stimulate interneuron specification in development.88 Protein complexes indirectly repress the activity of a transcription factor to guide the selection of a neuronal fate.

Single cell decoding

Powerful genome wide DNA/RNA sequencing technologies have successfully mapped out molecular codes for cell identity formation.5,6 However, these approaches can only provide ensemble data. Sub-population biological studies have been lacking due to these sequencing limitations. The non-uniform distribution of molecular components across the population might be critical to understand how cells reach a fate in a multicellular system (Fig. 3). Besides, gene expression levels exhibit significant variability in cells due to multiple regulatory mechanisms, as previously discussed in the molecular coding section.89 Here, we discuss the identification of cellular identity based on single cell molecular profiling techniques from the genome to the proteome scale.

Single cell genomics

Nucleotide variants in the genome lead to different cellular functions. Mainly, somatic mutations cause nucleotide variants that accumulate in the genome of a single cell over several divisions. Even though somatic variations are routinely prevalent in diseased and normal tissues, the rate of somatic variations is unexpectedly high.90,91 These mutations give each cell a unique identity with poorly defined functions. Thus, to decipher the role of dynamic mutations in cellular functions, single cell whole genome sequencing technologies reveal previously obscured variability and complexity in the genetic factors leading to those mutations. Multiple types of variations occur in gene function due to the burst size and frequency in gene expression, as well as cell cycle differences. To access the genome differences of each cell, DNA sequencing of single cell materials requires whole genome amplification methods such as PCR based amplification, the isothermal multiple annealing and looping based amplification (MALBAC) method, and multiple displacement amplification (MDA). Microarrays and next generation sequencing then process the resultant amplified material to map out genomic variations in individual cells. Sequencing single sperm cells revealed higher mutation rates compared to previous population measurements.92 This suggested genomic vulnerability within the first cellular divisions. Sequencing of single cancer cells revealed unique phases of tumour evolution based on copy number variation dynamics.93 Bioinformatics analysis of CNVs within individual cells mapped out the phylogenetic trees of developing cells within organisms.94 Somatic CNVs were detected in the human brain in health and disease.95 Accumulation of somatic mutations in developing human neurons was also tracked using single cell DNA sequencing.96

Single cell transcriptomics

Sequencing and imaging technologies have facilitated the quantification of mRNA molecules within a cellular volume. Unlike the fixed nature of DNA, RNA levels fluctuate in cells in response to stimuli. Thus, transcript measurements report on the functional state of a single cell. Reverse transcription and whole-transcriptome amplification schemes have enabled microarray, quantitative PCR, and next generation sequencing (NGS) analysis at single cell sensitivity. Transcripts from picograms (pg) of cellular RNA material are typically converted to complementary DNA (cDNA) during amplification to enable RNA sequencing read out.97,98 Recently, the reverse transcription step has been modified by an amplification step of a cDNA library together with adapter processing (tagmentation).99 Despite this common cDNA usage feature, current single cell transcriptomics approaches differ in their cell capturing strategies, targeting sites of a transcript (end vs. full length), quantitative counting capabilities (absolute or not), and strand specificity.100 RNA sequencing methods have shed light on transcriptional codes within individual cells.

Fig. 3 Distinct cell types have unique physical and molecular signatures. Single cell molecular profiling provides the identity of each individual cell. Despite common DNA sequences, protein (colour filled circles) and genome (grey) interactions create a heterogeneous distribution of RNAs and proteins, creating different cell identities: Cell ID 1 and 2.


These high-resolution transcriptional profiles of single cells reveal key population parameters such as the distinct cell types and subtypes, additional biomarkers per cell identity, and gene regulation modes observed in a new subpopulation (Fig. 4).

Cell type classification. While cells have traditionally been classified reasonably well based on structure and function, the optimal cell type determination approach is to utilize the underlying gene regulators and their connections.101,102 Therefore, single cell molecular profiles are paramount for the identification of distinct cell types within heterogeneous cell populations. Similarity in the transcriptional states of individual cells is utilized to extract common molecular signatures for a cell type. Generally, cells are classified by computational analyses of single cell RNA sequencing results using data presentation and reduction algorithms. Specifically, distinct clusters are obtained on hierarchical clustering heat maps and principal component analysis plots, among many others. Currently, many cell-typing studies have been performed in one organ at a time.103–113 Using from 96 to more than 44 000 cells, RNA sequencing methods have identified from 3 to more than 47 cell types in different organs (Table 1). Covering all the organs would extend the mapping to all the cell types within an organism and pave the way for a human atlas at cellular resolution. Besides, single-cell transcriptome studies reveal differences between healthy and diseased cell states within individuals.
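As a rough illustration of how similarity between transcriptional states can group cells, the sketch below runs a small agglomerative (hierarchical) clustering on made-up expression profiles. The choice of Pearson correlation distance and average linkage is an assumption made for this example, not a method prescribed by the studies cited above; the toy matrix `TOY_CELLS` and all function names are hypothetical.

```python
from math import sqrt

# Toy expression matrix (rows: cells, columns: four genes). The values are
# invented for illustration: cells 0-2 express the first two genes highly,
# cells 3-5 express the last two.
TOY_CELLS = [
    [9, 8, 1, 0],
    [8, 9, 0, 1],
    [10, 7, 1, 1],
    [1, 0, 9, 8],
    [0, 1, 8, 10],
    [1, 1, 7, 9],
]


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                total = sum(
                    pearson_distance(profiles[i], profiles[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                dist = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    print(average_linkage(TOY_CELLS, n_clusters=2))
```

Real single cell data sets contain thousands of genes and cells and are usually normalized and reduced in dimension before clustering; this sketch omits those steps to keep the grouping logic visible.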

Biomarker discovery. Highly sensitive and specific biomarkers are needed to define the cellular states in health and disease.114,115 Together with conventional protein and nucleic acid labelling methods, DNA/RNA sequencing information about individual cells expands the toolset of testable molecular biomarkers. Single cell transcriptional profiles mapped out previously undetected biomarkers in the colon, the lungs, the intestine, and the tonsils.103,104,106,110 In particular, the identification of biomarkers for immune cells significantly benefits from single-cell analysis. Markers are identified as genes whose high expression levels yield distinct clusters in gene expression maps. For instance, transcriptional profiles of 4 distinct cell populations in the innate lymphoid cells (ILCs) revealed new markers in each cellular group such as GATA3 in ILC2 and RORC in ILC3 cells.110 These molecular identifiers are used for studies of homeostatic and inflammation conditions. Single cell transcriptional profiles allowed the identification of novel markers that were previously masked in population level measurements. Besides, single cell analysis of immune cells provides a T cell receptor repertoire that is developed against a specific infection.116 To map out the antigen diversity, the single cell transcriptomics approach yields clonally expanded T cell receptors for therapeutic applications and vaccine development.
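One simple way to turn "high expression in one cluster" into a marker list is a fold-change filter. The sketch below is illustrative only: the counts are invented, the gene ACTB is added as a hypothetical non-marker control, and the threshold and pseudocount are arbitrary assumptions rather than values from the cited study.

```python
import math

def mean(values):
    return sum(values) / len(values)

def marker_genes(expression, labels, min_log2_fc=2.0, pseudocount=1.0):
    """For each cluster, list genes whose mean expression inside the
    cluster exceeds the mean in all other cells by min_log2_fc or more."""
    markers = {}
    for cluster in sorted(set(labels)):
        inside = [i for i, lab in enumerate(labels) if lab == cluster]
        outside = [i for i, lab in enumerate(labels) if lab != cluster]
        hits = []
        for gene, counts in expression.items():
            m_in = mean([counts[i] for i in inside])
            m_out = mean([counts[i] for i in outside])
            # The pseudocount avoids division by zero for silent genes.
            log2_fc = math.log2((m_in + pseudocount) / (m_out + pseudocount))
            if log2_fc >= min_log2_fc:
                hits.append(gene)
        markers[cluster] = sorted(hits)
    return markers

# Toy transcript counts per cell; cluster labels are assumed to be known.
labels = ["ILC2", "ILC2", "ILC2", "ILC3", "ILC3", "ILC3"]
expression = {
    "GATA3": [40, 35, 50, 2, 1, 3],
    "RORC": [1, 0, 2, 30, 45, 38],
    "ACTB": [100, 90, 110, 95, 105, 100],
}

markers = marker_genes(expression, labels)
print(markers)
```

A uniformly expressed gene passes no threshold and is reported for neither cluster, which is why bulk averages over mixed populations can hide markers that are obvious at the single cell level.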

Rewiring gene regulatory networks. Combinations of transcription factors interact with a genomic portion of DNA sequence clusters, and, at the same time, each transcription factor interacts with multiple genomic regions to regulate developmental processes, creating a gene regulatory network (GRN).117 Although genome-wide DNA/RNA sequencing provided significant GRN details, single cell transcriptional results have provided previously unrecognized regulatory relationships in the development of different cell types including blood cells.118–120 Single cell analysis has significant potential to dissect subpopulation architecture expressing combinations of genes that were obscured in the ensemble sequencing data. GRNs are much more complex when the cells differentiate into other cell types. Single cell analysis of 120 cells at 8 distinct time points during differentiation from human myeloid monocytic leukemia cells to macrophages revealed dynamic and specific rewiring.121

Noncoding transcript dynamics. Single cell transcriptional profiling by RNA sequencing revealed the dynamic role of lncRNAs in cellular programming. Specifically, lncRNA molecules regulate metabolic gene expression and repress lineage-specific genes.122 Single cell RNA fluorescent in situ hybridization (FISH) measurements on 61 distinct lncRNAs within three different cell types yielded abundance and subcellular localization patterns.123

Fig. 4 Single cell profiling yields insights into cellular populations. These advantages include cell type discovery (T1 and T2 distinct cells with unique molecular content), subpopulation architecture identification (S denotes an emerging cellular subgroup), biomarker discovery (A–C and C–E are new identifiers of cells), and gene regulatory network refinement (red arrow shows previously undetected regulation partners).

Table 1 Transcriptional profiles of single cells are used to define cell identities and states in distinct organs. RNA sequencing identified similar cells based on computational analyses of transcriptional data. Previously unknown cell types, biomarkers, and developmental cues were observed from the single cell data

Specimen      Ref.  Method      Throughput (cells)  Result
Colon         103   PCR         96                  5 cell types, 2 markers
Lungs         104   RNA-seq     198                 Cell marker identification
Brain         105   RNA-seq     3000                47 subclasses
Intestine     106   RNA-seq     238                 3 subtypes, 1 marker
Spleen        107   RNA-seq     4000                Immune cell types
Bone marrow   108   RNA-seq     2730                Early commitment
Glioblastoma  109   RNA-seq     430                 Multiple subtypes
Tonsil        110   RNA-seq     648                 Biomarker discovery
Neuron        111   RNA-seq     622                 11 cell types
Retina        112   Drop-seq    44 000              39 cell types
Stem cells    113   inDrop-seq  10 000              Gene network

Most of the lncRNAs are localized to the nucleus with significant cell-to-cell variability, while some are spread across the cytoplasm similar to mRNA localization patterns. Another set of RNA sequencing measurements within individual oocytes showed that the lack of genes controlling miRNA and small interfering RNA (siRNA), namely Dicer1 and Ago2, abnormally upregulated another 1696 and 1553 genes, respectively.124 Recently, single-cell analysis in the neocortex of the human brain revealed that lncRNAs are richly expressed in certain cell types in brain samples, a pattern that was previously masked in bulk studies.125 These findings showed unique functions of lncRNAs in specific cell types within the brain, creating distinct brain cell identities.

Transcript variant detection. RNA sequencing and FISH validation experiments on bone-marrow-derived individual dendritic cells revealed heterogeneous protein isoforms.126 During the embryonic developmental stages, spanning the zygote to late blastocyst stages, unique splicing patterns were obtained per cell.127 Two or more transcript isoforms of the same gene were observed in single cells, creating dynamic patterns of alternative splicing in human embryos and embryonic stem cells. Another statistical analysis method on RNA sequencing results showed the link between the cell cycle and alternative splicing.128

Probing ribosome code. Dense transcripts for ribosomal proteins (typically found in the ribosomal subunits) were mapped out in cultures and thymus tissue sections, revealing cell type specific gene expression in single cells.129 This observation supports the specialized ribosome theory as a cell identity regulator through the control of translational machinery. Addressing two open questions would illuminate the ribosome code hypothesis at the single cell level. First, ribosomal protein genes and metabolism related genes should be profiled simultaneously to test the role of the ribosome structure in transcript-specific translation within single cells. Second, ribosomal proteins should be mapped out directly (instead of the corresponding RNA molecules) to address potential concerns arising from the short lifetime of RNAs. The RNA versus protein measurement concern holds for the single cell biology field in general and should also be taken into account for cell identity studies.

Spatial mapping. While conventional RNA-seq lacks spatial resolution, imaging based in situ RNA profiling techniques provide spatial organization of cellular identities at the single-cell and single-molecule resolution. Spatially resolved RNA-seq (Tomo-seq) identified the role of BMP signalling in cardiomyocyte regeneration with coarse 3D mapping of a zebrafish heart, which was limited by large sample to sample collection distances.130 On the other hand, in situ RNA sequencing approaches yield spatial distributions of molecules within a single cell or across multiple cells.131–134 Most in situ RNA detection methods are limited to flat cellular layers; however, signal amplification methods have been developed to screen transcriptional states of cellular types in thick tissue samples.135–137 The spatial arrangement of cells provides opportunities to study interactions of different cell types for the deconstruction of complex tissue and organ formation in development.

Single cell epigenomics

While most cellular identity results are from genomics and transcriptomics data, recent efforts in single cell epigenomics have started to shed light on cell type specific regulations. Single cell bisulfite sequencing has been developed for mapping DNA methylation across the genome within individual cells. Recent demonstrations revealed epigenetic diversity and dynamics in embryonic stem cells.138,139 Complementary imaging approaches screen a few epigenomic marks at a time. For instance, histone modifications have been detected by a proximity ligation assay in single cells. In particular, smooth muscle cells exhibited lineage specific dimethylation of lysine 4 of histone H3 (H3K4me2) in the Myh11 locus in individual cells within tissue sections.140 Live imaging of reporter assays provided in vivo regulation of DNA methylation patterns in individual cells.141 Another reporter assay showed the effect of epigenetic modifications on transcriptional activity.142 Another DNA methylation variant, 5hmC (5-hydroxymethylcytosine), was recently sequenced at the single cell level. Based on the glucosylation-dependent digestion of DNA, 5hmC cell-to-cell variations were obtained and used for lineage reconstruction in mouse embryos.143

Integration of advanced imaging, microfluidics, and barcoding enables a single cell ChIP-Seq method to map out chromatin states in individual cells. Despite its low coverage, different embryonic stem cells exhibited heterogeneous chromatin profiles.144 Another low input ChIP-Seq method was devised to measure histone mark profiles in individual cells.145

Simultaneous measurements of the methylome and transcriptome of the same individual cells show the link between DNA methylation and transcription. Previously unrecognized heterogeneity of methylation in distal regulatory elements contributed to the transcriptional activity of pluripotency genes in embryonic stem cells (ESCs).146 After obtaining substantial results compared to standalone bisulfite sequencing, this approach validated the correlation of the methylation patterns of non-CpG island (CGI) promoters with transcriptional repression. These results suggested significant epigenetic heterogeneity from cell to cell, especially in pluripotency factors such as Esrrb within serum ESCs.

Single cell proteomics

The protein content of an individual cell indicates its ultimate functional state. Powerful microscopy, flow cytometry,147 mass spectroscopy,148 microarray,149 and immuno-PCR150 techniques benefited from antibody labelled single cell assays to profile proteins. Antibody–oligonucleotide conjugate arrays within microfluidic chips enabled the highly multiplexed detection of proteins in single cells.151,152 Next generation sequencing enabled the readout of ligation assay based protein detection.153,154 Antibody tags exhibit specificity issues and are challenging to engineer for all proteins. To improve specific detection, the single cell western blot technique was developed to multiplex proteins within mammalian cells.155 Rapid progress in proteomic profiling points toward single cell proteomics.156

Single cell chromosome-conformation-capture

As the physical conformation of the chromosome influences gene expression and eventually the cell fate, the single-cell Hi-C technique was developed to probe the cell-to-cell organizational variation of the genome.157,158 Recently, the 4D Nucleome project, a National Institutes of Health (NIH) funded consortium, has aimed to decipher the relationships between chromosomal conformation, tissue types, and diseases.

Physical mechanisms

The systems biology view of cell identity formation expanded our knowledge on coding mechanisms even down to single cell levels. However, cells make decisions under realistic physical factors to reach unique identities. To dissect the physical principles behind cellular specialization, we discuss the internal and external influencers that play a role in a cell’s coordination and cooperation in development.

Intracellular

Molecular conversions within an individual cell are guided by physical rules. Investigation of the governing physical mechanisms in different cell types is crucial to understand how cellular identity is formed and properly maintained in health.

Kinetic proofreading. The genetic code encompasses specific recognition of transfer RNAs (tRNAs) to match anticodons to codons in mRNAs. The error rate in this process is 1 in 10 000, enabling correct protein synthesis practically all the time to form appropriate cellular identities. Despite the existence of other similarly sized molecules, specific molecular interactions achieve proper assembly with a low rate of erroneous molecular recognition. Cells are therefore largely unaffected by thermodynamic noise in translation owing to a kinetic proofreading (KP) mechanism. Proposed by John Hopfield, KP incorporates multiple selection steps to enrich correct binding. Incorrect conjugation still occurs but falls off before affecting the process.159,160 KP is also a general mechanism in cells to increase the specificity of molecular binding events including cell-to-cell interactions. For instance, blood cells utilize KP to recognize antigen-presenting cells in our immune system.
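The gain from repeated selection steps can be made concrete with a short calculation. The sketch below assumes the idealised limit of the proofreading scheme, in which each additional energy-consuming checking step multiplies the error fraction by the same equilibrium discrimination factor exp(-ΔG/kT). The energy difference used here is a hypothetical value chosen so that a single step gives 1 error in 100; it is not a measured quantity from the cited work.

```python
import math

def error_fraction(delta_g_kt, proofreading_steps=0):
    """Idealised lower bound on the error fraction.

    delta_g_kt: binding free-energy difference between the wrong and the
                right substrate, in units of kT (assumed value).
    proofreading_steps: number of extra, energy-consuming selection steps.
    """
    single_step = math.exp(-delta_g_kt)
    return single_step ** (proofreading_steps + 1)

# Hypothetical discrimination energy giving 1 error in 100 without proofreading.
delta = math.log(100.0)

without_kp = error_fraction(delta)                     # about 1e-2
with_kp = error_fraction(delta, proofreading_steps=1)  # about 1e-4

print(f"no proofreading: {without_kp:.1e}, one step: {with_kp:.1e}")
```

Under these assumptions, one proofreading step is enough to move from an error rate of about 1 in 100 to about 1 in 10 000, the order of magnitude quoted above. Real systems fall short of this bound because the checking steps are not perfectly irreversible.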

DNA–protein interactions. Different transcriptional programs in cells are regulated by interactions of proteins with DNA, leading to activation or repression of gene expression. Combinations of multiple transcription factors (TFs) create cell specific transcriptional programming in both prokaryotes and eukaryotes.161 The binding events of TFs are often modelled by thermodynamic principles. A simple TF–DNA interaction models the total binding energies and binding probability of a TF to its target.162 The promoter architecture of each gene includes activators and repressors to regulate gene expression. A general formalism was developed to take into account the binding strength and abundance of these regulators. This approach linked physical parameters of transcriptional regulation fugacity to the fold-change (FC) in gene expression.163
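A widely used textbook form of such a thermodynamic model treats a single repressor site in the weak-promoter limit. The sketch below uses that simplified form, not the fugacity formalism of ref. 163, and all parameter values (copy numbers, number of non-specific sites, binding energy) are hypothetical.

```python
import math

def occupancy(copies, n_nonspecific, delta_eps_kt):
    """Probability that a specific site is bound by the TF.

    copies: number of TF molecules per cell (assumed).
    n_nonspecific: number of non-specific DNA sites competing for the TF.
    delta_eps_kt: specific minus non-specific binding energy, in kT
                  (negative means the specific site is preferred).
    """
    weight = (copies / n_nonspecific) * math.exp(-delta_eps_kt)
    return weight / (1.0 + weight)

def fold_change_repression(repressors, n_nonspecific, delta_eps_kt):
    """Fold-change in expression for simple repression (weak promoter):
    the gene is expressed only while the repressor site is empty."""
    weight = (repressors / n_nonspecific) * math.exp(-delta_eps_kt)
    return 1.0 / (1.0 + weight)

N_NS = 5e6      # hypothetical number of non-specific sites
ENERGY = -15.0  # hypothetical binding energy in kT

fc_10 = fold_change_repression(10, N_NS, ENERGY)
fc_100 = fold_change_repression(100, N_NS, ENERGY)

print(f"10 repressors: FC = {fc_10:.3f}; 100 repressors: FC = {fc_100:.3f}")
```

In this simplified picture the fold-change equals one minus the repressor occupancy, so the two physical knobs named in the text, binding strength and regulator abundance, enter only through a single statistical weight.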

Chromatin structure. Another physical mechanism is that epigenetic codes change the gene expression by modifying the structural distribution of chromatin.164 Sequential histone modifications silence gene expression by compacting chromatin to form heterochromatin.165 A change in the structure of chromatin also allows the recruitment of other proteins.166,167 An analytical model was developed to investigate a few histone modifications.168 DNA methylation increases the rigidity of the chromatin structure and disfavours the positioning of nucleosomes on DNA.169,170

Splicing entropy. The RNA maturation process includes the assembly of gene products through the splicing machinery. Aberrant RNA types are formed in diseases such as cancer. This disorder is modelled by Shannon’s entropy.171–173 Cancer cells exhibited significantly higher splicing disorder than cells from healthy subjects.
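Shannon's entropy of isoform usage can be computed directly from per-gene isoform counts. The following sketch uses invented read counts for a single hypothetical gene with three isoforms; how the cited studies normalise and aggregate entropies across genes is not reproduced here.

```python
import math

def splicing_entropy(isoform_counts):
    """Shannon entropy (in bits) of the isoform usage of one gene.

    Zero counts are skipped because p * log2(p) tends to 0 as p tends to 0.
    """
    total = sum(isoform_counts)
    probs = [c / total for c in isoform_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical read counts for three isoforms of the same gene.
ordered = splicing_entropy([98, 1, 1])       # one dominant isoform
disordered = splicing_entropy([34, 33, 33])  # near-uniform isoform usage

print(f"ordered: {ordered:.2f} bits, disordered: {disordered:.2f} bits")
```

A gene that expresses a single isoform has zero entropy, while uniform usage of n isoforms gives the maximum of log2(n) bits, so higher values correspond to the greater splicing disorder described above.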

Extracellular

Cells make decisions under environmental factors such as signalling molecules, extracellular matrix (ECM) contexts, mechanical features, and interaction with other cells.174 A cell’s morphological characteristics such as shape, together with the stiffness and topography of the ECM, are the main physical regulators of cell identity development.175 Stem cells sense the properties of an ECM by feedback signalling from mechanical and biochemical cues.176 Engineering niches for stem cell studies will open up new directions in cellular reprogramming and tissue regeneration.177

Applications

Research

Single cell approaches yield rich data to decipher complex biological networks in organisms with molecular sensitivity. These detailed cellular data are highly valuable for systems biology research. In particular, measuring gene products allows the development of cell specific mathematical models to describe biological information processing in single cells.178,179 These outcomes reveal a network of molecular components (nucleic acids and proteins) to formulate and test hypotheses in cellular machineries. Complementary implementation of single cell studies in medical settings facilitates systems medicine research to reconstruct biological networks in health and disease for translational purposes, paving the way for personalized and predictive medicine.180–183 These emerging directions provide opportunities to understand molecular principles of diseases and determine the best drug screening strategies in single cells. Primarily, cancer research leverages emerging single cell sequencing methods to study the molecular mechanisms of tumour heterogeneity, evolution, evasion, metastasis, and resistance.184,185 Single cell DNA/RNA sequencing in primary sites and circulating tumour cells revealed copy number variations, mutation rates, clonal dynamics, and transcriptional mapping within various cancer types including pancreatic, breast, lung, colon, and bladder cancers, and melanoma.

Diagnostics

Early screening and detection of diseases including cancer is critical to improve the healthcare system. Molecular profiling techniques such as transcript profiling allowed biomarker discovery in disease to develop diagnostic assays. Bodily fluids such as human saliva were used as a non-invasive diagnostic method to define a set of biomarker signatures for early detection.186 Transcriptomic biomarkers made it possible to differentiate lung cancer patients from normal subjects with high sensitivity and specificity. Similarly, a liquid biopsy method utilized single cell sequencing to profile circulating tumour cells in breast cancers.187 Single cell genotyping revealed heterogeneous mutations in acute myeloid leukaemia (AML).188 In AML samples, deciphering the distribution of FLT3 and NPM1 mutations in clonal populations suggested significant tumour variability, which is informative for better diagnosis and potentially for monitoring the progression of disease.

Therapeutics

The ultimate therapy would use patient derived induced pluripotent stem (iPS) cells for reprogramming to any cell type of interest to treat a specific disorder. For instance, molecular changes during reprogramming from fibroblasts to neurons have been dissected by single-cell RNA-seq.189 This approach has been tested in medical settings for hematological, neurological, cardiovascular, metabolic, endocrine, and muscular disorders, yielding insights into drug based therapies.190,191 Reprogramming therapies included sickle cell anaemia treatment in mice by correcting mutations in hematopoietic stem cells followed by transplantation.192 Transferring iPS technology to correct human diseases is promising, but many challenges in complex molecular engineering efforts covering transcriptional and epigenetic regulations and gene editing approaches need to be addressed to realize stem cell based therapies.193

Discussion

Multiple layers of regulators from chromatin states to post-translational modifications contribute to the cellular identity. The interconnection between these layers presents challenges to dissect the roles of each player. Hence, the ultimate goal is to merge studies of molecular decoding from different levels of gene control mechanisms. Simultaneous profiling of the genome, transcriptome, epigenome, and proteome has significant potential to reveal cell identity formation. Currently, the genome and transcriptome have been probed even in the same individual cell.194,195 Increasing efforts measure single cell correlations of transcripts and proteins.196,197

Coverage in single cell sequencing is limited compared to bulk DNA/RNA sequencing due to amplification errors (allelic dropouts, distortion, false positive rates, and non-uniform usage) and small sample volumes. Mapping the entire genome (100% coverage) needs technological advancements to reduce sequence dropout errors, so that mutations can be measured accurately without being affected by experimental noise. Amplification methods such as MALBAC and MDA have improved the coverage levels of DNA sequencing to more than 90% in individual cells.198–202 On the other hand, current RNA sequencing methods have 5–25% detection efficiency due to the poor conversion of RNAs to cDNA and eventually to amplified sequences. Reducing the reaction volume (from microliters to nanoliters within microfluidic chambers) captures more RNA molecules.90 Alternative biochemical strategies are needed to enhance the RNA/cDNA conversion rate for efficient RNA sequencing approaches. Besides, to cover the entire proteome at the single cell level, efficient labelling strategies are desired to identify each and every protein in complex tissues and organs. Profiling other molecules (metabolites, lipids, small molecules, non-coding RNAs, among others) would complement existing single cell analysis techniques.
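The practical consequence of a 5–25% detection efficiency can be estimated with a simple binomial capture model. The sketch below assumes that every transcript copy is captured independently with the same probability, which is an idealisation; the copy numbers are hypothetical.

```python
def detection_probability(n_transcripts, efficiency):
    """Probability that at least one of n transcript copies is detected,
    assuming independent capture of each copy with the given efficiency."""
    return 1.0 - (1.0 - efficiency) ** n_transcripts

# Detection efficiencies spanning the 5-25% range quoted above.
for efficiency in (0.05, 0.25):
    for copies in (1, 10, 50):  # hypothetical copies per cell
        p = detection_probability(copies, efficiency)
        print(f"efficiency {efficiency:.2f}, {copies:2d} copies: "
              f"P(detected) = {p:.3f}")
```

Under this model, low-abundance transcripts are missed in most cells at the lower end of the efficiency range, which is one reason improved RNA to cDNA conversion matters for rare regulators of cell identity.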

Conclusion

The presented cellular identity study will transform developmental biology research with a particular emphasis on the integration of systems level molecular analyses and underlying physical exploration. Understanding different levels of molecular coding at the single cell level will explain how cellular identity is gained in health and lost in disease. Physical control mechanisms will explain how cells specialize under many environmental stimulants. Finally, defining the cell state and reprogramming the cell identity will revolutionize medical research and practice.

Acknowledgements

A. F. C. is supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund. The authors thank Sten Linnarsson for help with figures.

Notes and references

1 E. Bianconi, A. Piovesan, F. Facchin, A. Beraudi, R. Casadei, F. Frabetti, L. Vitale, M. C. Pelleri, S. Tassani, F. Piva, S. Perez-Amodio, P. Strippoli and S. Canaider, Ann. Hum. Biol., 2013, 40, 463–471.

2 F. Crick, Nature, 1970, 227, 561–563.

3 D. Greenbaum, C. Colangelo, K. Williams and M. Gerstein, Genome Biol., 2003, 4, 117.

4 E. S. Yeung, Angew. Chem., Int. Ed., 2011, 50, 583–585.

5 A. Mortazavi, B. A. Williams, K. McCue, L. Schaeffer and B. Wold, Nat. Methods, 2008, 5, 621–628.

6 P. J. Park, Nat. Rev. Genet., 2009, 10, 669–680.

7 G. M. Rubin, M. D. Yandell, J. R. Wortman, G. L. Gabor Miklos, C. R. Nelson, I. K. Hariharan, M. E. Fortini, P. W. Li, R. Apweiler, W. Fleischmann, J. M. Cherry, S. Henikoff, M. P. Skupski, S. Misra, M. Ashburner, E. Birney, M. S. Boguski, T. Brody, P. Brokstein, S. E. Celniker, S. A. Chervitz, D. Coates, A. Cravchik, A. Gabrielian, R. F. Galle, W. M. Gelbart, R. A. George, L. S. B. Goldstein, F. Gong, P. Guan, N. L. Harris, B. A. Hay, R. A. Hoskins, J. Li, Z. Li, R. O. Hynes, S. J. M. Jones, P. M. Kuehl, B. Lemaitre, J. T. Littleton, D. K. Morrison, C. Mungall, P. H. O’Farrell, O. K. Pickeral, C. Shue, L. B. Vosshall, J. Zhang, Q. Zhao, X. H. Zheng, F. Zhong, W. Zhong, R. Gibbs, J. C. Venter, M. D. Adams and S. Lewis, Science, 2000, 287, 2204–2215.

8 A. R. Mushegian, J. R. Garey, J. Martin and L. X. Liu, Genome Res., 1998, 8, 590–598.

9 S. Bergmann, J. Ihmels and N. Barkai, PLoS Biol., 2003, 2, e9.

10 A. T. Chinwalla, L. L. Cook, K. D. Delehaunty, G. A. Fewell, L. A. Fulton, R. S. Fulton, T. A. Graves, L. W. Hillier, E. R. Mardis, J. D. McPherson, T. L. Miner, W. E. Nash, J. O. Nelson, M. N. Nhan, K. H. Pepin, C. S. Pohl, T. C. Ponce, B. Schultz, J. Thompson, E. Trevaskis, R. H. Waterston, M. C. Wendl, R. K. Wilson, S.-P. Yang, P. An, E. Berry, B. Birren, T. Bloom, D. G. Brown, J. Butler, M. Daly, R. David, J. Deri, S. Dodge, K. Foley, D. Gage, S. Gnerre, T. Holzer, D. B. Jaffe, M. Kamal, E. K. Karlsson, C. Kells, A. Kirby, E. J. Kulbokas, E. S. Lander, T. Landers, J. P. Leger, R. Levine, K. Lindblad-Toh, E. Mauceli, J. H. Mayer, M. McCarthy, J. Meldrim, J. P. Mesirov, R. Nicol, C. Nusbaum, S. Seaman, T. Sharpe, A. Sheridan, J. B. Singer, R. Santos, B. Spencer, N. Stange-Thomann, J. P. Vinson, C. M. Wade, J. Wierzbowski, D. Wyman, M. C. Zody, E. Birney, N. Goldman, A. Kasprzyk, E. Mongin, A. G. Rust, G. Slater, A. Stabenau, A. Ureta-Vidal, S. Whelan, R. Ainscough, J. Attwood, J. Bailey, K. Barlow, S. Beck, J. Burton, M. Clamp, C. Clee, A. Coulson, J. Cuff, V. Curwen, T. Cutts, J. Davies, E. Eyras, D. Grafham, S. Gregory, T. Hubbard, A. Hunt, M. Jones, A. Joy, S. Leonard, C. Lloyd, L. Matthews, S. McLaren, K. McLay, B. Meredith, J. C. Mullikin, Z. Ning, K. Oliver, E. Overton-Larty, R. Plumb, S. Potter, M. Quail, J. Rogers, C. Scott, S. Searle, R. Shownkeen, S. Sims, M. Wall, A. P. West, D. Willey, S. Williams, J. F. Abril, R. Guigo, G. Parra, P. Agarwal, R. Agarwala, D. M. Church, W. Hlavina, D. R. Maglott, V. Sapojnikov, M. Alexandersson, L. Pachter, S. E. Antonarakis, E. T. Dermitzakis, A. Reymond, C. Ucla, R. Baertsch, M. Diekhans, T. S. Furey, A. Hinrichs, F. Hsu, D. Karolchik, W. J. Kent, K. M. Roskin, M. S. Schwartz, C. Sugnet, R. J. Weber, P. Bork, I. Letunic, M. Suyama, D. Torrents, E. M. Zdobnov, M. Botcherby, S. D. Brown, R. D. Campbell, I. Jackson, N. Bray, O. Couronne, I. Dubchak, A. Poliakov, E. M. Rubin, M. R. Brent, P. Flicek, E. Keibler, I. Korf, S. Batalov, C. Bult, W. N. Frankel, P. Carninci, Y. Hayashizaki, J. Kawai, Y. Okazaki, S. Cawley, D. Kulp, R. Wheeler, F. Chiaromonte, F. S. Collins, A. Felsenfeld, M. Guyer, J. Peterson, K. Wetterstrand, R. R. Copley, R. Mott, C. Dewey, N. J. Dickens, R. D. Emes, L. Goodstadt, C. P. Ponting, E. Winter, D. M. Dunn, A. C. von Niederhausern, R. B. Weiss, S. R. Eddy, L. S. Johnson, T. A. Jones, L. Elnitski, D. L. Kolbe, P. Eswara, W. Miller, M. J. O’Connor, S. Schwartz, R. A. Gibbs, D. M. Muzny, G. Glusman, A. Smit, E. D. Green, R. C. Hardison, S. Yang, D. Haussler, A. Hua, B. A. Roe, R. S. Kucherlapati, K. T. Montgomery, J. Li, M. Li, S. Lucas, B. Ma, W. R. McCombie, M. Morgan, P. Pevzner, G. Tesler, J. Schultz, D. R. Smith, J. Tromp, K. C. Worley and E. D. Green, Nature, 2002, 420, 520–562.

11 J. F. Crow, Daedalus, 2002, 131, 81–88.

12 A. Kondrashov, Nature, 2012, 488, 467–468.

13 C. F. Baer, M. M. Miyamoto and D. R. Denver, Nat. Rev. Genet., 2007, 8, 619–631.

14 A. Kong, M. L. Frigge, G. Masson, S. Besenbacher, P. Sulem, G. Magnusson, S. A. Gudjonsson, A. Sigurdsson, A. Jonasdottir, A. Jonasdottir, W. S. W. Wong, G. Sigurdsson, G. B. Walters, S. Steinberg, H. Helgason, G. Thorleifsson, D. F. Gudbjartsson, A. Helgason, O. T. Magnusson, U. Thorsteinsdottir and K. Stefansson, Nature, 2012, 488, 471–475.

15 F. H. C. Crick, J. Mol. Biol., 1968, 38, 367–379.

16 A. Ambrogelly, S. Palioura and D. Soll, Nat. Chem. Biol., 2007, 3, 29–35.

17 E. V. Koonin and A. S. Novozhilov, IUBMB Life, 2009, 61, 99–111.

18 D. Prandi, S. C. Baca, A. Romanel, C. E. Barbieri, J.-M. Mosquera, J. Fontugne, H. Beltran, A. Sboner, L. A. Garraway, M. A. Rubin and F. Demichelis, Genome Biol., 2014, 15, 439.

19 J. Wang and R. E. Davis, Curr. Opin. Genet. Dev., 2014, 27, 26–34.

20 T. S. Alioto, I. Buchhalter, S. Derdak, B. Hutter, M. D. Eldridge, E. Hovig, L. E. Heisler, T. A. Beck, J. T. Simpson, L. Tonon, A.-S. Sertier, A.-M. Patch, N. Jager, P. Ginsbach, R. Drews, N. Paramasivam, R. Kabbe, S. Chotewutmontri, N. Diessl, C. Previti, S. Schmidt, B. Brors, L. Feuerbach, M. Heinold, S. Grobner, A. Korshunov, P. S. Tarpey, A. P. Butler, J. Hinton, D. Jones, A. Menzies, K. Raine, R. Shepherd, L. Stebbings, J. W. Teague, P. Ribeca, F. C. Giner, S. Beltran, E. Raineri, M. Dabad, S. C. Heath, M. Gut, R. E. Denroche, N. J. Harding, T. N. Yamaguchi, A. Fujimoto, H. Nakagawa, V. Quesada, R. Valdes-Mas, S. Nakken, D. Vodak, L. Bower, A. G. Lynch, C. L. Anderson, N. Waddell, J. V. Pearson, S. M. Grimmond, M. Peto, P. Spellman, M. He, C. Kandoth, S. Lee, J. Zhang, L. Letourneau, S. Ma, S. Seth, D. Torrents, L. Xi, D. A. Wheeler, C. Lopez-Otin, E. Campo, P. J. Campbell, P. C. Boutros, X. S. Puente, D. S. Gerhard, S. M. Pfister, J. D. McPherson, T. J. Hudson, M. Schlesner, P. Lichter, R. Eils, D. T. W. Jones and I. G. Gut, Nat. Commun., 2015, 6, 10001.

21 M. J. Keogh and P. F. Chinnery, Clin. Neurol. Neurosurg., 2013, 115, 948–953.

22 E. Lieberman-Aiden, N. L. van Berkum, L. Williams, M. Imakaev, T. Ragoczy, A. Telling, I. Amit, B. R. Lajoie, P. J. Sabo, M. O. Dorschner, R. Sandstrom, B. Bernstein, M. A. Bender, M. Groudine, A. Gnirke, J. Stamatoyannopoulos, L. A. Mirny, E. S. Lander and J. Dekker, Science, 2009, 326, 289–293.

23 D. G. Lupianez, K. Kraft, V. Heinrich, P. Krawitz, F. Brancati, E. Klopocki, D. Horn, H. Kayserili, J. M. Opitz, R. Laxova, F. Santos-Simarro, B. Gilbert-Dussardier, L. Wittler, M. Borschiwer, S. A. Haas, M. Osterwalder, M. Franke, B. Timmermann, J. Hecht, M. Spielmann, A. Visel and S. Mundlos, Cell, 2015, 161, 1012–1025.

24 D. G. Lupianez, M. Spielmann and S. Mundlos, Trends Genet., 2016, 32, 225–237.

25 P. H. L. Krijger, B. Di Stefano, E. de Wit, F. Limone, C. van Oevelen, W. de Laat and T. Graf, Cell Stem Cell, 2016, 18, 597–610.

26 T. Marquardt and S. L. Pfaff, Cell, 2001, 106, 651–654.

27 J. A. Zhang, A. Mortazavi, B. A. Williams, B. J. Wold and E. V. Rothenberg, Cell, 2012, 149, 467–482.

28 S. Siegert, E. Cabuy, B. G. Scherf, H. Kohler, S. Panda, Y.-Z. Le, H. J. Fehling, D. Gaidatzis, M. B. Stadler and B. Roska, Nat. Neurosci., 2012, 15, 487–495.

29 C. Cobaleda, A. Schebesta, A. Delogu and M. Busslinger, Nat. Immunol., 2007, 8, 463–470.

30 T. I. Lee and R. A. Young, Cell, 2013, 152, 1237–1251.

31 H. Sasaki-Iwaoka, K. Maruyama, H. Endoh, T. Komori, S. Kato and H. Kawashima, J. Bone Miner. Res. Off. J. Am. Soc. Bone Miner. Res., 1999, 14, 248–255.

32 Y. Zheng, C. Devitt, J. Liu, J. Mei and S. X. Skapek, Dev. Biol., 2013, 380, 49–57.

33 W. A. Whyte, D. A. Orlando, D. Hnisz, B. J. Abraham, C. Y. Lin, M. H. Kagey, P. B. Rahl, T. I. Lee and R. A. Young, Cell, 2013, 153, 307–319.

34 D. Hnisz, B. J. Abraham, T. I. Lee, A. Lau, V. Saint-Andre, A. A. Sigova, H. A. Hoke and R. A. Young, Cell, 2013, 155, 934–947.

35 J. M. Dowen, Z. P. Fan, D. Hnisz, G. Ren, B. J. Abraham, L. N. Zhang, A. S. Weintraub, J. Schuijers, T. I. Lee, K. Zhao and R. A. Young, Cell, 2014, 159, 374–387.

36 D. Hay, J. R. Hughes, C. Babbs, J. O. J. Davies, B. J. Graham, L. L. P. Hanssen, M. T. Kassouf, A. M. Oudelaar, J. A. Sharpe, M. C. Suciu, J. Telenius, R. Williams, C. Rode, P.-S. Li, L. A. Pennacchio, J. A. Sloane-Stanley, H. Ayyub, S. Butler, T. Sauka-Spengler, R. J. Gibbons, A. J. H. Smith, W. G. Wood and D. R. Higgs, Nat. Genet., 2016, DOI: 10.1038/ng.3605.

37 B. M. Turner, Nat. Cell Biol., 2007, 9, 2–6.

38 M. Spivakov and A. G. Fisher, Nat. Rev. Genet., 2007, 8, 263–271.

39 B. D. Strahl and C. D. Allis, Nature, 2000, 403, 41–45.

40 T. Jenuwein and C. D. Allis, Science, 2001, 293, 1074–1080.

41 B. M. Turner, Cell, 2002, 111, 285–291.

42 M. S. Cosgrove, J. D. Boeke and C. Wolberger, Nat. Struct. Mol. Biol., 2004, 11, 1037–1043.

43 N. D. Heintzman, G. C. Hon, R. D. Hawkins, P. Kheradpour, A. Stark, L. F. Harp, Z. Ye, L. K. Lee, R. K. Stuart, C. W. Ching, K. A. Ching, J. E. Antosiewicz-Bourget, H. Liu, X. Zhang, R. D. Green, V. V. Lobanenkov, R. Stewart, J. A. Thomson, G. E. Crawford, M. Kellis and B. Ren, Nature, 2009, 459, 108–112.

44 B. A. Benayoun, E. A. Pollina, D. Ucar, S. Mahmoudi, K. Karra, E. D. Wong, K. Devarajan, A. C. Daugherty, A. B. Kundaje, E. Mancini, B. C. Hitz, R. Gupta, T. A. Rando, J. C. Baker, M. P. Snyder, J. M. Cherry and A. Brunet, Cell, 2014, 158, 673–688.

45 V. Orlando, Cell, 2003, 112, 599–606.

46 T. I. Lee, R. G. Jenner, L. A. Boyer, M. G. Guenther, S. S. Levine, R. M. Kumar, B. Chevalier, S. E. Johnstone, M. F. Cole, K. Isono, H. Koseki, T. Fuchikami, K. Abe, H. L. Murray, J. P. Zucker, B. Yuan, G. W. Bell, E. Herbolsheimer, N. M. Hannett, K. Sun, D. T. Odom, A. P. Otte, T. L. Volkert, D. P. Bartel, D. A. Melton, D. K. Gifford, R. Jaenisch and R. A. Young, Cell, 2006, 125, 301–313.

47 L. Ringrose and R. Paro, Development, 2007, 134, 223–232.

48 A. Sparmann and M. van Lohuizen, Nat. Rev. Cancer, 2006, 6, 846–856.

49 A. Bird, Genes Dev., 2002, 16, 6–21.

50 K. E. Varley, J. Gertz, K. M. Bowling, S. L. Parker, T. E. Reddy, F. Pauli-Behn, M. K. Cross, B. A. Williams, J. A. Stamatoyannopoulos, G. E. Crawford, D. M. Absher, B. J. Wold and R. M. Myers, Genome Res., 2013, 23, 555–567.

51 S. Dhawan, S. Georgia, S. Tschen, G. Fan and A. Bhushan, Dev. Cell, 2011, 20, 419–429.

52 B. Bodega and V. Orlando, Curr. Opin. Cell Biol., 2014, 31, 67–73.

53 M. Bundo, M. Toyoshima, Y. Okada, W. Akamatsu, J. Ueda, T. Nemoto-Miyauchi, F. Sunaga, M. Toritsuka, D. Ikawa, A. Kakita, M. Kato, K. Kasai, T. Kishimoto, H. Nawa, H. Okano, T. Yoshikawa, T. Kato and K. Iwamoto, Neuron, 2014, 81, 306–313.

54 G. D. Evrony, X. Cai, E. Lee, L. B. Hills, P. C. Elhosary, H. S. Lehmann, J. J. Parker, K. D. Atabay, E. C. Gilmore, A. Poduri, P. J. Park and C. A. Walsh, Cell, 2012, 151, 483–496.

55 G. J. Faulkner, Y. Kimura, C. O. Daub, S. Wani, C. Plessy, K. M. Irvine, K. Schroder, N. Cloonan, A. L. Steptoe, T. Lassmann, K. Waki, N. Hornig, T. Arakawa, H. Takahashi, J. Kawai, A. R. R. Forrest, H. Suzuki, Y. Hayashizaki, D. A. Hume, V. Orlando, S. M. Grimmond and P. Carninci, Nat. Genet., 2009, 41, 563–571.

56 A. R. Muotri, M. C. N. Marchetto, N. G. Coufal, R. Oefner, G. Yeo, K. Nakashima and F. H. Gage, Nature, 2010, 468, 443–446.

57 S. Wissing, M. Munoz-Lopez, A. Macia, Z. Yang, M. Montano, W. Collins, J. L. Garcia-Perez, J. V. Moran and W. C. Greene, Hum. Mol. Genet., 2012, 21, 208–218.

58 S. Loewer, M. N. Cabili, M. Guttman, Y.-H. Loh, K. Thomas, I. H. Park, M. Garber, M. Curran, T. Onder, S. Agarwal, P. D. Manos, S. Datta, E. S. Lander, T. M. Schlaeger, G. Q. Daley and J. L. Rinn, Nat. Genet., 2010, 42, 1113–1117.

59 M. Guttman, I. Amit, M. Garber, C. French, M. F. Lin, D. Feldser, M. Huarte, O. Zuk, B. W. Carey, J. P. Cassady, M. N. Cabili, R. Jaenisch, T. S. Mikkelsen, T. Jacks, N. Hacohen, B. E. Bernstein, M. Kellis, A. Regev, J. L. Rinn and E. S. Lander, Nature, 2009, 458, 223–227.

60 A. Fatica and I. Bozzoni, Nat. Rev. Genet., 2014, 15, 7–21.

61 S. Quinodoz and M. Guttman, Trends Cell Biol., 2014, 24, 651–663.

62 T. Xia, Q. Liao, X. Jiang, Y. Shao, B. Xiao, Y. Xi and J. Guo, Sci. Rep., 2014, 4, 6088.

63 P. Paci, T. Colombo and L. Farina, BMC Syst. Biol., 2014, 8, 83.

64 F. Gaiti, S. L. Fernandez-Valverde, N. Nakanishi, A. D. Calcino, I. Yanai, M. Tanurdzic and B. M. Degnan, Mol. Biol. Evol., 2015, msv117.

65 Z. Du, T. Sun, E. Hacisuleyman, T. Fei, X. Wang, M. Brown, J. L. Rinn, M. G.-S. Lee, Y. Chen, P. W. Kantoff and X. S. Liu, Nat. Commun., 2016, 7, 10982.

66 A. O. Ribeiro, C. R. G. Schoof, A. Izzotti, L. V. Pereira and L. R. Vasques, MicroRNA, 2014, 3, 45–53.

67 H. Kaspi, R. Pasvolsky and E. Hornstein, Trends Endocrinol. Metab., 2014, 25, 285–292.

68 M. C. Siomi, K. Sato, D. Pezic and A. A. Aravin, Nat. Rev. Mol. Cell Biol., 2011, 12, 246–258.

69 J. Gonzalez, H. Qi, N. Liu and H. Lin, Cell Rep., 2015, 12, 150–161.

70 X. Ma, S. Wang, T. Do, X. Song, M. Inaba, Y. Nishimoto, L. Liu, Y. Gao, Y. Mao, H. Li, W. McDowell, J. Park, K. Malanowski, A. Peak, A. Perera, H. Li, K. Gaudenz, J. Haug, Y. Yamashita, H. Lin, J. Ni and T. Xie, PLoS One, 2014, 9, e90267.

71 N. Charlet-B, G. Singh, T. A. Cooper and P. Logan, Mol. Cell, 2002, 9, 649–658.

72 C. J. David and J. L. Manley, Genes Dev., 2008, 22, 279–285.

73 A. J. Matlin, F. Clark and C. W. J. Smith, Nat. Rev. Mol. Cell Biol., 2005, 6, 386–398.

74 M. Gabut, P. Samavarchi-Tehrani, X. Wang, V. Slobodeniuc, D. O’Hanlon, H.-K. Sung, M. Alvarez, S. Talukder, Q. Pan, E. O. Mazzoni, S. Nedelec, H. Wichterle, K. Woltjen, T. R. Hughes, P. W. Zandstra, A. Nagy, J. L. Wrana and B. J. Blencowe, Cell, 2011, 147, 132–146.

75 X. Yang, J. Coulombe-Huntington, S. Kang, G. M. Sheynkman, T. Hao, A. Richardson, S. Sun, F. Yang, Y. A. Shen, R. R. Murray, K. Spirohn, B. E. Begg, M. Duran-Frigola, A. MacWilliams, S. J. Pevzner, Q. Zhong, S. A. Trigg, S. Tam, L. Ghamsari, N. Sahni, S. Yi, M. D. Rodriguez, D. Balcha, G. Tan, M. Costanzo, B. Andrews, C. Boone, X. J. Zhou, K. Salehi-Ashtiani, B. Charloteaux, A. A. Chen, M. A. Calderwood, P. Aloy, F. P. Roth, D. E. Hill, L. M. Iakoucheva, Y. Xia and M. Vidal, Cell, 2016, 164, 805–817.

76 J. S. Mattick and M. F. Mehler, Trends Neurosci., 2008, 31, 227–233.

77 J. J. C. Rosenthal and P. H. Seeburg, Neuron, 2012, 74, 432–439.

78 E. Park, B. Williams, B. J. Wold and A. Mortazavi, Genome Res., 2012, 22, 1626–1633.

79 Q. Zhang and X. Xiao, Nat. Methods, 2015, 12, 347–350.

80 S. Sharma, S. K. Patnaik, R. Thomas Taggart, E. D. Kannisto, S. M. Enriquez, P. Gollnick and B. E. Baysal, Nat. Commun., 2015, 6, 6881.

81 S. Xue, S. Tian, K. Fujii, W. Kladwang, R. Das and M. Barna, Nature, 2015, 517, 33–38.

82 S. Xue and M. Barna, Nat. Rev. Mol. Cell Biol., 2012, 13, 355–369.

83 N. Slavov, S. Semrau, E. Airoldi, B. Budnik and A. van Oudenaarden, Cell Rep., 2015, 13, 865–873.

84 M. Gallego and D. M. Virshup, Nat. Rev. Mol. Cell Biol., 2007, 8, 139–148.

85 S. Westermann and K. Weber, Nat. Rev. Mol. Cell Biol., 2003, 4, 938–948.

86 A. M. Bode and Z. Dong, Nat. Rev. Cancer, 2004, 4, 793–805.

87 C. M. Dobson, Nature, 2003, 426, 884–890.

88 J. P. Thaler, S.-K. Lee, L. W. Jurata, G. N. Gill and S. L. Pfaff, Cell, 2002, 110, 237–249.

89 Q. F. Wills, K. J. Livak, A. J. Tipping, T. Enver, A. J. Goldson, D. W. Sexton and C. Holmes, Nat. Biotechnol., 2013, 31, 748–752.

90 S. De, Trends Genet. TIG, 2011, 27, 217–223.

91 M. J. McConnell, M. R. Lindberg, K. J. Brennand, J. C. Piper, T. Voet, C. Cowing-Zitron, S. Shumilina, R. S. Lasken, J. R. Vermeesch, I. M. Hall and F. H. Gage, Science, 2013, 342, 632–637.

92 J. Wang, H. C. Fan, B. Behr and S. R. Quake, Cell, 2012, 150, 402–412.

93 N. Navin, J. Kendall, J. Troge, P. Andrews, L. Rodgers, J. McIndoo, K. Cook, A. Stepansky, D. Levy, D. Esposito, L. Muthuswamy, A. Krasnitz, W. R. McCombie, J. Hicks and M. Wigler, Nature, 2011, 472, 90–94.

94 T. Garvin, R. Aboukhalil, J. Kendall, T. Baslan, G. S. Atwal, J. Hicks, M. Wigler and M. C. Schatz, Nat. Methods, 2015, 12, 1058–1060.

95 X. Cai, G. D. Evrony, H. S. Lehmann, P. C. Elhosary, B. K. Mehta, A. Poduri and C. A. Walsh, Cell Rep., 2014, 8, 1280–1289.

96 M. A. Lodato, M. B. Woodworth, S. Lee, G. D. Evrony, B. K. Mehta, A. Karger, S. Lee, T. W. Chittenden, A. M. D’Gama, X. Cai, L. J. Luquette, E. Lee, P. J. Park and C. A. Walsh, Science, 2015, 350, 94–98.

97 S. Islam, A. Zeisel, S. Joost, G. La Manno, P. Zajac, M. Kasper, P. Lonnerberg and S. Linnarsson, Nat. Methods, 2014, 11, 163–166.

98 S. Islam, U. Kjallquist, A. Moliner, P. Zajac, J.-B. Fan, P. Lonnerberg and S. Linnarsson, Nat. Protoc., 2012, 7, 813–828.

99 S. Picelli, Å. K. Bjorklund, B. Reinius, S. Sagasser, G. Winberg and R. Sandberg, Genome Res., 2014, 24, 2033–2040.

100 L. Wen and F. Tang, Genome Biol., 2016, 17, 71.

101 D. Arendt, Nat. Rev. Genet., 2008, 9, 868–882.

102 M. K. Vickaryous and B. K. Hall, Biol. Rev. Cambridge Philos. Soc., 2006, 81, 425–455.

103 P. Dalerba, T. Kalisky, D. Sahoo, P. S. Rajendran, M. E. Rothenberg, A. A. Leyrat, S. Sim, J. Okamoto, D. M. Johnston, D. Qian, M. Zabala, J. Bueno, N. F. Neff, J. Wang, A. A. Shelton, B. Visser, S. Hisamori, Y. Shimono, M. van de Wetering, H. Clevers, M. F. Clarke and S. R. Quake, Nat. Biotechnol., 2011, 29, 1120–1127.

104 B. Treutlein, D. G. Brownfield, A. R. Wu, N. F. Neff, G. L. Mantalas, F. H. Espinoza, T. J. Desai, M. A. Krasnow and S. R. Quake, Nature, 2014, 509, 371–375.

105 A. Zeisel, A. B. Munoz-Manchado, S. Codeluppi, P. Lonnerberg, G. L. Manno, A. Jureus, S. Marques, H. Munguba, L. He, C. Betsholtz, C. Rolny, G. Castelo-Branco, J. Hjerling-Leffler and S. Linnarsson, Science, 2015, 347, 1138–1142.

106 D. Grun, A. Lyubimova, L. Kester, K. Wiebrands, O. Basak, N. Sasaki, H. Clevers and A. van Oudenaarden, Nature, 2015, 525, 251–255.

107 D. A. Jaitin, E. Kenigsberg, H. Keren-Shaul, N. Elefant, F. Paul, I. Zaretsky, A. Mildner, N. Cohen, S. Jung, A. Tanay and I. Amit, Science, 2014, 343, 776–779.

108 F. Paul, Y. Arkin, A. Giladi, D. A. Jaitin, E. Kenigsberg, H. Keren-Shaul, D. Winter, D. Lara-Astiaso, M. Gury, A. Weiner, E. David, N. Cohen, F. K. B. Lauridsen, S. Haas, A. Schlitzer, A. Mildner, F. Ginhoux, S. Jung, A. Trumpp, B. T. Porse, A. Tanay and I. Amit, Cell, 2015, 163, 1663–1677.

109 A. P. Patel, I. Tirosh, J. J. Trombetta, A. K. Shalek, S. M. Gillespie, H. Wakimoto, D. P. Cahill, B. V. Nahed, W. T. Curry, R. L. Martuza, D. N. Louis, O. Rozenblatt-Rosen, M. L. Suva, A. Regev and B. E. Bernstein, Science, 2014, 1254257.

110 Å. K. Bjorklund, M. Forkel, S. Picelli, V. Konya, J. Theorell, D. Friberg, R. Sandberg and J. Mjosberg, Nat. Immunol., 2016, 17, 451–460.

111 D. Usoskin, A. Furlan, S. Islam, H. Abdo, P. Lonnerberg, D. Lou, J. Hjerling-Leffler, J. Haeggstrom, O. Kharchenko, P. V. Kharchenko, S. Linnarsson and P. Ernfors, Nat. Neurosci., 2015, 18, 145–153.

112 E. Z. Macosko, A. Basu, R. Satija, J. Nemesh, K. Shekhar, M. Goldman, I. Tirosh, A. R. Bialas, N. Kamitaki, E. M. Martersteck, J. J. Trombetta, D. A. Weitz, J. R. Sanes, A. K. Shalek, A. Regev and S. A. McCarroll, Cell, 2015, 161, 1202–1214.

113 A. M. Klein, L. Mazutis, I. Akartuna, N. Tallapragada, A. Veres, V. Li, L. Peshkin, D. A. Weitz and M. W. Kirschner, Cell, 2015, 161, 1187–1201.

114 N. Rifai, M. A. Gillette and S. A. Carr, Nat. Biotechnol., 2006, 24, 971–983.

115 H. Runne, A. Kuhn, E. J. Wild, W. Pratyaksha, M. Kristiansen, J. D. Isaacs, E. Regulier, M. Delorenzi, S. J. Tabrizi and R. Luthi-Carter, Proc. Natl. Acad. Sci. U. S. A., 2007, 104, 14424–14429.

116 A. Han, J. Glanville, L. Hansmann and M. M. Davis, Nat. Biotechnol., 2014, 32, 684–692.

117 M. Levine and E. H. Davidson, Proc. Natl. Acad. Sci. U. S. A., 2005, 102, 4936–4942.

118 V. Moignard, I. C. Macaulay, G. Swiers, F. Buettner, J. Schutte, F. J. Calero-Nieto, S. Kinston, A. Joshi, R. Hannah, F. J. Theis, S. E. Jacobsen, M. F. de Bruijn and B. Gottgens, Nat. Cell Biol., 2013, 15, 363–372.

119 V. Moignard, S. Woodhouse, L. Haghverdi, A. J. Lilly, Y. Tanaka, A. C. Wilkinson, F. Buettner, I. C. Macaulay, W. Jawaid, E. Diamanti, S.-I. Nishikawa, N. Piterman, V. Kouskoff, F. J. Theis, J. Fisher and B. Gottgens, Nat. Biotechnol., 2015, 33, 269–276.

120 C. Trapnell, D. Cacchiarelli, J. Grimsby, P. Pokharel, S. Li, M. Morse, N. J. Lennon, K. J. Livak, T. S. Mikkelsen and J. L. Rinn, Nat. Biotechnol., 2014, 32, 381–386.

121 T. Kouno, M. de Hoon, J. C. Mar, Y. Tomaru, M. Kawano, P. Carninci, H. Suzuki, Y. Hayashizaki and J. W. Shin, Genome Biol., 2013, 14, R118.

122 D. H. Kim, G. K. Marinov, S. Pepke, Z. S. Singer, P. He, B. Williams, G. P. Schroth, M. B. Elowitz and B. J. Wold, Cell Stem Cell, 2015, 16, 88–101.

123 M. N. Cabili, M. C. Dunagin, P. D. McClanahan, A. Biaesch, O. Padovan-Merhar, A. Regev, J. L. Rinn and A. Raj, Genome Biol., 2015, 16, 20.

124 F. Tang, C. Barbacioru, Y. Wang, E. Nordman, C. Lee, N. Xu, X. Wang, J. Bodeau, B. B. Tuch, A. Siddiqui, K. Lao and M. A. Surani, Nat. Methods, 2009, 6, 377–382.

125 Q. Ma and H. Y. Chang, Genome Biol., 2016, 17, 68.

126 A. K. Shalek, R. Satija, X. Adiconis, R. S. Gertner, J. T. Gaublomme, R. Raychowdhury, S. Schwartz, N. Yosef, C. Malboeuf, D. Lu, J. J. Trombetta, D. Gennert, A. Gnirke, A. Goren, N. Hacohen, J. Z. Levin, H. Park and A. Regev, Nature, 2013, 498, 236–240.

127 L. Yan, M. Yang, H. Guo, L. Yang, J. Wu, R. Li, P. Liu, Y. Lian, X. Zheng, J. Yan, J. Huang, M. Li, X. Wu, L. Wen, K. Lao, R. Li, J. Qiao and F. Tang, Nat. Struct. Mol. Biol., 2013, 20, 1131–1139.

128 J. D. Welch, Y. Hu and J. F. Prins, Nucleic Acids Res., 2016, gkv1525.

129 A. F. Coskun and L. Cai, Nat. Methods, 2016, DOI: 10.1038/nmeth.3895.

130 C.-C. Wu, F. Kruse, M. D. Vasudevarao, J. P. Junker, D. C. Zebrowski, K. Fischer, E. S. Noel, D. Grun, E. Berezikov, F. B. Engel, A. van Oudenaarden, G. Weidinger and J. Bakkers, Dev. Cell, 2016, 36, 36–49.

131 E. Lubeck, A. F. Coskun, T. Zhiyentayev, M. Ahmad and L. Cai, Nat. Methods, 2014, 11, 360–361.

132 J. H. Lee, E. R. Daugharthy, J. Scheiman, R. Kalhor, J. L. Yang, T. C. Ferrante, R. Terry, S. S. F. Jeanty, C. Li, R. Amamoto, D. T. Peters, B. M. Turczyk, A. H. Marblestone, S. A. Inverso, A. Bernard, P. Mali, X. Rios, J. Aach and G. M. Church, Science, 2014, 343, 1360–1363.

133 K. H. Chen, A. N. Boettiger, J. R. Moffitt, S. Wang and X. Zhuang, Science, 2015, 348, aaa6090.

134 N. Crosetto, M. Bienko and A. van Oudenaarden, Nat. Rev. Genet., 2015, 16, 57–66.

135 H. M. T. Choi, J. Y. Chang, L. A. Trinh, J. E. Padilla, S. E. Fraser and N. A. Pierce, Nat. Biotechnol., 2010, 28, 1208–1212.

136 R. Ke, M. Mignardi, A. Pacureanu, J. Svedlund, J. Botling, C. Wahlby and M. Nilsson, Nat. Methods, 2013, 10, 857–860.

137 P. L. Ståhl, F. Salmen, S. Vickovic, A. Lundmark, J. F. Navarro, J. Magnusson, S. Giacomello, M. Asp, J. O. Westholm, M. Huss, A. Mollbrink, S. Linnarsson, S. Codeluppi, Å. Borg, F. Ponten, P. I. Costea, P. Sahlen, J. Mulder, O. Bergmann, J. Lundeberg and J. Frisen, Science, 2016, 353, 78–82.

138 M. Farlik, N. C. Sheffield, A. Nuzzo, P. Datlinger, A. Schonegger, J. Klughammer and C. Bock, Cell Rep., 2015, 10, 1386–1397.

139 S. A. Smallwood, H. J. Lee, C. Angermueller, F. Krueger, H. Saadeh, J. Peat, S. R. Andrews, O. Stegle, W. Reik and G. Kelsey, Nat. Methods, 2014, 11, 817–820.

140 D. Gomez, L. S. Shankman, A. T. Nguyen and G. K. Owens, Nat. Methods, 2013, 10, 171–177.

141 Y. Stelzer, C. S. Shivalila, F. Soldner, S. Markoulaki and R. Jaenisch, Cell, 2015, 163, 218–229.

142 L. Bintu, J. Yong, Y. E. Antebi, K. McCue, Y. Kazuki, N. Uno, M. Oshimura and M. B. Elowitz, Science, 2016, 351, 720–724.

143 D. Mooijman, S. S. Dey, J.-C. Boisset, N. Crosetto and A. van Oudenaarden, Nat. Biotechnol., 2016, DOI: 10.1038/nbt.3598.

144 A. Rotem, O. Ram, N. Shoresh, R. A. Sperling, A. Goren, D. A. Weitz and B. E. Bernstein, Nat. Biotechnol., 2015, 33, 1165–1172.

145 J. Brind’Amour, S. Liu, M. Hudson, C. Chen, M. M. Karimi and M. C. Lorincz, Nat. Commun., 2015, 6, 6033.

146 C. Angermueller, S. J. Clark, H. J. Lee, I. C. Macaulay, M. J. Teng, T. X. Hu, F. Krueger, S. A. Smallwood, C. P. Ponting, T. Voet, G. Kelsey, O. Stegle and W. Reik, Nat. Methods, 2016, 13, 229–232.

147 S. P. Perfetto, P. K. Chattopadhyay and M. Roederer, Nat. Rev. Immunol., 2004, 4, 648–655.

148 S. C. Bendall, E. F. Simonds, P. Qiu, E. D. Amir, P. O. Krutzik, R. Finck, R. V. Bruggner, R. Melamed, A. Trejo, O. I. Ornatsky, R. S. Balderas, S. K. Plevritis, K. Sachs, D. Pe’er, S. D. Tanner and G. P. Nolan, Science, 2011, 332, 687–696.

149 G. MacBeath, Nat. Genet., 2002, 32, 526–532.

150 C. M. Niemeyer, M. Adler and R. Wacker, Nat. Protoc., 2007, 2, 1918–1930.

151 R. Fan, O. Vermesh, A. Srivastava, B. K. H. Yen, L. Qin, H. Ahmad, G. A. Kwong, C.-C. Liu, J. Gould, L. Hood and J. R. Heath, Nat. Biotechnol., 2008, 26, 1373–1378.

152 Q. Shi, L. Qin, W. Wei, F. Geng, R. Fan, Y. S. Shin, D. Guo, L. Hood, P. S. Mischel and J. R. Heath, Proc. Natl. Acad. Sci. U. S. A., 2012, 109, 419–424.

153 S. Darmanis, R. Y. Nong, J. Vanelid, A. Siegbahn, O. Ericsson, S. Fredriksson, C. Backlin, M. Gut, S. Heath, I. G. Gut, L. Wallentin, M. G. Gustafsson, M. Kamali-Moghaddam and U. Landegren, PLoS One, 2011, 6, e25583.

154 S. Fredriksson, W. Dixon, H. Ji, A. C. Koong, M. Mindrinos and R. W. Davis, Nat. Methods, 2007, 4, 327–329.

155 A. J. Hughes, D. P. Spelke, Z. Xu, C.-C. Kang, D. V. Schaffer and A. E. Herr, Nat. Methods, 2014, 11, 749–755.

156 A. F. M. Altelaar, J. Munoz and A. J. R. Heck, Nat. Rev. Genet., 2013, 14, 35–48.

157 T. Nagano, Y. Lubling, T. J. Stevens, S. Schoenfelder, E. Yaffe, W. Dean, E. D. Laue, A. Tanay and P. Fraser, Nature, 2013, 502, 59–64.

158 O. Schwartzman and A. Tanay, Nat. Rev. Genet., 2015, 16, 716–726.

159 J. J. Hopfield, Proc. Natl. Acad. Sci. U. S. A., 1974, 71, 4135–4139.

160 M. Johansson, J. Zhang and M. Ehrenberg, Proc. Natl. Acad. Sci. U. S. A., 2012, 109, 131–136.

161 N. E. Buchler, U. Gerland and T. Hwa, Proc. Natl. Acad. Sci. U. S. A., 2003, 100, 5136–5141.

162 U. Gerland, J. D. Moroz and T. Hwa, Proc. Natl. Acad. Sci. U. S. A., 2002, 99, 12015–12020.

163 F. M. Weinert, R. C. Brewster, M. Rydenfelt, R. Phillips and W. K. Kegel, Phys. Rev. Lett., 2014, 113, 258101.

164 A. J. Bannister and T. Kouzarides, Cell Res., 2011, 21, 381–395.

165 S. I. S. Grewal and D. Moazed, Science, 2003, 301, 798–802.

166 T. Kouzarides, Cell, 2007, 128, 693–705.

167 F. Andreoli and A. Del Rio, Drug Discovery Today, 2014, 19, 1372–1379.

168 W. L. Ku, M. Girvan, G.-C. Yuan, F. Sorrentino and E. Ott, PLoS One, 2013, 8, e77944.

169 T. I. Yusufaly, Y. Li and W. K. Olson, J. Phys. Chem. B, 2013, 117, 16436–16442.

170 A. Perez, C. L. Castellazzi, F. Battistini, K. Collinet, O. Flores, O. Deniz, M. L. Ruiz, D. Torrents, R. Eritja, M. Soler-Lopez and M. Orozco, Biophys. J., 2012, 102, 2140–2148.

171 M. Osella and M. Caselle, Phys. Biol., 2009, 6, 046018.

172 W. Ritchie, S. Granjeaud, D. Puthier and D. Gautheret, PLoS Comput. Biol., 2008, 4, e1000011.

173 G. Yeo and C. B. Burge, J. Comput. Biol., 2004, 11, 377–394.

174 F. M. Watt and W. T. S. Huck, Nat. Rev. Mol. Cell Biol., 2013, 14, 467–473.

175 F. Guilak, D. M. Cohen, B. T. Estes, J. M. Gimble, W. Liedtke and C. S. Chen, Cell Stem Cell, 2009, 5, 17–26.

176 H. Lv, L. Li, M. Sun, Y. Zhang, L. Chen, Y. Rong and Y. Li, Stem Cell Res. Ther., 2015, 6, 103.

177 S. W. Lane, D. A. Williams and F. M. Watt, Nat. Biotechnol., 2014, 32, 795–803.

178 T. Ideker, T. Galitski and L. Hood, Annu. Rev. Genomics Hum. Genet., 2001, 2, 343–372.

179 H. Kitano, Science, 2002, 295, 1662–1664.

180 C. Auffray, Z. Chen and L. Hood, Genome Med., 2009, 1, 2.

181 L. Hood and Q. Tian, Genomics, Proteomics Bioinf., 2012, 10, 181–185.

182 A. Mardinoglu and J. Nielsen, J. Intern. Med., 2012, 271, 142–154.

183 J. Nielsen, J. Intern. Med., 2012, 271, 108–110.

184 N. E. Navin, Genome Res., 2015, 25, 1499–1507.

185 P. Van Loo and T. Voet, Curr. Opin. Genet. Dev., 2014, 24, 82–91.

186 L. Zhang, H. Xiao, H. Zhou, S. Santiago, J. M. Lee, E. B. Garon, J. Yang, O. Brinkmann, X. Yan, D. Akin, D. Chia, D. Elashoff, N.-H. Park and D. T. W. Wong, Cell. Mol. Life Sci., 2012, 69, 3341–3350.

187 A. A. Powell, A. H. Talasaz, H. Zhang, M. A. Coram, A. Reddy, G. Deng, M. L. Telli, R. H. Advani, R. W. Carlson, J. A. Mollick, S. Sheth, A. W. Kurian, J. M. Ford, F. E. Stockdale, S. R. Quake, R. F. Pease, M. N. Mindrinos, G. Bhanot, S. H. Dairkee, R. W. Davis and S. S. Jeffrey, PLoS One, 2012, 7, e33788.

188 A. L. Paguirigan, J. Smith, S. Meshinchi, M. Carroll, C. Maley and J. P. Radich, Sci. Transl. Med., 2015, 7, 281re2.

189 B. Treutlein, Q. Y. Lee, J. G. Camp, M. Mall, W. Koh, S. A. M. Shariati, S. Sim, N. F. Neff, J. M. Skotheim, M. Wernig and S. R. Quake, Nature, 2016, 534, 391–395.

190 A. B. C. Cherry and G. Q. Daley, Cell, 2012, 148, 1110–1122.

191 D. A. Robinton and G. Q. Daley, Nature, 2012, 481, 295–305.

192 D. R. Higgs, N. Engl. J. Med., 2008, 358, 964–966.

193 S. A. Goldman, Cell Stem Cell, 2016, 18, 174–188.

194 S. S. Dey, L. Kester, B. Spanjaard, M. Bienko and A. van Oudenaarden, Nat. Biotechnol., 2015, 33, 285–289.

195 I. C. Macaulay, W. Haerty, P. Kumar, Y. I. Li, T. X. Hu, M. J. Teng, M. Goolam, N. Saurat, P. Coupland, L. M. Shirley, M. Smith, N. Van der Aa, R. Banerjee, P. D. Ellis, M. A. Quail, H. P. Swerdlow, M. Zernicka-Goetz, F. J. Livesey, C. P. Ponting and T. Voet, Nat. Methods, 2015, 12, 519–522.

196 S. Darmanis, C. J. Gallant, V. D. Marinescu, M. Niklasson, A. Segerman, G. Flamourakis, S. Fredriksson, E. Assarsson, M. Lundberg, S. Nelander, B. Westermark and U. Landegren, Cell Rep., 2016, 14, 380–389.

197 C. Albayrak, C. A. Jordi, C. Zechner, J. Lin, C. A. Bichsel, M. Khammash and S. Tay, Mol. Cell, 2016, 61, 914–924.

198 T. Daley and A. D. Smith, Bioinformatics, 2014, 30, 3159–3165.

199 A. A. Pollen, T. J. Nowakowski, J. Shuga, X. Wang, A. A. Leyrat, J. H. Lui, N. Li, L. Szpankowski, B. Fowler, P. Chen, N. Ramalingam, G. Sun, M. Thu, M. Norris, R. Lebofsky, D. Toppani, D. W. Kemp II, M. Wong, B. Clerkson, B. N. Jones, S. Wu, L. Knutsson, B. Alvarado, J. Wang, L. S. Weaver, A. P. May, R. C. Jones, M. A. Unger, A. R. Kriegstein and J. A. A. West, Nat. Biotechnol., 2014, 32, 1053–1058.

200 D. Sims, I. Sudbery, N. E. Ilott, A. Heger and C. P. Ponting, Nat. Rev. Genet., 2014, 15, 121–132.

201 D. Grun and A. van Oudenaarden, Cell, 2015, 163, 799–810.

202 A. M. Streets and Y. Huang, Nat. Biotechnol., 2014, 32, 1005–1006.
