science.sciencemag.org/content/367/6482/1151/suppl/DC1
Supplementary Materials for
Sequencing metabolically labeled transcripts in single cells
reveals mRNA turnover strategies
Nico Battich*, Joep Beumer, Buys de Barbanson, Lenno Krenning, Chloé S. Baron,
Marvin E. Tanenbaum, Hans Clevers, Alexander van Oudenaarden*
*Corresponding author. E-mail: [email protected] (N.B.); [email protected] (A.v.O.)
Published 6 March 2020, Science 367, 1151 (2020)
DOI: 10.1126/science.aax3072
This PDF file includes:
Materials and Methods
Supplementary Text
Figs. S1 to S15
Captions for Additional Data Tables S1 to S4
References
Other Supplementary Material for this manuscript includes the following:
(available at science.sciencemag.org/content/367/6482/1151/suppl/DC1)
Additional Data Tables S1 to S4 (Excel files)
Materials and Methods
Tissue culture
RPE1-FUCCI cells were cultured in Dulbecco's modified Eagle's medium
(DMEM)/F12, supplemented with GlutaMAX (Gibco), FBS (Gibco), and
penicillin/streptomycin (Gibco), following standard procedures. Similarly, K562 cells
were cultured in RPMI 1640 medium (Gibco), supplemented with FBS and
penicillin/streptomycin. EU culture and dissociation are described below.
scEU-seq
The scEU-seq protocol was based on CEL-Seq2 (19, 26). Primers for cDNA synthesis
were used at a working concentration of 7.5 ng/µl. Using a mosquito (TTP Labtech), 50 nl
of the primer working solution was dispensed into each well of a 384-well plate (Greiner)
containing 5 µl mineral oil (Sigma). Plates were then stored at -80ºC until use.
For the click reaction we used the following reagents: CuSO4 stock solution (200 mM
in water), Ligand stock (THPTA, Lumiprobe, 400 mM in water), 5-ethynyl uridine (EU)
stock solution (0.5 M in DMSO), Azide-PEG3-biotin conjugate (Sigma) stock solution (1
M in DMSO), 1% IGEPAL solution in 50 mM Tris, and Ascorbate (Sigma) solution
(31.6 mg in 1 ml of H2O). We first created a master mix (MM) per click reaction by mixing
2.4 µl Ligand stock solution, 2 µl of 30×PBS, and 1.2 µl CuSO4 stock solution. Then we
created Click Solution A with 10.3 µl of Azide Biotin 10 mM (1:100 dilution of 1M stock),
6 µl 1% Triton X100, 1 µl of 0.1 µg/µl DAPI, 30.5 µl H2O, and 5.6 µl MM.
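As a quick consistency check on the recipe above, the volumes can be summed in a few lines of Python. This is only a bookkeeping sketch: the dictionary keys are illustrative names, not reagent identifiers from the protocol, and the 6.6 µl of Ascorbate solution is the volume added per reaction later in the procedure.

```python
# Volumes in µl, copied from the protocol text. Key names are illustrative only.
master_mix = {"ligand_stock": 2.4, "pbs_30x": 2.0, "cuso4_stock": 1.2}
mm_total = sum(master_mix.values())

click_solution_a = {
    "azide_biotin_10mM": 10.3,
    "triton_x100_1pct": 6.0,
    "dapi_0.1ug_per_ul": 1.0,
    "water": 30.5,
    "master_mix": mm_total,
}
solution_a_total = sum(click_solution_a.values())

ascorbate = 6.6  # µl, added to the cells after resuspension in Click Solution A
reaction_total = solution_a_total + ascorbate

print(round(mm_total, 1), round(solution_a_total, 1), round(reaction_total, 1))
```

The Click Solution A total agrees with the 53.4 µl in which cells are resuspended for the click reaction.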
Cells were incubated with the EU and/or DMSO for the desired time (e.g. 60 min).
The final concentration of EU was 200 µM for RPE1-FUCCI chase experiments, 400 µM
for RPE1-FUCCI or K562 pulse experiments, 166 µM for chase experiments in organoids
and 1.6 mM for pulse experiments in organoids. The U chase phase of the experiment was done with
400 µM U. Cells were dissociated using TrypLE enzyme mix (Gibco) and mechanical
shearing if required. Cells were resuspended in 200 µl PBS and fixed by adding 200 µl of
8% PFA (final fixing medium is 4% PFA in PBS) for 5 min at room temperature (RT). Then
200 µl of 1% Triton X-100 was added and cells were incubated for a further 5 min at RT. Cells were then
centrifuged at 300g for 5 min and resuspended in 500 µl of 1 M Tris-HCl pH 7-8 to quench
the fixation reaction. To perform the click reaction, cells were resuspended in 53.4 µl of
Click Solution A after the Tris-HCl was removed; 6.6 µl of Ascorbate solution was then added and
cells were incubated for 30 min at RT. After the click reaction was completed, 540 µl of PBS was added
to the cells and FACS of single cells into wells of 384-well plates containing primers was
performed immediately. The last column of the plate was generally left empty to serve as
the empty-well control. Sorted plates were stored at -80ºC until further processing.
Prior to first strand cDNA synthesis, we reversed fixation of the cells by adding 50 nl
per well of 10 nl (1:50,000 ERCC RNA spike-ins, Thermo), 10 mM dNTPs (Promega), 10
nl Proteinase K (Ambion), and 10 nl of 1% IGEPAL (Sigma) solution, and incubating cells
for 30 min at 55ºC, 80ºC for 10 min, 65ºC for 10 min and then cooling to 4ºC. First strand
synthesis was performed by adding 175 nl of, 35 nl of 5×RT buffer, 17.5 nl of 0.1 M DTT,
8.75 nl Superscript II (Thermo), 5 nl H2O and 8.75 nl RNaseOut (Thermo), and incubating
plates at 42ºC for 60 min, then 70ºC for 15 min, and then cooling to 4ºC. Plates were then
kept on ice until pooling.
At this point plates were pooled in a single 500 µl Eppendorf tube and treated with 1
µl of Exonuclease I (Thermo) for 30 min at 37ºC before proceeding. We then added 67 µl
of activated streptavidin beads in 2 × wash-binding (WB, 10 mM Tris-HCl - pH 7.5, 1 mM
EDTA, 2 M NaCl) buffer (Dynabeads MyOne Streptavidin C1, Thermo, activation done
as specified by manufacturer) and incubated samples at RT for 30 min while shaking. Once
binding of the EU-labeled mRNA/cDNA hybrids was completed, we separated the beads from
the supernatant. The beads were then washed once with WB buffer at 50ºC for 5 min, three
times with WB buffer at RT and once with 200 µl of low salt buffer (0.1 M NaCl in
RNase-free water). Streptavidin beads were then resuspended in 20 µl of nuclease-free water. The
supernatant was precipitated using 1 × volume of AMPure XP beads (Beckman Coulter);
a 1:4 bead dilution from stock in bead binding buffer was used and bead cleanup was performed
as recommended by the manufacturer. After cleanup, AMPure XP beads were resuspended in
20 µl of nuclease-free water.
Second strand synthesis was performed for both the labeled and unlabeled
(supernatant) fractions using the NEBNext Ultra II Non-Directional RNA Second Strand
Module (NEB) as specified by manufacturer, then samples were cleaned once more using
the AMPure XP beads, and beads were resuspended in 4.8 µl of nuclease free water. In
vitro transcription was performed overnight at 37ºC using the MEGAscript T7
Transcription Kit (Ambion) as specified by manufacturer. Sequencing libraries were
prepared with the TruSeq small RNA primers (Illumina) and then sequenced paired-end at
75 bp read length on an Illumina NextSeq.
The pulse and chase experiments for the RPE1-FUCCI cells were done in separate
experimental weeks using different batches of cells. The time points for the pulse
experiment were 15 min, 30 min, 45 min, 60 min (1 h), 120 min (2 h), 180 min (3 h) and the
DMSO control. The U washout time points for the chase experiment were 0 min, 60 min
(1 h), 120 min (2 h), 240 min (4 h), 360 min (6 h) and the DMSO control. For the intestinal
organoids all time points were performed on the same experimental day, and were 120 min (2 h)
and DMSO control for the pulse experiment, and 0 min, 45 min, 360 min (6 h) and DMSO
control for the chase experiment. Time points were initiated so that all cells were isolated
and fixed at the same time.
CEL-seq2 and bulk EU-seq
CEL-seq2 libraries were prepared as described in ref. (26). A total of 1,536 cells were
sequenced, of which 1,065 passed the quality controls of transcript levels (>$10^{4}$ UMIs) and
fluorescence signal. For the bulk EU-seq control experiments we performed the protocol
as described above but scaled the volumes for 500 cells accordingly.
Murine intestinal organoid culture
Primary organoid cultures used in this study were derived from Lgr5DTR-eGFP (22)
and established and grown as described before (20). Briefly, organoids were expanded in
medium termed ENR, consisting of Advanced Dulbecco’s modified Eagle’s medium/F12
(Advanced DMEM) with HEPES (10mM, Sigma), penicillin/streptomycin (1x,
Thermofisher) and Glutamax. Advanced DMEM was supplemented with 1x B27
(Thermofisher), 1 mM N-acetylcysteine (Sigma), 50 ng/ml murine recombinant epidermal
growth factor (PeproTech), R-spondin 1 conditioned medium (5% of final volume) and
Noggin conditioned medium (5% of final volume) to generate ENR medium. To generate
conditioned media, HEK293T cells were stably transfected with Rspo1-Fc (gift from
Calvin Kuo, Stanford University) or transiently transfected with a mouse Noggin-Fc
expression vector and grown for 1 week in Advanced DMEM supplemented with
penicillin/streptomycin, and Glutamax. Organoids were plated in Reduced Growth Factor
Basement Membrane Matrix (BME) Type 2 (Trevigen).
Cell sorting
Cell sorting of fixed human cell lines and mouse intestinal organoid cells was
performed using an INFLUX instrument (BD). Cells were sorted in PBS after the click reaction
was completed as described above. Gating on the forward scatter, side scatter, and DAPI channels
was used to discard doublet cells and debris. For RPE1-FUCCI cells, indexed
measurements of RFP and GFP signals were recorded, but gating on these channels was only
used to discard prominent outliers. The GFP signal of intestinal organoid lines was used to
enrich for cells expressing the Lgr5 stem cell marker (Fig. 3B). For bulk EU-seq
experiments we sorted 500 cells for each chase EU treatment. The G1 gate was set to sort
cells up to ~1/3 of the cell-cycle progression, the S gate was set from ~1/2 to ~3/4 of the
cell-cycle progression, and the G2 gate was set to the last ~1/6 of the cell-cycle progression.
Treatment of K562 cells for the heat shock experiment
Cells were cultured as described above with the difference that cells were incubated
at 37ºC or 42ºC during the 45 min pulse of EU or DMSO, prior to cell fixation. The total
UMI threshold used was 3000 for the unlabeled fractions of cells incubated at 37ºC in EU
(286 cells), 3000 for the unlabeled fractions of cells incubated at 42ºC in EU (277 cells),
1000 for the labeled fractions of cells incubated at 37ºC in EU (208 cells), 1000 for the
labeled fractions of cells incubated at 42ºC in EU (108 cells), 4000 for the unlabeled
fractions of cells incubated at 37ºC in DMSO (230 cells), 4000 for the unlabeled fractions
of cells incubated at 42ºC in DMSO (190 cells). Prior to DESeq analysis, the number of
UMIs was downsampled to 1000 in all cases.
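The downsampling step can be sketched as follows. The text does not say how UMIs were drawn, so this sketch assumes sampling without replacement from a cell's per-gene UMI counts; the function name `downsample_umis` and the toy count vector are hypothetical.

```python
import numpy as np

def downsample_umis(counts, target, rng):
    """Draw `target` UMIs without replacement from a per-gene count vector.

    Assumption: cells with fewer than `target` UMIs are returned unchanged.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() <= target:
        return counts.copy()
    # Multivariate hypergeometric sampling = removing individual UMIs at random.
    return rng.multivariate_hypergeometric(counts, target)

rng = np.random.default_rng(0)
cell = rng.poisson(2.0, size=2000)      # toy cell with roughly 4000 UMIs
down = downsample_umis(cell, 1000, rng)
print(cell.sum(), down.sum())
```

Sampling without replacement guarantees that no gene ends up with more UMIs than were observed, which a multinomial draw would not.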
Bioinformatics and Statistical Analysis
In the libraries, read one contains the cell barcode as well as the UMI information and
read two contains sequences from transcripts. We mapped read two using STAR 2.5
with default parameters to the human genome (Ensembl release 90 of the Homo sapiens
GRCh38 genome, extended with ERCC92 spike-ins) or the mouse genome (Ensembl
release 90 of the Mus musculus GRCm38 genome, extended with ERCC92 spike-ins). The
design of the primer used for cDNA synthesis was “GCCGG - minimal T7 promoter
(TAATACGACTCACTATAGGG) - A - Illumina adapter
(GTTCTACAGTCCGACGATC) - unique molecular identifier (NNNNNN) - cell barcode
(8 bases) - 24×T - V”. If the length of the poly-T tract in read one was less than 19 bases,
the read was discarded. The UMI count per gene was obtained by pooling all
reads mapping to the same gene and having the same cell barcode as previously described
(27), using only uniquely mapped reads. Briefly, for each cell barcode, we counted the
number of UMIs for every transcript and aggregated this number across all transcripts
derived from the same gene locus (27). However, we did not use binomial statistics to
convert the number of UMIs to transcript counts (27). The UMI for a given gene in a cell
was considered to be unspliced if at least one base of any read belonging to that UMI
mapped outside of the exons of the gene. The minimum number of UMIs detected in a cell
for the RPE1-FUCCI cell-cycle analysis was 2300, while the threshold used for murine
intestinal organoid cells was 1000.
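The read-one filter and the unspliced call described above can be illustrated with a minimal sketch. Several things here are assumptions rather than statements from the text: that read one begins directly with the 6-base UMI followed by the 8-base cell barcode, that the poly-T tract is measured as the uninterrupted run of T bases after the barcode, and that aligned read blocks and exons are given as half-open (start, end) intervals with overlapping exons already merged. The function names are hypothetical.

```python
UMI_LEN, BARCODE_LEN, MIN_POLY_T = 6, 8, 19

def parse_read_one(seq):
    """Return (umi, barcode), or None if the poly-T tract is shorter than 19 bases."""
    umi = seq[:UMI_LEN]
    barcode = seq[UMI_LEN:UMI_LEN + BARCODE_LEN]
    tail = seq[UMI_LEN + BARCODE_LEN:]
    poly_t = len(tail) - len(tail.lstrip("T"))  # length of the leading run of T
    if len(barcode) < BARCODE_LEN or poly_t < MIN_POLY_T:
        return None
    return umi, barcode

def is_unspliced(read_blocks, exons):
    """True if any aligned block of any read of a UMI extends outside all exons."""
    for start, end in read_blocks:
        inside = any(ex_start <= start and end <= ex_end for ex_start, ex_end in exons)
        if not inside:
            return True
    return False

good = parse_read_one("ACGTAC" + "GGCCTTAA" + "T" * 24 + "G")
short = parse_read_one("ACGTAC" + "GGCCTTAA" + "T" * 10 + "GACGT")
print(good, short)
```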
For differential gene expression analysis, we used the R package DESeq2 (28). All
reported enrichments were carried out using Fisher’s exact test, and obtained P values were
adjusted for multiple testing using the Benjamini–Hochberg method.
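For illustration, an enrichment test followed by Benjamini–Hochberg adjustment can be written with the standard library and NumPy alone. This is a sketch under assumptions: the text does not state whether the Fisher test was one- or two-sided, so a one-sided (over-representation) test is shown, and the function names and example numbers are hypothetical.

```python
from math import comb
import numpy as np

def fisher_enrichment_p(hits, list_size, category_size, universe):
    """One-sided Fisher's exact test: P(overlap >= hits) under the hypergeometric null."""
    total = comb(universe, list_size)
    upper = min(list_size, category_size)
    tail = sum(
        comb(category_size, k) * comb(universe - category_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest P value downwards.
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adj_sorted, 0.0, 1.0)
    return adjusted

# Toy example: 8 of 40 genes of interest fall in a 100-gene category (universe of 5000).
p_enrich = fisher_enrichment_p(8, 40, 100, 5000)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_enrich, adjusted)
```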
Self-organizing maps (SOMs) for analysis of organoid gene expression and transcript
regulatory strategies were created using the SOM library for python, using the cosine
distance metric to match genes to SOM nodes. The single-cell UMAP and SOM analyses
to identify cell type lineages for the intestinal organoid dataset were constructed using the
cosine metric on the normalized spliced UMI counts per cell (relative expression of each
gene per cell, see below), with the UMAP implementation for Python (29, 30). The SOM
analysis initially resulted in 12 nodes, but node 12 was merged into node 8 because these two
clusters of cells mapped to the same region of the UMAP but represented cells coming
from the pulse and chase experiments, respectively. Cell type identity
was assigned based on differential gene expression analysis using DESeq. Briefly, the
expression of cells in every cluster was compared to cells from all other clusters, and we
used the identity of the top upregulated genes to call cell types (Table S1). GO enrichment
analysis was done using DAVID (31) as described in (32).
Identification of highly variable genes during organoid development
To identify genes that were highly variable during intestinal organoid differentiation,
we first filtered genes using the coefficient of variation (CV) as a function of the mean
expression level. We define the relative expression of each gene by dividing the total
spliced UMI counts for a gene in a cell by the total spliced UMIs detected in that cell across
all genes and multiplying this by $10^{5}$. We learned the general scaling of the CV and mean
relative expression by linear regression, and selected genes that were in the top 10% most
variable for a given expression level. To obtain genes above this threshold, we used a
sliding window with a width value of 0.2 in the log10(relative expression) (fig. S13B). This
resulted in 1033 putative regulated genes. We then applied DESeq analysis (28) comparing
the relative expression of the nodes 11 and 8, against all other nodes, to find genes that
were significantly different between Lgr5 positive stem cells, and the rest. The adjusted P
threshold was set to $10^{-5}$ (fig. S13C). Additionally, to correct for any systematic bias in the
measurement of the log2(fold change), we performed Gaussian mixture modeling assuming
5 Gaussian distributions and discarded all genes that were within 2 standard deviations of
the mean value of the central Gaussian distribution. All ribosomal protein coding genes,
Malat1 (33), and Hck, Hbegf and Ptprr were removed from the final list, resulting in 295
genes that were used for further analysis. We added selected housekeeping genes as
controls (see main text).
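The first filtering step (relative expression, CV versus mean regression, and a sliding-window top 10% cut) might look like the sketch below. It is one possible reading of the description, not the authors' code: the regression is assumed to be done in log10 space, the window is assumed to be centered on each gene, and the function names and toy data are hypothetical.

```python
import numpy as np

def relative_expression(spliced):
    """genes x cells spliced UMI matrix -> per-cell relative expression scaled to 1e5."""
    return spliced / spliced.sum(axis=0, keepdims=True) * 1e5

def top_variable_genes(rel, window=0.2, top_fraction=0.10):
    """Flag genes in the top 10% of CV residuals among genes of similar expression.

    Assumes every gene has a positive mean and non-zero variance.
    """
    mean = rel.mean(axis=1)
    cv = rel.std(axis=1) / mean
    log_mean, log_cv = np.log10(mean), np.log10(cv)
    slope, intercept = np.polyfit(log_mean, log_cv, 1)   # general CV-mean scaling
    resid = log_cv - (slope * log_mean + intercept)
    selected = np.zeros(mean.size, dtype=bool)
    for g in range(mean.size):
        # Window of width 0.2 in log10(relative expression), centered on gene g.
        nearby = np.abs(log_mean - log_mean[g]) <= window / 2
        cutoff = np.quantile(resid[nearby], 1 - top_fraction)
        selected[g] = resid[g] >= cutoff
    return selected

rng = np.random.default_rng(1)
gene_rates = rng.lognormal(mean=2.0, sigma=1.0, size=(300, 1))
toy_counts = rng.poisson(gene_rates, size=(300, 200))    # toy genes x cells matrix
rel = relative_expression(toy_counts)
variable = top_variable_genes(rel)
print(variable.sum(), "of", variable.size, "toy genes flagged")
```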
Computation of the cell cycle progression
To compute the cell-cycle progression of RPE1-FUCCI cells using the signals from
Geminin-GFP and the Cdt1-RFP, we first depleted the dataset according to the local data
density and then matched the fluorescence measurements from the pulse and chase
experiments by performing a z-score normalization on the log10(fluorescence). Briefly, we
iteratively discarded two points chosen from the 10 points with largest local density until
500 data points were left, then the mean and standard deviation (std) values were computed
for the GFP and RFP signals of the pulse and chase experiments. Next, the values of the
chase experiment were matched to the pulse experiment using the following expression:
$$\mathrm{correctedChase} = \mathrm{meanPulse} + \mathrm{stdPulse} \cdot \frac{\mathrm{originalChase} - \mathrm{meanChase}}{\mathrm{stdChase}}$$
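The matching expression translates directly into NumPy. In this sketch the means and standard deviations are computed on whatever arrays are passed in; in the procedure described above they would come from the density-depleted subsets of 500 points. The function name and the toy fluorescence values are hypothetical.

```python
import numpy as np

def match_chase_to_pulse(chase_log, pulse_log):
    """Rescale log10 chase fluorescence to the mean and std of the pulse experiment."""
    z = (chase_log - chase_log.mean()) / chase_log.std()
    return pulse_log.mean() + pulse_log.std() * z

rng = np.random.default_rng(5)
pulse_gfp = rng.normal(3.0, 0.40, size=500)   # toy log10(GFP), pulse experiment
chase_gfp = rng.normal(3.4, 0.55, size=500)   # toy log10(GFP), chase experiment
corrected = match_chase_to_pulse(chase_gfp, pulse_gfp)
print(corrected.mean(), corrected.std())
```

The same function would be applied separately to the GFP and RFP channels.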
We constructed the cell-cycle progression trajectory by using an implementation of
the wanderlust algorithm (34) written in Python. We changed the algorithm so that cells
were allocated to one of 300 equally spaced points along the trajectory, to allow for later
calculation of the average time cells spent at each cell-cycle point using the ergodicity
principle. The cell-cycle progression was computed independently for the scEU-seq and
CEL-seq2 datasets. We also used the cell-cycle progression computations to guide the
sorting of the bulk G1, S and G2 control experiments, as described above.
Derivation of organoid differentiation trajectories
To compute the differentiation trajectory for intestinal organoids we used the R
package of Monocle2 (24) based on the total spliced UMI counts for the 295 genes that
significantly varied during differentiation plus the 6 selected housekeeping genes (301
genes in total) (fig. S14A). For computation of the average transcript levels, and synthesis
and degradation rates in the secretory lineage leading to Paneth cell differentiation, we took
cells belonging to clusters 4, 6, 8, 9, 10 and 11 (Fig. 3C), and that were part of branch 1 in
the Monocle2 analysis (Fig. 3E and fig. S14A). Similarly, for computation of the average
transcript levels, and synthesis and degradation rates in the enterocyte lineage, we took
cells belonging to clusters 1, 2, 6, 7, 8 and 11 (Fig. 3C), and that were part of branch 2 in
the Monocle2 analysis (Fig. 3E and fig. S14A). The values were derived by first rescaling
the difference between two adjacent cells in the Monocle2 trajectory by the Manhattan
distance between the relative expression of the 301 selected genes per cell. We then
averaged normalized rates and normalized UMI counts at 200 equally spaced positions
using a sliding window with a size equivalent to 5 positions, in each branch
individually.
Supplementary Text
Generation of simulated data for testing the fitting procedure
To assess the best fitting procedure to estimate κ and γ using our dataset, we
considered two alternative interpretations of the chase experiment, a non-steady state
model, where the number of labeled molecules at the start of the chase time is considered
to be unknown, and a quasi-steady state model, which views the dynamics of the chase
experiment as an exponential decay process. We then built a probabilistic framework to
simulate realistic dynamics of the pulse and chase experiments under non-steady state
conditions.
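For the pulse experiment we assume the simplest model of gene expression, in which mRNA is
synthesized continuously at rate 𝜅 and degraded with rate constant 𝛾:
$$\varnothing \xrightarrow{\;\kappa\;} \mathrm{mRNA} \xrightarrow{\;\gamma\;} \varnothing$$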
Note that this model assumes continuous synthesis of mRNA. The time-dependent
probability mass function of such a model follows a Poisson distribution if the initial expression level is zero (35):
$$\mathrm{Poiss}(N, m(t)) \qquad [1]$$
where $m(t) = \frac{\kappa}{\gamma}\left(1 - e^{-\gamma t}\right)$ corresponds to the dynamics of the population mean, $\gamma$
is the degradation rate constant, $\kappa$ is the transcription rate, and $N$ is the observed number
of molecules at time $t$. To model the dispersion observed in our single cell sequencing
dataset (27), we convolve eqn. [1] with a negative binomial distribution:
$$P_N(t) = \sum_{M=0}^{\infty} \mathrm{Pois}(M, t)\, \mathrm{NB}_{t=0}(N, M \times p, s) \qquad [2]$$
where $s$ is the dispersion of the negative binomial, and $p$ is a parameter that can be
modified to approximate different experimental recovery rates. For simulations presented
in the manuscript $p$ was set to 0.25 and $\log_2 s = 2 + \log_2 p * M$. In fig. S7, A and B, $\kappa = 10\ \mathrm{molecules} \cdot \mathrm{h}^{-1}$ and $\gamma = 0.346\ \mathrm{h}^{-1}$; fig. S7A displays an example of a pulse
experiment and fig. S7B displays an example of a chase experiment. We use eqn. [2] to
sample the dynamics of the pulse experiment for different combinations of the synthesis
and degradation rates seen in fig. S7.
For simulation of the chase experiment we divided the dynamics into two phases. The first is the
induction phase, where the mRNA molecules are labeled with the dynamics of the pulse
experiment using eqn. [2]. For genes close to steady-state expression, this induction phase
is relatively long (induction window in fig. S7C). However, for genes that have not reached
steady state, which is the case for most genes on the time scale of a single cell cycle, this
induction window will be relatively short, which in turn leads to changes in the total
number of molecules at the start of each chase phase, as discussed above (fig. S7C). We
then simulate the second phase, the chase phase of the experiment, as a stochastic exponential decay, which
is equivalent to a Bernoulli trial for each molecule and follows the binomial distribution (36):
$$B(N, N_0, p(t)) \qquad [3]$$
where $p(t) = e^{-\gamma t}$, $N$ is the observed number of molecules at time $t$, and $N_0$ is the
initial number of molecules. We model the dynamics of the observed chase experiment as
the convolution of equations [2] and [3]:
$$C_N(t) = \sum_{N_0=0}^{\infty} P_{N_0}(w - t)\, B(N, N_0, t) \qquad [4]$$
where $N_0$ is the number of molecules at the start of the chase phase, $t$ is the chase time, and $w$ is
the time of the induction phase. We use $P_{N_0}(0)$ for $w < t$. For results shown in fig. S7, E
to I, we used $w = 6$ h, a range of $\kappa$ values from 1 to 100 molecules/h, and a range of $\gamma$ values from 0.069 to 1.38 h$^{-1}$, equivalent to half-lives of ~10 to 0.5 h. For results shown in fig. S7, J and K, we used $w = 2$ h to 20 h as indicated and $\kappa = 40$ molecules/h.
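A stripped-down version of the two-phase chase simulation can be sketched as below. It deliberately omits the negative binomial detection term of eqn. [2], so it only combines the Poisson induction phase of eqn. [1] with the binomial decay of eqn. [3]; the function names are hypothetical and the parameter values are those quoted for fig. S7, A and B.

```python
import numpy as np

def mean_pulse(t, kappa, gamma):
    """Population mean m(t) of the pulse model (eqn. [1])."""
    return kappa / gamma * (1.0 - np.exp(-gamma * t))

def simulate_chase(n_cells, kappa, gamma, w, t, rng):
    """Labeled molecules per cell after a chase of t hours (detection noise omitted)."""
    induction = max(w - t, 0.0)                      # P_N0(0) is used when w < t
    n0 = rng.poisson(mean_pulse(induction, kappa, gamma), size=n_cells)
    return rng.binomial(n0, np.exp(-gamma * t))      # each molecule survives with p(t)

rng = np.random.default_rng(2)
kappa, gamma, w = 10.0, 0.346, 6.0                   # molecules/h, 1/h, h
half_life = np.log(2) / gamma                        # about 2 h for this gamma
labeled = simulate_chase(20000, kappa, gamma, w, t=2.0, rng=rng)
expected_mean = mean_pulse(w - 2.0, kappa, gamma) * np.exp(-gamma * 2.0)
print(half_life, labeled.mean(), expected_mean)
```

The conversion half-life = ln 2 / γ also reproduces the range quoted above: 0.069 h$^{-1}$ corresponds to about 10 h and 1.38 h$^{-1}$ to about 0.5 h.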
Generation of simulated data for testing the validity of the regulatory strategies
To test the validity of the estimated regulatory strategies, we assumed a simple gene
expression model as shown above, and then allowed κ and γ to change dynamically for a
period of 1440 minutes (24 h). The change of the rates over time was based on the Gaussian
function $g(t) \equiv \exp\left[-\frac{1}{2}\left(\frac{t - \mu}{\sigma}\right)^{2}\right]$, where $\mu$ is the peak time, which was set to 720
minutes for all simulations, $\sigma$ is the standard deviation, which was set to 100, 200, 300 and
400 minutes, and $t$ is the time along the cell cycle. The synthesis rate (units molecules/h)
was defined by the function $\log_{10} \kappa(t) = a + b \cdot g(t)$, where $a$ was 1.5 or 1.2, and $b$
ranged between 0.5 and 1.5. Similarly, the degradation rate constant (units h$^{-1}$) was
defined by the function $\log_{10} \gamma(t) = (a - 1) + c \cdot g(t)$, where $a$ was 1.5 or 1.2 and $c$ ranged
between $-b$ and $b$. This choice of parameterization leads to a steady state level of
$\kappa(t \to \infty) / \gamma(t \to \infty) = 10$ transcripts. Varying the parameters $a$, $b$ and $c$ resulted in
synthesis rates ranging from 15.6 to 1,000 molecules/h and degradation rate constants
ranging from 0.05 to 100 h$^{-1}$ (equivalent to a range of half-lives between 13.9 and 0.007
h). The parameters $b$ and $c$ define the fold change of the synthesis rate and degradation rate
constant (fig. S12A): $K \equiv \frac{\kappa(t = \mu)}{\kappa(t \to \infty)} = 10^{b}$ and $G \equiv \frac{\gamma(t = \mu)}{\gamma(t \to \infty)} = 10^{c}$. Next, we simulated the
stochastic evolution of the pulse and chase experiments using the Gillespie algorithm,
which we implemented in Python. During simulations the values of the rates were updated
every 10 minutes. For the chase experiment, we first simulated the evolution of a full
system progression (1440 minutes). During the 1440 minutes, we stopped the simulation
at the time corresponding to the start of the chase phase, and then simulated an exponential
decay process with the corresponding degradation rate constants. For simulations of the
pulse experiment the initial transcript count was set to zero, and the values from the
degradation and synthesis rates were set to values corresponding to the EU labeling time
points. We simulated 100 traces for each rate regime (either cooperative or destabilizing),
and acquired measurements at time points corresponding to 0, 130, 261, 392, 523,
654, 784, 915, 1,046, 1,177, 1,308 and 1,439 minutes of the second run, resulting in a total
of 105,600 simulated traces. To estimate the probability density function of the
experimental measurements, we convolved the final values of the 100 simulated traces per
condition with a negative binomial distribution as described above; $p$ was set to 0.25 and
$\log_2 s = 2 + \log_2 p * M$. The fitting of the simulated dataset was done as described below
for the non-steady state case.
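The core of such a simulation, a Gillespie birth-death process whose rates follow the Gaussian parameterization and are refreshed every 10 minutes, can be sketched as follows. This is an illustrative reimplementation, not the authors' code: function names and default parameter values are hypothetical, rates are held constant within each 10-minute block, and the pulse/chase bookkeeping and the negative binomial detection step are left out.

```python
import numpy as np

def gaussian_bump(t_min, mu=720.0, sigma=200.0):
    return np.exp(-0.5 * ((t_min - mu) / sigma) ** 2)

def rates_at(t_min, a=1.5, b=1.0, c=0.5, mu=720.0, sigma=200.0):
    """Synthesis rate (molecules/h) and degradation rate constant (1/h) at time t_min."""
    g = gaussian_bump(t_min, mu, sigma)
    kappa = 10.0 ** (a + b * g)
    gamma = 10.0 ** ((a - 1.0) + c * g)
    return kappa, gamma

def gillespie_trace(n0, t_end_min, rng, update_min=10.0, **rate_kwargs):
    """Transcript count after t_end_min minutes, with rates updated every update_min."""
    n, block_start = n0, 0.0
    while block_start < t_end_min:
        block_end = min(block_start + update_min, t_end_min)
        kappa_h, gamma_h = rates_at(block_start, **rate_kwargs)
        kappa, gamma = kappa_h / 60.0, gamma_h / 60.0    # convert to per minute
        t = block_start
        while True:
            total = kappa + gamma * n                    # total propensity
            t += rng.exponential(1.0 / total)            # waiting time to next event
            if t >= block_end:
                break                                    # memoryless: restart in next block
            if rng.random() * total < kappa:
                n += 1                                   # synthesis event
            else:
                n -= 1                                   # degradation event
        block_start = block_end
    return n

rng = np.random.default_rng(3)
# Constant rates (b = c = 0) should fluctuate around the steady state of 10 transcripts.
final = np.array([gillespie_trace(10, 300.0, rng, b=0.0, c=0.0) for _ in range(200)])
print(final.mean())
```

With `b` and `c` non-zero, the rate at the peak time is $10^{b}$ (or $10^{c}$) times its baseline, which matches the definitions of K and G.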
As can be seen from fig. S12, large errors in the calling of regulatory strategies
(cooperative versus destabilizing) are only observed for cooperative strategies when 𝜎 ≤
100 minutes and 𝑎 = 1.2, and the fold change of the synthesis rate K is relatively large
(greater than 4-fold) (fig. S12, E, G and H). Since we do not observe incorrect calls of true
destabilizing strategies as “cooperative”, our findings regarding cooperative strategies
during the cell cycle are robust to biases introduced by the fitting procedure. Furthermore,
these errors are only observed during fast dynamics (expression changes on the order of
100 minutes).
General modeling of kinetic rates
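To fit the chase experiment assuming non-steady state dynamics we used the
following arguments and model. The real dynamics of the chase experiment, when the
change in expression is unknown, can be illustrated by fig. S7C, where $m_0$, $m_1$, and $m_2$ are the measurements of labeled RNA at chase time points $t_0$, $t_1$ and $t_2$
(washouts in figure), respectively. Then,
$$m_0 = \frac{\kappa}{\gamma} - \frac{\kappa}{\gamma} e^{-\gamma (t_0 - t_1)} + h_1 e^{-\gamma (t_0 - t_1)},$$
where $h_1$ is the unseen initial transcript level at $t_1$,
$$m_0 = \frac{\kappa}{\gamma} - \frac{\kappa}{\gamma} e^{-\gamma (t_0 - t_2)} + h_2 e^{-\gamma (t_0 - t_2)}, \ \ldots$$
$$m_0 = \frac{\kappa}{\gamma} - \frac{\kappa}{\gamma} e^{-\gamma t} + h_t e^{-\gamma t}$$
$$m_1 = h_1 e^{-\gamma (t_0 - t_1)},$$
$$m_2 = h_2 e^{-\gamma (t_0 - t_2)}, \ \ldots$$
$$m(t) = h_t e^{-\gamma t}$$
$$h_t = \frac{m(t)}{e^{-\gamma t}}$$
and,
$$m(t) = -\frac{\kappa}{\gamma} + \frac{\kappa}{\gamma} e^{-\gamma t} + m_0 \qquad [5]$$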
where eqn. [5] is the non-steady state dynamics of the chase experiment. The quasi-steady
state interpretation and dynamics of the chase experiment can be represented by fig.
S7D and follows the exponential decay process,
$$l(t) = m_0 e^{-\gamma t} \qquad [6]$$
The dynamics for the pulse experiment is:
$$l(t) = \frac{\kappa}{\gamma} - \frac{\kappa}{\gamma} e^{-\gamma t} + l_0 e^{-\gamma t} \qquad [7]$$
and assuming $l_0 = 0$, it becomes
$$l(t) = \frac{\kappa}{\gamma}\left(1 - e^{-\gamma t}\right) \qquad [8]$$
where $l(t)$ is the average number of molecules detected at pulse or chase time $t$, and $l_0$ is
the initial number of molecules in the experiment at time $t = 0$. $l_0$, $\kappa$ and $\gamma$ are fitted
parameters in the chase experiment.
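The model curves of eqns. [5] to [8] are short enough to write out as functions, which also makes the link between them easy to check numerically. The function names below are hypothetical; the parameter values are arbitrary.

```python
import numpy as np

def pulse(t, kappa, gamma, l0=0.0):
    """Eqn. [7]; reduces to eqn. [8] when l0 = 0."""
    decay = np.exp(-gamma * t)
    return kappa / gamma * (1.0 - decay) + l0 * decay

def chase_quasi_steady(t, m0, gamma):
    """Eqn. [6]: pure exponential decay from m0."""
    return m0 * np.exp(-gamma * t)

def chase_non_steady(t, m0, kappa, gamma):
    """Eqn. [5]: labeled RNA at chase time t under non-steady state dynamics."""
    return m0 - kappa / gamma * (1.0 - np.exp(-gamma * t))

t, kappa, gamma, m0 = 1.5, 10.0, 0.346, 20.0
m_t = chase_non_steady(t, m0, kappa, gamma)
h_t = m_t / np.exp(-gamma * t)              # unseen level at the washout, h_t = m(t) / e^(-gamma t)
m0_back = pulse(t, kappa, gamma, l0=h_t)    # labeling for a further t hours recovers m0
print(m_t, h_t, m0_back)
```

Recovering $m_0$ in the last step is exactly the substitution that leads from the $m_0$ expressions to eqn. [5].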
Fitting of the synthesis rate κ and degradation rate constant γ
To sample cells from the different pulse and chase time points of the cell cycle
experiment we constructed cell pools for cell-cycle position $CCP_i$ by pooling cells from $j$
neighboring positions ($CCP_{i-j} \ldots CCP_{i+j}$), assuming circular cell-cycle structure. To
determine the value of $j$ for a given gene, we incrementally expanded it until at least 10%
of the total number of measured UMIs and at least 15 cells of each pulse and chase time
point were within the pool of cells. Then, for each bootstrap, we subsampled the pool of cells
with replacement to draw a total of 30 cells per time point. For the intestinal organoids we
computed one value of κ and γ per cell, for each cell in all time points in the pulse and
chase experiments. The pool of cells to be sampled per cell $C_i$ was constructed by taking
the 20 closest neighbors of $C_i$ per pulse and chase time point using the cosine distance
between the total spliced UMIs, taking only the selected genes identified to change
significantly during development plus the 6 housekeeping genes (301 genes in total). For
bootstrapping we sampled 20 cells per time point with replacement. Prior to fitting the pulse
and chase experiments we normalized the mean labeled UMI counts of a given gene in the
pool of cells for a given pulse or chase time point by the following equation:
$$\text{mean sum of total UMI in experiment} \cdot \frac{\text{mean labeled UMI for gene in pool of cells}}{\text{sum of total UMI for pool of cells of the time point}} \qquad [9]$$
where the total UMI is the sum of the unlabeled and labeled UMIs. Since we did not
simulate the entire transcriptome of a cell, for fitting the simulations we used the mean
labeled UMI number, and sampled 20 cells per bootstrap per time point. When fitting the
chase experiment assuming the non-steady state model we used eqn. [5], where $m(t)$ are
the normalized labeled UMI for each time point $t$ of the chase experiment, the parameters
fitted were $\kappa$, $\gamma$ and $m_0$, and we minimized the sum of squared errors as the cost function:
$\mathrm{error} = \sum \left(\mathrm{pred\_}m(t) - \mathrm{measured\_}m(t)\right)^{2}$. Similarly, when fitting the chase experiment only,
assuming the quasi-steady state model, we used eqn. [6], the parameters fitted were $\gamma$
and $l_0$, and the cost function was minimized as shown above. For fitting the pulse experiment
only, we used eqn. [8] and fitted $\kappa$ and $\gamma$, and minimized a cost function as shown above.
Finally, for fitting the non-steady state model on the combination of pulse and chase we used
equations [5] and [8]. In the case of the cell cycle we have enough time points to allow us
to fit two independent $\kappa$ values for the chase and the pulse experiments. This accounts
for the expected differences in library depth and transcript detection efficiencies between
the pulse and chase experiments. For further analysis in this case we used the estimated $\kappa$
corresponding to the pulse experiment. In the case of the organoids, this difference was
estimated by a correction factor for the chase experiment,
$f = \text{mean total UMI in chase} / \text{mean total UMI in pulse}$, which was used to multiply $\kappa$ when
applied to eqn. [5]. In the cost function the errors for the pulse and chase experiments were
weighted for the number of time points in each experiment and for the expression level.
The fitting for each pool of cells was bootstrapped 100 times for the simulations, 50 times
for each cell-cycle progression point, and 20 times for each cell of the organoid dataset.
The results of the simulations showed that our fitting procedure sometimes generates global
outliers, which had rate values outside of what we can expect in our biological systems
(fig. S7, E and F). Hence, the thresholds for discarding these global outliers were defined
by inspection of the distribution of all obtained rates, as shown in fig. S7, E and F, S8D,
and S14D. The thresholds used for the cell cycle dataset are 0.00316 and 10
molecules/h for the synthesis rate, and 0.00316 and 10 h$^{-1}$ for the degradation rate. The
thresholds used for the intestinal organoid dataset are 0.000316 and 7.1
molecules/h for the synthesis rate and 0.00316 and 7.1 h$^{-1}$ for the degradation rate. Upon
manual inspection of the final rates in the organoid dataset, we adjusted the lower
threshold of the computation of the degradation rate of the Defa17 gene to 0.25 h$^{-1}$. The
median relative error between the predicted rates and the true rate used for the simulation
was defined as $\mathrm{median}\left(\left|\mathrm{rate}_{\mathrm{predicted}} - \mathrm{rate}_{\mathrm{expected}}\right| / \mathrm{rate}_{\mathrm{expected}}\right)$. To approximate
the expected synthesis rate during simulations in order to account for the error introduced
by the negative binomial term in eqn. [2], the true synthesis rate was adjusted by the defined
$s$ parameter of eqn. [2], $\mathrm{synthesis}_{\mathrm{expected}} = \mathrm{synthesis}_{\mathrm{true}} \cdot (1 - s)$. In the case of the
cell-cycle dataset we also discarded clear local outliers by fitting a Gaussian distribution
to the log10 transformed rates around the $CCP$ of interest. The Gaussian distribution was
calculated with a window size of 30-60 $CCP$s around the point of interest to avoid
overestimation of local outliers. We discarded all points with a deviation higher than 2.5σ. The
median of all non-outlier bootstraps for a given progression point was taken as the measured
rate for that cell-cycle progression point. In the case of the organoids the final rates were
the median of cells in a given Monocle2 trajectory, falling within a window size equivalent
to 7 h and expanded to avoid non-defined values. For trajectory branch 1 only cells in
clusters 1, 3, 8, 10, 11 were considered, and for trajectory branch 2 only clusters 1, 2, 5, 6
and 7 were considered.
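A reduced version of the bootstrap fit can be sketched for the pulse experiment alone (eqn. [8]). The optimizer used for the published fits is not specified, so this sketch swaps in a simple profile grid search: for each candidate γ the best amplitude κ/γ has a closed-form least-squares solution, and the γ with the smallest sum of squared errors is kept. Function names and toy data are hypothetical; the outlier bounds are the cell-cycle thresholds quoted above, and the weighting of errors and the joint pulse and chase fit are omitted.

```python
import numpy as np

def fit_pulse(t, y, gammas=np.logspace(-2.5, 1.0, 400)):
    """Least-squares fit of eqn. [8] by grid search over gamma; returns (kappa, gamma)."""
    basis = 1.0 - np.exp(-np.outer(gammas, t))               # one row per candidate gamma
    amp = basis @ y / np.einsum("ij,ij->i", basis, basis)    # best kappa/gamma per row
    sse = ((amp[:, None] * basis - y) ** 2).sum(axis=1)      # sum of squared errors
    best = np.argmin(sse)
    return amp[best] * gammas[best], gammas[best]

def bootstrap_fit(t, cells_by_time, n_boot, n_draw, rng):
    """cells_by_time: one 1-D array of normalized labeled UMIs per pulse time point."""
    fits = []
    for _ in range(n_boot):
        means = np.array([rng.choice(c, size=n_draw, replace=True).mean()
                          for c in cells_by_time])
        fits.append(fit_pulse(t, means))
    fits = np.array(fits)
    # Discard global outliers using the bounds quoted for the cell-cycle dataset.
    keep = ((fits >= 0.00316) & (fits <= 10.0)).all(axis=1)
    return np.median(fits[keep], axis=0)

rng = np.random.default_rng(4)
t = np.array([0.25, 0.5, 0.75, 1.0, 2.0, 3.0])               # pulse times in hours
true_kappa, true_gamma = 5.0, 0.5
expected = true_kappa / true_gamma * (1.0 - np.exp(-true_gamma * t))
cells = [rng.poisson(m, size=200).astype(float) for m in expected]
kappa_hat, gamma_hat = bootstrap_fit(t, cells, n_boot=50, n_draw=30, rng=rng)
print(kappa_hat, gamma_hat)
```

The grid bounds for γ were chosen here to coincide with the quoted thresholds, so the resolution of the estimate is limited by the grid spacing (about 2% per step).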
Prediction of expression level through cell-cycle progression and organoid differentiation
We estimated the time cells spent on average at each position of the 𝐶𝐶𝑃 or the
differentiation trajectory by following the ergodicity principle. Briefly, we calculated the
fraction of cells that mapped to each position of the 𝐶𝐶𝑃 or differentiation trajectory, and
computed the average time in minutes that cells had to spend at each point, maintaining the
distribution shape and making the cumulative time equal to 24hrs (fig. S9C) for the cell-
cycle, and 72hr for differentiation of organoids (fig S14C) (37). For the organoids we
considered the start of the differentiation of branch 1 to be at the monocle 2 trajectory value
3.5 and progress as the monocle trajectory decreased, for branch 2, we considered the start
point to be 6.5 and increase as the monocle 2 trajectory increased (fig. S14C). For
prediction of the total levels, we recalibrated the computed rates. In the case of the cell-
cycle progression, we first found the scaling factor multiplying the degradation rate
constant γ, required to best predict the gene expression progression along the cell cycle of
the total UMI in the scEU-seq data set using simulations according to eqn. [7], by
initializing the simulations with the average level of the first 20 cell cycle positions, and
calculated the predicted change in expression occurring while cells are at 𝐶𝐶𝑃𝑖 by,
$$l_i = \frac{\kappa_i}{c\gamma_i} - \frac{\kappa_i}{c\gamma_i}\, e^{-c\gamma_i t_i} + l_{i-1}\, e^{-c\gamma_i t_i} \qquad [10]$$
where $i$ is a position on the 𝐶𝐶𝑃, $c$ is the tested gene correction factor, and $\kappa_i$, $\gamma_i$
and $t_i$ are the synthesis rate, the degradation rate constant and the time spent at each
cell-cycle position. The average recalibrated γ values correlate well with known degradation
rates in human cells (see fig. S10C). Similarly, we obtained the correction factor for the
recalibration of κ by optimizing the prediction of the CEL-seq2 experiment, to account for
the difference in library depth and transcript detection efficiency. We used the final
corrected rate values for further analysis. In the case of the organoids, to avoid
overcorrection of the rates, we corrected only κ, and the observed expression was computed
as the running average of the total UMI counts
detected in the DMSO controls and the 6 h (360 min) chase time point for branches 1 and
2. Note that the corrections applied to the rates do not change the magnitude of the relative
change between the two rates. For modeling the cell-cycle or differentiation systems with
constant synthesis rates or degradation rate constants, we averaged the recalibrated rates,
weighting the values of each position (in the cell cycle or differentiation) by the expected
number of cells, and simulated the resulting gene expression as described above.
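The two steps above, ergodic timing and the position-by-position update of eqn. [10], can be illustrated with the following minimal sketch. It is not the authors' implementation: the function names, the cell counts and the rate values are hypothetical, and the fitting of the correction factor c is not shown.

```python
import math

def ergodic_times(cell_counts, total_minutes):
    """Time spent at each position, proportional to the fraction of cells there."""
    total_cells = sum(cell_counts)
    return [total_minutes * n / total_cells for n in cell_counts]

def simulate_levels(kappa, gamma, times, l0, c=1.0):
    """Iterate eqn. [10] along the trajectory.

    l_i = kappa_i/(c*gamma_i) * (1 - exp(-c*gamma_i*t_i)) + l_{i-1} * exp(-c*gamma_i*t_i)
    """
    levels = []
    prev = l0
    for k, g, t in zip(kappa, gamma, times):
        decay = math.exp(-c * g * t)
        prev = k / (c * g) * (1.0 - decay) + prev * decay
        levels.append(prev)
    return levels

# Hypothetical number of cells mapped to each of four positions.
counts = [50, 100, 150, 100]
times = ergodic_times(counts, total_minutes=24 * 60)  # cumulative time of 24 h

# Constant hypothetical rates, starting at the steady-state level kappa/gamma.
kappa = [2.0] * 4    # molecules per minute
gamma = [0.01] * 4   # per minute
levels = simulate_levels(kappa, gamma, times, l0=200.0)
print(times, levels)
```

With constant rates and an initial level equal to κ/γ, the simulated level stays at the steady state, which is a quick sanity check of the recursion.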
Analysis of regulatory strategies
Because we were interested in the relative changes of synthesis with respect to the
degradation rates either during the cell cycle or during intestinal organoid differentiation,
and given that at steady state the RNA level is $l = \kappa/\gamma$, we performed the initial analysis
of regulatory strategies using normalized rates rather than the absolute rate values. For the
cell-cycle experiment, the rates at each position of the 𝐶𝐶𝑃 were normalized to the median rate
along the CCP and then log2 transformed. For the organoids, the rates were normalized
to the average rate values of the first 50 positions (1/4) of the differentiation trajectory
branch. The cosine similarity between the synthesis and degradation rates was calculated
on the normalized rates per cell-cycle position (RPE1-FUCCI cells) across the entire CCP.
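A minimal sketch of the median normalization and cosine similarity for one gene in the cell-cycle data is given below. It is an illustration under stated assumptions, not the authors' code: the function names and rate profiles are hypothetical, and returning 0 for a flat (all-zero) normalized profile is a choice made here to avoid division by zero.

```python
import math
import statistics

def log2_normalized(rates):
    """Normalize rates to their median along the CCP, then log2-transform."""
    med = statistics.median(rates)
    return [math.log2(r / med) for r in rates]

def cosine_similarity(x, y):
    """Cosine of the angle between two rate profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        # Assumption: a flat profile carries no direction, so report 0.
        return 0.0
    return dot / norm

# Hypothetical rate profiles along five CCP positions.
synthesis = [1.0, 2.0, 4.0, 2.0, 1.0]
degradation_same = [0.5, 1.0, 2.0, 1.0, 0.5]      # rises and falls with synthesis
degradation_opposite = [2.0, 1.0, 0.5, 1.0, 2.0]  # moves against synthesis

s_same = cosine_similarity(log2_normalized(synthesis), log2_normalized(degradation_same))
s_opposite = cosine_similarity(log2_normalized(synthesis), log2_normalized(degradation_opposite))
print(s_same, s_opposite)
```

In this toy case the first pair gives a similarity of 1 and the second a similarity of -1, the two extremes of the scale.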
For the cosine similarity of organoids, we used positions 100 to 200 for branch 1 and 50 to
200 for branch 2, so that the similarity was computed mainly over the interval of the expected
change in expression. To define clusters enriched in cooperative, neutral or destabilizing
strategies in the RPE1-FUCCI dataset, we computed, for each strategy cluster, the enrichment
of genes with a low cosine similarity (s < -0.5) as a marker of strong cooperative strategies,
moderate cosine similarities (-0.5 < s < 0.5) as a marker of neutral strategies, and high cosine
similarities (s > 0.5) as a marker of strong destabilizing strategies,
using the Fisher test (fig. S11A). For the organoids, we selected the strategy groups of
interest (Fig. 3, I and J) and discarded genes with a cosine similarity between -0.2 and 0.2
to ensure that only genes with cooperative or destabilizing strategies were taken for further
analysis. The dynamic range was estimated as the difference between the 2nd and 98th
percentiles of the mean observed or predicted expression along the cell-cycle progression,
and between the 1st and 95th percentiles for the differentiation trajectory. The change in
dynamic range was then defined as $\log_2(\mathrm{model}_{\mathrm{predicted}} / \mathrm{CS2}_{\mathrm{observed}})$ for the three different models:
the full dynamic model, the constant synthesis model, and the constant degradation model.
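The dynamic-range comparison can be sketched as follows. This is an illustrative example rather than the authors' code: the helper names and the expression traces are hypothetical, and the linear-interpolation percentile is an assumption, since the text does not state which percentile estimator was used.

```python
import math

def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def dynamic_range(expression, low=2, high=98):
    """Difference between a high and a low percentile of an expression trace."""
    return percentile(expression, high) - percentile(expression, low)

def dynamic_range_change(predicted, observed, low=2, high=98):
    """log2 ratio of the predicted to the observed dynamic range."""
    return math.log2(dynamic_range(predicted, low, high) / dynamic_range(observed, low, high))

# Hypothetical traces: the model reproduces only half of the observed amplitude.
observed = [float(v) for v in range(101)]
predicted = [v / 2.0 for v in observed]
change = dynamic_range_change(predicted, observed)
print(change)
```

For the differentiation trajectory the same functions would be called with `low=1` and `high=95`.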
The expression timing was defined by the combination of the length of the detected peak
and the time delay of the peak. We calculated the length of the detected peak by applying
the Otsu threshold detection algorithm to the mean observed signal along the cell-cycle
progression or differentiation trajectory. The time delay was obtained by maximizing the
time cross-correlation function of the predicted versus observed expression. The final timing
distance was defined as $\mathrm{sign}(\mathrm{delay}) \times \sqrt{\mathrm{delay}^2 + \mathrm{peak\ length}^2}$. The delta values
reported are calculated between the constant synthesis or degradation models and the full
dynamic model for both properties, the timing distance and the change in dynamic range,
i.e. $\mathrm{delta} = \mathrm{property}_{\mathrm{constantModel}} - \mathrm{property}_{\mathrm{dynamicModel}}$.
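The timing distance can be illustrated with the sketch below, which is not the authors' implementation. Several choices are assumptions made here: the Otsu threshold is found by exhaustive search over the observed values, the cross-correlation is circular (as would suit a cell cycle), a positive delay means that the predicted peak occurs later than the observed one, sign(0) is taken as 0, and delay and peak length are expressed in trajectory positions rather than in time. All names and traces are hypothetical.

```python
import math

def otsu_threshold(signal):
    """Threshold that maximizes the between-class variance (Otsu's criterion)."""
    values = sorted(set(signal))
    best_t, best_var = values[0], -1.0
    for lo_v, hi_v in zip(values, values[1:]):
        t = (lo_v + hi_v) / 2.0
        below = [v for v in signal if v <= t]
        above = [v for v in signal if v > t]
        w0 = len(below) / len(signal)
        w1 = len(above) / len(signal)
        m0 = sum(below) / len(below)
        m1 = sum(above) / len(above)
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def peak_length(signal):
    """Number of positions above the Otsu threshold."""
    t = otsu_threshold(signal)
    return sum(1 for v in signal if v > t)

def best_delay(predicted, observed, max_lag):
    """Lag (in positions) that maximizes the circular cross-correlation."""
    n = len(observed)
    def xcorr(lag):
        return sum(observed[i] * predicted[(i + lag) % n] for i in range(n))
    return max(range(-max_lag, max_lag + 1), key=xcorr)

def timing_distance(delay, length):
    """sign(delay) * sqrt(delay^2 + peak_length^2), with sign(0) = 0."""
    sign = (delay > 0) - (delay < 0)
    return sign * math.sqrt(delay ** 2 + length ** 2)

# Hypothetical traces: the predicted peak lags the observed peak by two positions.
observed = [0, 0, 1, 5, 5, 1, 0, 0, 0, 0, 0, 0]
predicted = [0, 0, 0, 0, 1, 5, 5, 1, 0, 0, 0, 0]
delay = best_delay(predicted, observed, max_lag=5)
length = peak_length(observed)
distance = timing_distance(delay, length)
print(delay, length, distance)
```

A delta value would then be the difference between this distance computed for a constant-rate model and for the full dynamic model.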
Fitting of rates to bulk chase experiments
To fit the bulk chase experiment for G1-, S- and G2-gated cells we used [5] and [7]. As
before, [5] describes the evolution of the levels of labeled transcripts as a
function of the chase time; in addition, subtracting [5] from the total levels of transcripts
describes the evolution of the unlabeled transcripts, and [7] describes the changes in total
transcripts from one cell-cycle phase to the next given the average time difference
between the two phases. In this case the cost function is
$\mathrm{error} = \mathrm{error}_{\mathrm{labeled}} + \mathrm{error}_{\mathrm{unlabeled}} + \mathrm{error}_{\mathrm{total}}$.
We fitted both biological replicates together. The time between the end of G2 and G1 was
estimated to be 24 min; in this case the initial expression level was (G2 expression)/2.
The time from G1 to S was estimated to be 876 min, and the time from S to G2 to be
137 min. These time estimates were derived from the above-mentioned ergodic
timing of the cell-cycle progression.
Fig. S1. Technical controls for scEU-seq. (A) Scatter plot showing the effect of EU
treatment on gene expression. The slope represents the mean slope of a linear fit of total
UMI counts, as a function of short incubation times (DMSO, 15, 30, 45 and 60 minutes) in
RPE1-FUCCI cells. The fit was bootstrapped 10^4 times. P is the probability of the slope
being zero (corrected for multiple testing by the Benjamini-Hochberg method). Genomic
and mitochondrial genes are shown in black and grey, respectively. The strongest EU-
regulated genes are outlined in red. (B) Mean UMI counts of total, labeled or unlabeled
mRNAs derived from RPE1-FUCCI cells treated with EU for 120 min as a function of the
total UMI counts of DMSO-treated cells. Linear fits are shown. Values for the mean slope
± 99% confidence interval (CI) are: 0.995±0.004 for total, 0.905±0.005 for unlabeled, and
0.089±0.007 for labeled mRNAs. (C) Histogram showing the sum of all UMIs detected
per cell in the pulse experiments. The threshold indicates the minimal UMI for a cell to be
taken for further analysis. (D) Histogram showing the sum of all UMIs detected per cell in
the chase experiments. (E) Bar plot showing the percentage of cells above the UMI
threshold of the pulse experiments. (F) Bar plot showing the percentage of cells above the
UMI threshold of the chase experiments. (G) Top panels show the signal-to-noise ratios
(SNRs) of the labeled UMI counts for all conditions relative to the DMSO controls for cells
above the UMI threshold, for the pulse (left) and chase (right) experiments, respectively.
The SNR was calculated as the ratio of the median of the total number of labeled UMIs
detected for cells with a given EU treatment to the median of the total number of labeled
UMIs detected for cells of the DMSO control. Bottom panels show the P values (corrected
Mann-Whitney U tests) for the total number of labeled UMIs detected for cells with a given
EU treatment versus the total number of labeled UMIs detected for cells in the DMSO
control. The red line indicates P=0.05. Cell numbers of the pulse experiments are n = 265
for DMSO, n = 442 for 15 min, n = 574 for 30 min, n = 564 for 45 min, n = 405 for 60 min,
n = 408 for 120 min, and n = 400 for 180 min. Cell numbers of the chase experiments are
n = 202 for DMSO, n = 460 for 0 min, n = 436 for 60 min, n = 541 for 120 min, n = 391
for 240 min, and n = 334 for 360 min.
Fig. S2. scEU-seq performed after heat shock detects enrichment of stress response genes
in the EU-labeled mRNA fraction. (A) Scatter plots showing the average UMI number for
DMSO treated cells (total mRNA, left), and EU-treated K562 cells (unlabeled mRNAs,
middle, labeled mRNAs, right) that had either been incubated for 45 min at 37ºC or at 42ºC.
During the 45 min of heat-shock treatment cells were treated with EU or DMSO. Genes
that were differentially upregulated were detected using DEseq for adjusted P<0.05 and
are indicated in colors. (B) DAVID GO-term annotation enrichment analysis for genes with
upregulated mRNAs detected in the EU-labeled fraction. (C) Bar graph showing the
percentage of all UMIs that represent genes and were found to be upregulated in the EU-
labeled fraction. Enrichment factors (Enr) and P values were calculated using the Fisher
exact test.
Fig. S3. Cell cycle progression. (A) Scatter plot of Geminin-GFP and the Cdt1-RFP
corrected signals of RPE1-FUCCI cells (n = 5,422) showing the estimated cell cycle
progression. (B) Scatter plots as in A but showing cells from individual time points from
the pulse and chase experiments, respectively. Time is indicated in minutes. (C) Scatter
plots showing the expression of example genes as a function of the levels of Geminin-GFP
and Cdt1-RFP. Total UMI counts are indicated in color.
Fig. S4. Transcript levels of individual genes along the cell cycle progression derived from
EU pulse experiments. (A) UMI counts (color bars) of total transcripts (leftmost panel) and
labeled transcripts for each time point of the pulse experiment shown for nine example
genes. (B) Mean labeled UMI counts along the cell cycle trajectory for all detected genes
(n = 11,848) in the pulse experiments.
Fig. S5. Transcript levels of individual genes along the cell cycle progression derived from
EU chase experiments. (A) UMI counts (color bars) of total transcripts (leftmost panel)
and labeled transcripts for each time point of the chase experiment shown for nine example
genes. (B) Mean labeled UMI counts along the cell cycle trajectory for all detected genes
(n = 11,848) in the chase experiments.
Fig. S6. Total transcripts in cells along the cell cycle progression. (A) Median total UMI
counts per cell from the pulse experiments (sum of all genes) as a function of the cell cycle
progression. Gray lines indicate the 25-75% quantiles. Values were calculated with a
sliding window of size = 0.07. (B) As in A but for cells of the chase experiments.
Fig. S7. Evaluation of the fitting procedure for κ and γ. (A) Example of simulated pulse
experiments. (B) Example of simulated chase experiments. (C) Schematics of the non-
steady state interpretation of the chase experiment. (D) Schematics of the steady state
interpretation of the chase experiment. (E) Distribution of all estimated synthesis rates and
threshold for global outlier discarding. (F) As in E but for the degradation rates. (G) Heat
maps of the relative median errors for the degradation rate for different combinations of
synthesis rate and degradation rates, shown as half-lives (ln(2)/γ), using the non-steady
state model, for the pulse experiment (left), the chase experiment (middle), and their
combination (right), respectively. Induction time is 6h. (H) As in G but using the quasi-
steady state model on the chase experiment. (I) As in G but showing the relative median
errors for the synthesis rate. (J) Relative median error for the estimation of the degradation
rate constant as a function of the true transcript half-life. Errors are shown for the non-
steady state model (left) fitting the pulse and chase experiments together, and the quasi-
steady state model (right). Induction window times are indicated in color. (K) Estimated
degradation rate constants as a function of the true degradation rate constant for the non-
steady state model (left) fitting the pulse and chase experiments together, and the quasi-
steady state model (right). Induction window times are indicated in color as in J.
Fig. S8. Example of the fitting procedure for the non-steady state model with pulse and
chase experiments combined for the cell cycle dataset. (A) Workflow of the procedure to
fit the synthesis and degradation rate constants along the cell cycle progression. (B)
Example of sampled cells from different phases along the cell cycle progression for PLK1.
(C) Fitting of average UMI levels derived from cells sampled from different cell cycle
points for PLK1 (as shown in B). Gray lines represent individual bootstraps, sampling 30
cells with replacement per time point. (D) Representative histograms of fitted rate values
for the synthesis rate (left) and degradation rate constant (right) derived from 50 bootstraps
for 528 selected genes. Gray dashed lines indicate the thresholds for discarding global
outliers. (E) Scatter plots of the fitted synthesis (left) and degradation rates (right) along
the cell cycle progression for PLK1. Local outliers are shown in green and were discarded.
The black line shows the median computed rate at each point along the cell cycle
progression.
Fig. S9. Selection of genes for the cell cycle progression analysis. (A) Heat maps showing
the expression levels along the cell cycle progression of all genes that have more than 500
labeled UMIs detected in all time points of the pulse and chase experiments (n = 6,086).
Clusters of genes are indicated on top and black dots indicate the clusters selected for
further analysis. (B) Scatter plot showing the expression peak length and the peak dynamic
range for selected genes (n = 591). Dashed lines indicate the thresholds used for the final
gene selection (black dots, n = 528). (C) The estimated time that cells spend at each point
along the cell cycle progression for the scEU-seq and the CEL-seq2 datasets.
Fig. S10. Predicted degradation rates for the pulse and chase experiment. (A) Clustered
heat map showing the relative predicted degradation rates along the cell cycle progression
obtained by fitting the pulse experiment (left panel), the chase experiment using the non-
steady state model (middle), and the chase experiment using the quasi-steady state model
(right), respectively. Gray bars indicate gene clusters with similar rates and individual
genes are highlighted. (B) Scatter plot of the correlations between the standard deviations
of the estimated synthesis and degradation rates derived from different models as indicated
on the right (n = 528 genes). (C) The correlation between the average computed
degradation rates and the mean rates reported in Schofield et al. (38). (D) Histograms of
the sum of square errors (left), the correction factors for the synthesis rate (middle), and
the Pearson correlation between the predicted and observed expression levels (right), for
the non-steady state (pink) and the quasi-steady state (gray) models, respectively. The P
value was calculated from a Wilcoxon test, n = 528 genes.
Fig. S11. Gene groups with different regulatory strategies along the cell cycle progression.
(A) Heat map showing the enrichment or depletion for the destabilizing (s >= 0.5), neutral
(-0.5 < s < 0.5), and cooperating (s <= -0.5) strategies for the indicated gene groups
(clusters). Black dots represent significant enrichment or depletion (P<0.05) according to
the Fisher test after correction for multiple testing. s as in Fig. 2B. (B) Network
visualization of functional GO term annotation enrichments calculated for the indicated
gene groups. Group A example genes: KIF5B, KIF20B, KIF18A. Group B example genes:
CDK1, UBE2C, KIF11, KIF22, TOP2A, CENPE and PLK1. Group C example genes: LIF,
TGFB2, KITLG, POLA1 and MCM6. Group D example genes: PCNA, MCM2, MCM4,
POLD3, LIG1. Group E example genes: BRCA1, MRE11, RMI1. Group F example genes:
VRK1, MELK, PAK1. (C) Scatter plots of the degradation rates derived from gating and
bulk sequencing EU-seq experiments, against average rates derived from scEU-seq
experiments. Plots are shown for three example gene groups and color indicates the cell
cycle phase.
Fig. S12. Accuracy of calling the type of regulatory strategy for different regimes of the
synthesis rate and degradation rate constants along the cell cycle. (A) Workflow of the
procedure to simulate the pulse and chase experiment through the cell cycle. (B) Simulation
example of the chase experiment. The left panel shows the dynamics of the chase
experiment through a 24 h period, where the measurement time-point is at 13 h. The middle
panel shows the obtained transcript counts at the measurement time-point, and the right
panel shows the probability of measured transcript counts after accounting for sampling
error. (C) As B but for the pulse experiment. (D) Schematics of the sampling space of rate
dynamics; numbers 1 to 4 correspond to the positions of the examples given in E. (E) Examples of
different tested true rate regimes and estimated final rates. For a=1.5, the initial synthesis
rate is 31.6 m/h, and the initial degradation rate constant is 21.6 1/h. For a = 1.2, the initial
synthesis rate is 15.8 m/h, and the initial degradation rate constant is 5.8 1/h. (F) Measured
cosine similarities after estimation of rates from the simulated experiments, the expected
cosine similarity for each column is shown at the bottom. 𝜎 and a are indicated. (G)
Histograms of the cosine similarity as a function of 𝜎 and a.
Fig. S13. Batch variability of scEU-seq experiments in organoids and selection of genes.
(A) UMAPs of cells belonging to different EU treatments in the organoid dataset; n = 660
for chase 0 min, n = 821 for chase 45 min, n = 646 for chase 360 min, n = 1,373 for pulse
120 min and n = 331 for the DMSO control. (B) Scatter plot of the coefficient of variation
(CV) as a function of the mean expression level of all detected genes (n = 9,157). Red dots
indicate highly variable preselected genes (n = 1,033). (C) Scatter plot of fold changes in
expression against mean expression levels of the preselected 1,033 genes. Genes that show
differential expression are highlighted in red (n = 295 genes); gray dots mark genes that
were discarded upon manual inspection. Three marker genes (Apoa1, Lyz1 and Lgr5) are
highlighted, in yellow, green and blue respectively.
Fig. S14. Controls for the analysis of the differentiation trajectories of intestinal organoids.
(A) Monocle 2 analysis and derivation of branches 1, 2, and 3. (B) UMAP showing branch
3 of the Monocle 2 analysis. (C) The frequency of cells along the Monocle 2 trajectory.
Values and cells below the threshold at 3.5 (left dashed line) were used for further analysis
and to estimate the differentiation time of branch 1. Similarly, values and cells to the right
of the threshold at 6.5 (right dashed line) were used for further analysis and to estimate the
differentiation time of branch 2. (D) Histograms showing estimated synthesis rate and
degradation rate constants. Gray dashed lines indicate the threshold used to discard global
outliers. (E) Calculated degradation rates (left panels), synthesis rates (middle), and
expression levels (right) along the estimated pseudo time (h) of differentiation for five
example genes of the secretory lineage (branch 1). Red lines represent the median rate or
level used for further analysis. (F) As in E but for five example genes of the enterocyte
lineage (branch 2).
Fig. S15. Regulatory strategies of genes during intestinal organoid differentiation. (A) Heat
maps showing the observed (left panels) and predicted (right panels) normalized expression
levels for the differentiation branches 1 and 2, respectively. Genes are clustered according
to their different regulatory strategies and expression levels. Rightmost panels indicate the
r² for the predicted vs the observed expression in branches 1 and 2. (B) Network
representation of genes of groups A, B and D (see Fig. 3) highlighting functional GO term
annotation enrichments. Blue edges link a gene to a strategy group, and gray edges link
genes that share enriched GO term annotations. (C) UMAPs showing the sum of labeled
UMI counts for genes in strategy group A (top panels) or group B (bottom panels) for the
different experimental time points as indicated.
Additional Data Table S1
Data related to the regulatory strategy analysis during cell-cycle progression. Related to
Fig. 2.
Additional Data Table S2
Differential gene expression analysis of organoid individual clusters for identification of
cell types in intestinal organoids. Related to Fig. 3.
Additional Data Table S3
Differential gene expression analysis for identification of genes involved in organoid
differentiation. Related to Fig. 3.
Additional Data Table S4
Data related to the regulatory strategy analysis during organoid differentiation. Related
to Fig. 3.
References and Notes
1. B. Schwalb, M. Michel, B. Zacher, K. Frühauf, C. Demel, A. Tresch, J. Gagneur, P. Cramer,
TT-seq maps the human transient transcriptome. Science 352, 1225–1228 (2016).
doi:10.1126/science.aad9841 Medline
2. M. Rabani, R. Raychowdhury, M. Jovanovic, M. Rooney, D. J. Stumpo, A. Pauli, N. Hacohen,
A. F. Schier, P. J. Blackshear, N. Friedman, I. Amit, A. Regev, High-resolution
sequencing and modeling identifies distinct dynamic RNA regulatory strategies. Cell 159,
1698–1710 (2014). doi:10.1016/j.cell.2014.11.015 Medline
3. O. Shalem, O. Dahan, M. Levo, M. R. Martinez, I. Furman, E. Segal, Y. Pilpel, Transient
transcriptional responses to stress are generated by opposing effects of mRNA production
and degradation. Mol. Syst. Biol. 4, 223 (2008). doi:10.1038/msb.2008.59 Medline
4. S. C. Little, M. Tikhonov, T. Gregor, Precise developmental gene expression arises from
globally stochastic transcriptional activity. Cell 154, 789–800 (2013).
doi:10.1016/j.cell.2013.07.025 Medline
5. H. Tani, R. Mizutani, K. A. Salam, K. Tano, K. Ijiri, A. Wakamatsu, T. Isogai, Y. Suzuki, N.
Akimitsu, Genome-wide determination of RNA stability reveals hundreds of short-lived
noncoding transcripts in mammals. Genome Res. 22, 947–956 (2012).
doi:10.1101/gr.130559.111 Medline
6. M. Rabani, J. Z. Levin, L. Fan, X. Adiconis, R. Raychowdhury, M. Garber, A. Gnirke, C.
Nusbaum, N. Hacohen, N. Friedman, I. Amit, A. Regev, Metabolic labeling of RNA
uncovers principles of RNA production and degradation dynamics in mammalian cells.
Nat. Biotechnol. 29, 436–442 (2011). doi:10.1038/nbt.1861 Medline
7. A. Raghavan, R. L. Ogilvie, C. Reilly, M. L. Abelson, S. Raghavan, J. Vasdewani, M.
Krathwohl, P. R. Bohjanen, Genome-wide analysis of mRNA decay in resting and
activated primary human T lymphocytes. Nucleic Acids Res. 30, 5529–5538 (2002).
doi:10.1093/nar/gkf682 Medline
8. T. Hashimshony, F. Wagner, N. Sher, I. Yanai, CEL-Seq: Single-cell RNA-Seq by multiplexed
linear amplification. Cell Rep. 2, 666–673 (2012). doi:10.1016/j.celrep.2012.08.003
Medline
9. D. A. Jaitin, E. Kenigsberg, H. Keren-Shaul, N. Elefant, F. Paul, I. Zaretsky, A. Mildner, N.
Cohen, S. Jung, A. Tanay, I. Amit, Massively parallel single-cell RNA-seq for marker-
free decomposition of tissues into cell types. Science 343, 776–779 (2014).
doi:10.1126/science.1247651 Medline
10. A. B. Rosenberg, C. M. Roco, R. A. Muscat, A. Kuchina, P. Sample, Z. Yao, L. T. Graybuck,
D. J. Peeler, S. Mukherjee, W. Chen, S. H. Pun, D. L. Sellers, B. Tasic, G. Seelig, Single-
cell profiling of the developing mouse brain and spinal cord with split-pool barcoding.
Science 360, 176–182 (2018). doi:10.1126/science.aam8999 Medline
11. D. Grün, A. Lyubimova, L. Kester, K. Wiebrands, O. Basak, N. Sasaki, H. Clevers, A. van
Oudenaarden, Single-cell messenger RNA sequencing reveals rare intestinal cell types.
Nature 525, 251–255 (2015). doi:10.1038/nature14966 Medline
12. E. Z. Macosko, A. Basu, R. Satija, J. Nemesh, K. Shekhar, M. Goldman, I. Tirosh, A. R.
Bialas, N. Kamitaki, E. M. Martersteck, J. J. Trombetta, D. A. Weitz, J. R. Sanes, A. K.
Shalek, A. Regev, S. A. McCarroll, Highly Parallel Genome-wide Expression Profiling of
Individual Cells Using Nanoliter Droplets. Cell 161, 1202–1214 (2015).
doi:10.1016/j.cell.2015.05.002 Medline
13. A. M. Klein, L. Mazutis, I. Akartuna, N. Tallapragada, A. Veres, V. Li, L. Peshkin, D. A.
Weitz, M. W. Kirschner, Droplet barcoding for single-cell transcriptomics applied to
embryonic stem cells. Cell 161, 1187–1201 (2015). doi:10.1016/j.cell.2015.04.044
Medline
14. B. Pijuan-Sala, J. A. Griffiths, C. Guibentif, T. W. Hiscock, W. Jawaid, F. J. Calero-Nieto, C.
Mulas, X. Ibarra-Soria, R. C. V. Tyser, D. L. L. Ho, W. Reik, S. Srinivas, B. D. Simons,
J. Nichols, J. C. Marioni, B. Göttgens, A single-cell molecular map of mouse gastrulation
and early organogenesis. Nature 566, 490–495 (2019). doi:10.1038/s41586-019-0933-9
Medline
15. See supplementary materials.
16. T. Zerjatke, I. A. Gak, D. Kirova, M. Fuhrmann, K. Daniel, M. Gonciarz, D. Müller, I.
Glauche, J. Mansfeld, Quantitative cell cycle analysis based on an endogenous all-in-one
reporter for cell tracking and classification. Cell Rep. 19, 1953–1966 (2017).
doi:10.1016/j.celrep.2017.05.022 Medline
17. D. W. Huang, B. T. Sherman, R. A. Lempicki, Systematic and integrative
analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57
(2009). doi:10.1038/nprot.2008.211 Medline
18. L. Krenning, F. M. Feringa, I. A. Shaltiel, J. van den Berg, R. H. Medema, Transient
activation of p53 in G2 phase is sufficient to induce senescence. Mol. Cell 55, 59–72
(2014). doi:10.1016/j.molcel.2014.05.007 Medline
19. T. Hashimshony, N. Senderovich, G. Avital, A. Klochendler, Y. de Leeuw, L. Anavy, D.
Gennert, S. Li, K. J. Livak, O. Rozenblatt-Rosen, Y. Dor, A. Regev, I. Yanai, CEL-Seq2:
Sensitive highly-multiplexed single-cell RNA-Seq. Genome Biol. 17, 77 (2016).
doi:10.1186/s13059-016-0938-8 Medline
20. T. Sato, R. G. Vries, H. J. Snippert, M. van de Wetering, N. Barker, D. E. Stange, J. H. van
Es, A. Abo, P. Kujala, P. J. Peters, H. Clevers, Single Lgr5 stem cells build crypt-villus
structures in vitro without a mesenchymal niche. Nature 459, 262–265 (2009).
doi:10.1038/nature07935 Medline
21. H. Tian, B. Biehs, S. Warming, K. G. Leong, L. Rangell, O. D. Klein, F. J. de Sauvage, A
reserve stem cell population in small intestine renders Lgr5-positive cells dispensable.
Nature 478, 255–259 (2011). doi:10.1038/nature10408 Medline
22. X. Qiu, Q. Mao, Y. Tang, L. Wang, R. Chawla, H. A. Pliner, C. Trapnell, Reversed graph
embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017).
doi:10.1038/nmeth.4402 Medline
23. F. Xie, X. Ding, Q. Y. Zhang, An update on the role of intestinal cytochrome P450 enzymes
in drug disposition. Acta Pharm. Sin. B 6, 374–383 (2016).
doi:10.1016/j.apsb.2016.07.012 Medline
24. S. Geula, S. Moshitch-Moshkovitz, D. Dominissini, A. A. F. Mansour, N. Kol, M. Salmon-
Divon, V. Hershkovitz, E. Peer, N. Mor, Y. S. Manor, M. S. Ben-Haim, E. Eyal, S.
Yunger, Y. Pinto, D. A. Jaitin, S. Viukov, Y. Rais, V. Krupalnik, E. Chomsky, M. Zerbib,
I. Maza, Y. Rechavi, R. Massarwa, S. Hanna, I. Amit, E. Y. Levanon, N. Amariglio, N.
Stern-Ginossar, N. Novershtern, G. Rechavi, J. H. Hanna, m6A mRNA methylation
facilitates resolution of naïve pluripotency toward differentiation. Science 347, 1002–
1006 (2015). doi:10.1126/science.1261417 Medline
25. P. J. Batista, B. Molinie, J. Wang, K. Qu, J. Zhang, L. Li, D. M. Bouley, E. Lujan, B. Haddad,
K. Daneshvar, A. C. Carter, R. A. Flynn, C. Zhou, K.-S. Lim, P. Dedon, M. Wernig, A. C.
Mullen, Y. Xing, C. C. Giallourakis, H. Y. Chang, m(6)A RNA modification controls cell
fate transition in mammalian embryonic stem cells. Cell Stem Cell 15, 707–719 (2014).
doi:10.1016/j.stem.2014.09.019 Medline
26. M. J. Muraro, G. Dharmadhikari, D. Grün, N. Groen, T. Dielen, E. Jansen, L. van Gurp, M.
A. Engelse, F. Carlotti, E. J. P. de Koning, A. van Oudenaarden, A single-cell
transcriptome atlas of the human pancreas. Cell Syst. 3, 385–394.e3 (2016).
doi:10.1016/j.cels.2016.09.002 Medline
27. D. Grün, L. Kester, A. van Oudenaarden, Validation of noise models for single-cell
transcriptomics. Nat. Methods 11, 637–640 (2014). doi:10.1038/nmeth.2930 Medline
28. M. I. Love, W. Huber, S. Anders, Moderated estimation of fold change and dispersion for
RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014). doi:10.1186/s13059-014-
0550-8 Medline
29. L. McInnes, J. Healy, J. Melville, UMAP: Uniform Manifold Approximation and Projection
for Dimension Reduction. arXiv:1802.03426 [stat.ML] (9 February 2018).
30. E. Becht, L. McInnes, J. Healy, C. A. Dutertre, I. W. H. Kwok, L. G. Ng, F. Ginhoux, E. W.
Newell, Dimensionality reduction for visualizing single-cell data using UMAP. Nat.
Biotechnol. (2018). Medline
31. B. T. Sherman, W. Huang, Q. Tan, Y. Guo, S. Bour, D. Liu, R. Stephens, M. W. Baseler, H.
C. Lane, R. A. Lempicki, DAVID Knowledgebase: A gene-centered database integrating
heterogeneous gene annotation resources to facilitate high-throughput gene functional
analysis. BMC Bioinformatics 8, 426 (2007). doi:10.1186/1471-2105-8-426 Medline
32. D. Berchtold, N. Battich, L. Pelkmans, A systems-level study reveals regulators of
membrane-less organelles in human cells. Mol. Cell 72, 1035–1049.e5 (2018).
doi:10.1016/j.molcel.2018.10.036 Medline
33. S. C. van den Brink, F. Sage, Á. Vértesy, B. Spanjaard, J. Peterson-Maduro, C. S. Baron, C.
Robin, A. van Oudenaarden, Single-cell sequencing reveals dissociation-induced gene
expression in tissue subpopulations. Nat. Methods 14, 935–936 (2017).
doi:10.1038/nmeth.4437 Medline
34. S. C. Bendall, K. L. Davis, A. D. Amir, M. D. Tadmor, E. F. Simonds, T. J. Chen, D. K.
Shenfeld, G. P. Nolan, D. Pe’er, Single-cell trajectory detection uncovers progression and
regulatory coordination in human B cell development. Cell 157, 714–725 (2014).
doi:10.1016/j.cell.2014.04.005 Medline
35. V. Shahrezaei, P. S. Swain, Analytical distributions for stochastic gene expression. Proc.
Natl. Acad. Sci. U.S.A. 105, 17256–17261 (2008). doi:10.1073/pnas.0803850105 Medline
36. W. Sun, Q. Gao, B. Schaefke, Y. Hu, W. Chen, Pervasive allele-specific regulation on RNA
decay in hybrid mice. Life Sci. Alliance 1, e201800052 (2018).
doi:10.26508/lsa.201800052 Medline
37. H. Gehart, J. H. van Es, K. Hamer, J. Beumer, K. Kretzschmar, J. F. Dekkers, A. Rios, H.
Clevers, Identification of enteroendocrine regulators by real-time single-cell
differentiation mapping. Cell 176, 1158–1173.e16 (2019). doi:10.1016/j.cell.2018.12.029
Medline
38. J. A. Schofield, E. E. Duffy, L. Kiefer, M. C. Sullivan, M. D. Simon, TimeLapse-seq: Adding
a temporal dimension to RNA sequencing through nucleoside recoding. Nat. Methods 15,
221–225 (2018). doi:10.1038/nmeth.4582 Medline