uncovering the cannabis sativa methylome through enzymatic

1
Brittany S. Sexton 1 , Louise Williams 1 , V.K. Chaithanya Ponnaluri 1 , Keerthana Krishnan 1 , Luo Sun 1 , Yvonne Helbert 2 , Lei Zhang 2 , Christine J. Sumner 1 , Fiona J. Stewart 1 , Bradley W. Langhorst 1 , Timothy Harkins 2 , Kevin J. McKernan 2 , Eileen T. Dimalanta 1 , Theodore B. Davis 1 1 New England Biolabs, Inc., Ipswich, MA 01938 2 Medicinal Genomics Corp., Woburn, MA 01801 Uncovering the Cannabis sativa methylome through Enzymatic Methyl-seq INTRODUCTION CONCLUSIONS METHODS RESULTS REFERENCES Cannabis sativa is an industrial crop producing more economic value than the top five crops in California combined. In the medical field, cannabinoids are used to treat and alleviate symptoms for a wide range of diseases, including nausea in cancer patients undergoing chemotherapy. Sex determination in cannabis is an important economic factor. Pollination dramatically reduces cannabinoid expression in female plants. To express high levels of cannabinoids, female plants are grown in isolation of male pollen. However, female plants are capable of stress induced hermaphroditism that can lead to acres of pollination and low cannabinoid yields. 5-methylcytosine (5mC) is an important epigenetic mechanism that regulates many cellular processes, including sex determination. To better understand the hermaphroditic process in cannabi s, more robust 5mC surveying tools are required. The current gold standard for determining 5mC, whole genome bisulfite sequencing (WGBS), introduces DNA damage and GC- biased sequences resulting in skewed methylation profiles. To further complicate matters, the cannabis genome is 66% AT-rich and 83% AT-rich after bisulfite conversion. Very few technologies currently address the complexity of plant methylation signals, and to date no methylome exists for cannabis. NEBNext Enzymatic Methyl-seq (EM-seq™) is a novel enzymatic method used to detect 5mC. EM-seq successfully identified the 5mCs within the cannabis genome. Libraries were generated using genomic DNA extracted from female leaf for EM-seq and WGBS. To give insight into the methylation-based regulation of the hermaphroditic process, EM-seq libraries were also prepared using genomic DNA from female flowers, female seeded flowers, and male flowers. We conclude that EM-seq is superior to WGBS and delivers higher library yields, more accurate methylation information, reduced DNA damage, increased sequencing length, and decreased GC-bias. EM- seq enables the study of the previously unknown cannabis methylome and opens up the possibility to discover critical and exciting regulatory functions for the hermaphroditic process. PCR Amplification APOBEC Ultra II End Prep/ Ligation Sheared Genomic DNA TET2/Oxidation Enhancer Illumina Sequencing Jamaican Lion genomic DNA was extracted from female clones (leaf, seeded and unseeded flowers) and male sibling (flowers) plants. 50 ng of genomic DNA, spiked with control (unmethylated lambda DNA and CpG methylated pUC19) were sheared using the Covaris ® S2 instrument Sheared DNA was end-repaired then ligated to modified sequencing adaptors 5mC and 5hmC were protected from APOBEC deamination by TET2/Oxidation Enhancer Cytosines were deaminated to uracils with APOBEC Libraries were amplified using NEBNext ® Q5U TM Master Mix and NEBNext Multiplex Oligos for Illumina (96 Unique Dual Index Primer Pairs, E6440S) Libraries were sequenced using an Illumina Nextseq 500, 2x76 base paired reads. Bisulfite conversion was performed using Zymo Research EZ DNA Methylation-Gold TM kit SAMPLE PREPARATION DATA ANALYSIS Methyl Kit MethylDackel Methyl Extraction Align (BWAmeth) Samblaster Deduplicate Picard Reads were aligned to Jamaican Lion reference genome (August 2018 assembly) and analyzed using the tools in above flowchart. Four contigs from this assembly showed anomalous methylation patterns and were excluded. 8 PCR cycles 6 PCR cycles HIGHER QUALITY SEQUENCING METRICS WITH EM-SEQ COMPARED TO WGBS EM-seq libraries outperform WGBS for Cannabis sativa input DNA as shown in figures A-E: (A) Less PCR cycles are required (6 vs 8 cycles respectively) for EM-seq. (B) EM-seq libraries have larger library insert sizes than WGBS. Additionally, the EM-seq protocol has been optimized for both standard and large insert libraries. (C) EM-seq libraries have lower duplication percentages. (D) GC distribution is more even for EM-seq than WGBS. (E) More even base coverage for EM- seq than WGBS. 50 million, 2 x 76 base reads were used for analysis (Figures A-D) and 130 million 2 x 76 base reads were used for Figure E. C A E Coverage Depth % of Bases at this Depth Insert Size (bp) B PCR Yield (nM) % Duplication 20% 16% 12% 8% 4% 0% 50 100 150 200 250 300 350 400 450 500 550 600 650 700 % of Reads EM-seq (standard) EM-seq (large insert) WGBS 50 100 150 200 250 300 350 400 450 500 550 600 650 700 Normalized Coverage Percent GC 0 10 20 30 40 50 60 70 80 90 100 5 10 15 20 25 30 EM-seq-1 EM-seq-2 WGBS-1 WGBS-2 EM-seq-1 EM-seq-2 WGBS-1 WGBS-2 EM-seq WGBS 10 0 4 2 6 8 1 0 1.0 0.2 0.4 0.6 0.8 1.2 1.4 1.6 1.8 2.0 2.2 25 0 5 10 15 20 0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 WGBS CANNABIS EM-SEQ LIBRARIES ARE SUPERIOR TO WGBS DIFFERENTIAL CpG METHYLATION IDENTIFIED BETWEEN CANNABIS FLOWER TISSUES A A EM-seq enables the identification of differential methylation across Cannabis sativa flower tissues: female flower and female seeded flower (clones) and the male flower (sibling). (A) % 5mC levels in the CpG, CHG, and CHH contexts for all flower EM-seq libraries. The female methylation levels for flower and seeded flower were higher than the male flower indicating methylation patterns are potentially determined by sex. Unmethylated Lambda control was <0.5% methylated and pUC19 was >97.5% CpG methylated (data not shown). (B) Volcano plot of the significant (q<0.01) differential methylation calls for the CpG context between (1) female flower and female seeded flower (2) female flower and male flower. The differential methylation calls for female flower and female seeded flower identified 30 hypomethylated CpGs but no hypermethylated CpGs. The differential methylation calls for female and male flower identified >50,000 hypomethylated CpGs and >11,000 hypermethylated. (C) Comparison of the hypomethylated CpGs with differential methylation between the female flower samples & female and male flowers. 5 0 5 CpG CHG CHH CpG CHG CHH CpG CHG CHH B CpG CHG CHH CpG CHG CHH CpG CHG CHH CpG CHG CHH Total % 5mC EM-seq Female Leaf WGBS Female Leaf Female Flower-1 Female Flower-2 Female Seeded Flower-1 Female Seeded Flower-2 Male Flower-1 Male Flower-2 Methylated Unmethylated Genomic Location (504 bp) Methylated Unmethylated Differential methylation analysis comparing female flower and seeded flower (clones) and the male flower (sibling) were performed. Significant differential methylation calls (q<0.01) for CpG context (1x coverage minimum) between female flower with female seeded flower and/or male flower were identified. (A) Region 1 BLASTs to the edestin gene family of the Cannabis sativa genome. These genes are linked to the positive regulation of seed production. The female flower is CpG methylated in this region, suggesting the expression of the genes are turned off, while the female seeded flower is unmethylated at this CpG. (B) Region 2 BLASTs to the THC acid synthase (THCA) gene of the Cannabis sativa genome and this gene is linked to the positive regulation of THCA production. Interestingly, the female flower is CpG unmethylated in this region, suggesting the expression of the gene is turned on, while the male flower is methylated at this CpG, suggesting the expression of the gene is turned off. Male flowers produce log scale less THCA. (A) Comparison of total % 5mC determined using LC-MS, EM-seq and WGBS for female leaf. The % 5mC for LC-MS is determined as a percentage of total cytosines ((5mC/(total C))x100). 5mC percentages for EM- seq and WGBS were determined by combining 5mC in the 3 methylated cytosine contexts (CpG. CHH, CHG). EM-seq cytosine methylation numbers are closer to LCMS methylation values than WGBS. (B) % 5mC levels in the CpG, CHG, and CHH contexts for EM-seq and WGBS for female leaf. Unmethylated Lambda control was <0.5% methylated for EM-seq and <2% for WGBS and was >97.5% CpG methylated for both EM-seq and WGBS (data not shown). (C) CpG context correlations of EM-seq and WGBS libraries (1x minimum coverage). Both EM-seq and WGBS libraries were highly correlated between replicates and methods (CHG and CHH context data not shown). 50 million, 2 x76 base reads were used for methylation analysis. q-value Methylation Difference Hypo Hyper Female & Female Seeded Flower Female & Male Flower A B C B C METHYLATION PROFILES IDENTIFIED GENES INVOLVED IN SEED AND CANNABINOID PRODUCTION Genomic Location (992 bp) Female Flower-1 Female Flower-2 Female Seeded Flower-1 Female Seeded Flower-2 Male Flower-1 Male Flower-2 100 50 0 50 100 0 5 0 5 0 100 50 0 50 100 0.0100 0.0075 0.0050 0.0025 0.0000 -100 0 100 Methylation Difference -100 0 100 0.0100 0.0075 0.0050 0.0025 0.0000 100 75 50 25 0 % 5mC per C context 40 30 20 10 0 EM-seq LC-MS 100 75 50 25 0 % 5mC per C context Female Flower Female Seeded Flower Male Flower q-value Female & Female Seeded Flower Female & Male Flower RESULTS EM-seq enables the first in depth analysis of the Cannabis sativa methylome and differential methylation identified genes involved in seed production and THCA production. EM-seq libraries compared to WGBS libraries had: Higher Library Yields with less PCR cycles Lower Percent Duplication More Even Base Coverage Larger Library insert Sizes Less GC Bias Similar Percentage Methylation as LC-MS 1. Faust, G. G., and I. M. Hall. “SAMBLASTER: Fast Duplicate Marking and Structural Variant Read Extraction.” 2. Pedersen et al. (2014) “Fast and Accurate Alignment of Long Bisulfite-Seq Reads.” 3. Ryan D, Pederson, B “MethylDackel” 4. Broad Institute. (2016). Picard tools. D EM-seq WGBS EM-seq WGBS UUGTCGGAUUGC TTGTCGGATTGC Converted: Sequenced: CCGTCGGACCGC m hm CCGTCGGACCGC ca/g UUGTCGGAUUGC TET2/Oxidation Enhancer APOBEC TTGTCGGATTGC ca/g CCGTCGGACCGC m hm

Upload: others

Post on 04-Oct-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Uncovering the Cannabis sativa methylome through Enzymatic

Brittany S. Sexton1, Louise Williams1, V.K. Chaithanya Ponnaluri1, Keerthana Krishnan1, Luo Sun1, Yvonne Helbert2, Lei Zhang2, Christine J. Sumner1, Fiona J. Stewart1, Bradley W. Langhorst1,

Timothy Harkins2, Kevin J. McKernan2, Eileen T. Dimalanta1, Theodore B. Davis11New England Biolabs, Inc., Ipswich, MA 01938 2Medicinal Genomics Corp., Woburn, MA 01801

Uncovering the Cannabis sativa methylome through Enzymatic Methyl-seq

INTRODUCTION

CONCLUSIONS

METHODS

RESULTS

REFERENCES

Cannabis sativa is an industrial crop producing more economic value than the top five crops inCalifornia combined. In the medical field, cannabinoids are used to treat and alleviate symptomsfor a wide range of diseases, including nausea in cancer patients undergoing chemotherapy. Sexdetermination in cannabis is an important economic factor. Pollination dramatically reducescannabinoid expression in female plants. To express high levels of cannabinoids, female plantsare grown in isolation of male pollen. However, female plants are capable of stress inducedhermaphroditism that can lead to acres of pollination and low cannabinoid yields.

5-methylcytosine (5mC) is an important epigenetic mechanism that regulates many cellularprocesses, including sex determination. To better understand the hermaphroditic process incannabis, more robust 5mC surveying tools are required. The current gold standard fordetermining 5mC, whole genome bisulfite sequencing (WGBS), introduces DNA damage and GC-biased sequences resulting in skewed methylation profiles. To further complicate matters, thecannabis genome is 66% AT-rich and 83% AT-rich after bisulfite conversion. Very few technologiescurrently address the complexity of plant methylation signals, and to date no methylome exists forcannabis.

NEBNext Enzymatic Methyl-seq (EM-seq™) is a novel enzymatic method used to detect 5mC.EM-seq successfully identified the 5mCs within the cannabis genome. Libraries were generatedusing genomic DNA extracted from female leaf for EM-seq and WGBS. To give insight into themethylation-based regulation of the hermaphroditic process, EM-seq libraries were also preparedusing genomic DNA from female flowers, female seeded flowers, and male flowers. We concludethat EM-seq is superior to WGBS and delivers higher library yields, more accurate methylationinformation, reduced DNA damage, increased sequencing length, and decreased GC-bias. EM-seq enables the study of the previously unknown cannabis methylome and opens up thepossibility to discover critical and exciting regulatory functions for the hermaphroditic process.

PCRAmplificationAPOBEC

Ultra II End Prep/

Ligation

ShearedGenomic

DNATET2/Oxidation

EnhancerIllumina

Sequencing

• Jamaican Lion genomic DNA was extracted from femaleclones (leaf, seeded and unseeded flowers) and male sibling(flowers) plants.

• 50 ng of genomic DNA, spiked with control (unmethylatedlambda DNA and CpG methylated pUC19) were shearedusing the Covaris® S2 instrument

• Sheared DNA was end-repaired then ligated to modifiedsequencing adaptors

• 5mC and 5hmC were protected from APOBEC deamination by TET2/Oxidation Enhancer

• Cytosines were deaminated to uracils with APOBEC• Libraries were amplified using NEBNext® Q5UTM Master Mix

and NEBNext Multiplex Oligos for Illumina (96 Unique DualIndex Primer Pairs, E6440S)

• Libraries were sequenced using an Illumina Nextseq 500,2x76 base paired reads.

• Bisulfite conversion was performed using Zymo Research EZDNA Methylation-GoldTM kit

SAMPLE PREPARATION

DATA ANALYSIS

Methyl KitMethylDackel

Methyl Extraction

Align(BWAmeth)

SamblasterDeduplicate Picard

• Reads were aligned to Jamaican Lion reference genome(August 2018 assembly) and analyzed using the tools inabove flowchart.

• Four contigs from this assembly showed anomalousmethylation patterns and were excluded.

8 PCR cycles6 PCR cycles

0

0.5

1

1.5

2

2.5

3

3.5

Female Leaf EM-seq-1 Female Leaf EM-seq-2 Female Leaf WGBS-1 Female Leaf WGBS-2

Perc

ent D

uplic

atio

n

0

5

10

15

20

25

Female Leaf EM-seq-1 Female Leaf EM-seq-2 Female Leaf WGBS-1 Female Leaf WGBS-2

Yiel

d (n

M)

HIGHER QUALITY SEQUENCING METRICS WITH EM-SEQ COMPARED TO WGBS

EM-seq libraries outperform WGBS forCannabis sativa input DNA as shown infigures A-E: (A) Less PCR cycles are required(6 vs 8 cycles respectively) for EM-seq. (B)EM-seq libraries have larger library insertsizes than WGBS. Additionally, the EM-seqprotocol has been optimized for both standardand large insert libraries. (C) EM-seq librarieshave lower duplication percentages. (D) GCdistribution is more even for EM-seq thanWGBS. (E) More even base coverage for EM-seq than WGBS. 50 million, 2 x 76 basereads were used for analysis (Figures A-D)and 130 million 2 x 76 base reads were usedfor Figure E.

C

A

E

Coverage Depth

% o

f Bas

es a

t thi

s De

pth

Insert Size (bp)

B

PCR

Yiel

d (n

M)

% D

uplic

atio

n

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Coverage Depth

Proje

ct

Medic

inalG

enom

ics_Leaf_

WG

BS

_E

Mseq_C

om

bin

edB

am

s

0.0%

1.0%

2.0%

3.0%

4.0%

5.0%

6.0%

7.0%

8.0%

9.0%

10.0%

Avg. F

rac B

ases

20%

16%

12%

8%

4%

0%50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800

Insert Size

JL

0.00%

0.02%

0.04%

0.06%

0.08%

0.10%

0.12%

0.14%

0.16%

0.18%

0.20%

0.22%

Avg. %

of in

sert

s

% o

f Rea

ds

EM-seq (standard)EM-seq (large insert)WGBS

50 100 150 200 250 300 350 400 450 500 550 600 650 700

Norm

aliz

ed C

over

age

Percent GC

0 10 20 30 40 50 60 70 80 90 100

5 10 15 20 25 30

EM-seq-1 EM-seq-2 WGBS-1 WGBS-2

EM-seq-1 EM-seq-2 WGBS-1 WGBS-2

EM-seq WGBS10

0

4

2

6

8

1

0

1.0

0.2

0.4

0.6

0.8

1.2

1.4

1.6

1.8

2.0

2.2

25

0

5

10

15

20

0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

WGBS

CANNABIS EM-SEQ LIBRARIES ARE SUPERIOR TO WGBS

DIFFERENTIAL CpG METHYLATION IDENTIFIED BETWEEN CANNABIS FLOWER TISSUES

A

A

EM-seq enables the identification of differential methylation across Cannabis sativa flower tissues: femaleflower and female seeded flower (clones) and the male flower (sibling). (A) % 5mC levels in the CpG, CHG,and CHH contexts for all flower EM-seq libraries. The female methylation levels for flower and seededflower were higher than the male flower indicating methylation patterns are potentially determined by sex.Unmethylated Lambda control was <0.5% methylated and pUC19 was >97.5% CpG methylated (data notshown). (B) Volcano plot of the significant (q<0.01) differential methylation calls for the CpG contextbetween (1) female flower and female seeded flower (2) female flower and male flower. The differentialmethylation calls for female flower and female seeded flower identified 30 hypomethylated CpGs but nohypermethylated CpGs. The differential methylation calls for female and male flower identified >50,000hypomethylated CpGs and >11,000 hypermethylated. (C) Comparison of the hypomethylated CpGs withdifferential methylation between the female flower samples & female and male flowers.

0

25

50

75

100

CpG CHG CHH CpG CHG CHH CpG CHG CHH

0

10

20

30

40

LC-MS Female Leaf EM-seq Female Leaf WGBS Female Leaf

B

0

25

50

75

100

CpG CHG CHH CpG CHG CHH CpG CHG CHH CpG CHG CHH

Tota

l % 5

mC

EM-seq Female Leaf WGBS Female Leaf

Female Flower-1

Female Flower-2

Female Seeded Flower-1

Female Seeded Flower-2

Male Flower-1

Male Flower-2

MethylatedUnmethylated

Genomic Location (504 bp)

MethylatedUnmethylated

Differential methylation analysis comparing female flower and seeded flower (clones) and the male flower(sibling) were performed. Significant differential methylation calls (q<0.01) for CpG context (1x coverageminimum) between female flower with female seeded flower and/or male flower were identified. (A)Region 1 BLASTs to the edestin gene family of the Cannabis sativa genome. These genes are linked tothe positive regulation of seed production. The female flower is CpG methylated in this region, suggestingthe expression of the genes are turned off, while the female seeded flower is unmethylated at this CpG.(B) Region 2 BLASTs to the THC acid synthase (THCA) gene of the Cannabis sativa genome and thisgene is linked to the positive regulation of THCA production. Interestingly, the female flower is CpGunmethylated in this region, suggesting the expression of the gene is turned on, while the male flower ismethylated at this CpG, suggesting the expression of the gene is turned off. Male flowers produce logscale less THCA.

(A) Comparison of total % 5mC determined using LC-MS, EM-seq and WGBS for female leaf. The % 5mCfor LC-MS is determined as a percentage of total cytosines ((5mC/(total C))x100). 5mC percentages for EM-seq and WGBS were determined by combining 5mC in the 3 methylated cytosine contexts (CpG. CHH,CHG). EM-seq cytosine methylation numbers are closer to LCMS methylation values than WGBS. (B) %5mC levels in the CpG, CHG, and CHH contexts for EM-seq and WGBS for female leaf. UnmethylatedLambda control was <0.5% methylated for EM-seq and <2% for WGBS and was >97.5% CpG methylatedfor both EM-seq and WGBS (data not shown). (C) CpG context correlations of EM-seq and WGBS libraries(1x minimum coverage). Both EM-seq and WGBS libraries were highly correlated between replicates andmethods (CHG and CHH context data not shown). 50 million, 2 x76 base reads were used for methylationanalysis.

q-va

lue

Methylation Difference

Hypo Hyper

Female & Female Seeded Flower Female & Male Flower

A B C

B

C

METHYLATION PROFILES IDENTIFIED GENES INVOLVED IN SEED AND CANNABINOID PRODUCTION

Genomic Location (992 bp)

Female Flower-1

Female Flower-2

Female Seeded Flower-1

Female Seeded Flower-2

Male Flower-1

Male Flower-2

0.0000

0.0025

0.0050

0.0075

0.0100

−100 −50 0 50 100meth.diff

qvalue factor(factor)hypo

0.0000

0.0025

0.0050

0.0075

0.0100

−100 −50 0 50 100meth.diff

qvalue

factor(factor)hypo

hyper

0.0100

0.0075

0.0050

0.0025

0.0000

-100 0 100Methylation Difference

-100 0 100

0.0100

0.0075

0.0050

0.0025

0.0000

100

75

50

25

0

% 5

mC

per C

con

text

40

30

20

10

0EM-seqLC-MS

100

75

50

25

0

% 5

mC

per C

con

text

Female Flower

Female Seeded Flower

Male Flower

q-va

lue

Female & Female Seeded Flower Female & Male Flower

RESULTS

EM-seq enables the first in depth analysis of the Cannabis sativa methylome anddifferential methylation identified genes involved in seed production and THCA production.

EM-seq libraries compared to WGBS libraries had:• Higher Library Yields with less PCR cycles• Lower Percent Duplication• More Even Base Coverage

• Larger Library insert Sizes• Less GC Bias• Similar Percentage Methylation as LC-MS

1. Faust, G. G., and I. M. Hall. “SAMBLASTER: Fast Duplicate Marking and Structural Variant ReadExtraction.”2. Pedersen et al. (2014) “Fast and Accurate Alignment of Long Bisulfite-Seq Reads.”3. Ryan D, Pederson, B “MethylDackel”4. Broad Institute. (2016). Picard tools.

DEM-seq WGBS

EM-seq WGBS

UUGTCGGAUUGC

TTGTCGGATTGC

Converted:

Sequenced:

CCGTCGGACCGC

mhm

CCGTCGGACCGC

ca/g

UUGTCGGAUUGC

TET2/Oxidation Enhancer

APOBEC

TTGTCGGATTGC

ca/g

CCGTCGGACCGC

mhm