Dr. Metsada Pasmanik-Chor, Bioinformatics Unit, Tel Aviv University
subPSEC (substitution position-specific evolutionary conservation) estimates the likelihood that a substitution has a functional effect. Values range from 0 to about -10, with -10 most likely to be deleterious. -3 is the previously identified cutoff for functional significance.
substitution: D538G
subPSEC: -3.96843
Pdeleterious: 0.72481 (anything above 0.5 is considered deleterious)
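The two reported values are consistent with a logistic relation between subPSEC and Pdeleterious in which a subPSEC of -3 maps to a probability of 0.5. The sketch below assumes that relation, Pdeleterious = 1 - 1 / (1 + exp(-subPSEC - 3)); the function name is illustrative and is not part of any PANTHER interface.

```python
import math


def p_deleterious(subpsec: float) -> float:
    """Probability that a substitution is deleterious.

    Assumes the logistic relation
    Pdeleterious = 1 - 1 / (1 + exp(-subPSEC - 3)),
    so that subPSEC = -3 gives exactly 0.5.
    """
    return 1.0 - 1.0 / (1.0 + math.exp(-subpsec - 3.0))


# D538G in ESR1_HUMAN, using the subPSEC value reported above
print(round(p_deleterious(-3.96843), 5))  # close to the reported 0.72481
print(p_deleterious(-3.0))                # the cutoff maps to 0.5
```

Under this assumption, any subPSEC below -3 gives a probability above 0.5, which matches the two cutoffs quoted above.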
EVOLUTIONARY ANALYSIS OF CODING SNPS
http://www.pantherdb.org/tools/csnpScoreForm.jsp?
ESR1_HUMAN: D538G
http://mutationassessor.org/
ESR1_HUMAN: D538G
• 11 possible candidate SNPs were selected for their potential relevance to breast cancer.
• rs2747648, which resides in a predicted binding site for 3 miRNAs in the estrogen receptor-α (ESR1) gene, was associated with a 27% reduction in breast cancer risk in premenopausal women.
• When the C allele is present, miR-453 binds with greater affinity to ESR1, thus leading to decreased levels of ERα protein. Postmenopausal women already have reduced levels of endogenous estrogen, perhaps explaining why this SNP is relevant only in premenopausal women.
• Would carriers of the ancestral T allele respond better to endocrine therapy, given that they naturally express higher levels of the receptor?
References:
Tchatchou, S. et al. A variant affecting a putative miRNA target site in estrogen receptor (ESR) 1 is associated with breast cancer risk in premenopausal women. Carcinogenesis 30, 59–64 (2009).
Adams, B. D., Furneaux, H. & White, B. A. The micro-ribonucleic acid (miRNA) miR-206 targets the human estrogen receptor-α (ERα) and represses ERα messenger RNA and protein expression in breast cancer cell lines. Mol. Endocrinol. 21, 1132–1147 (2007).
SNPs in miRNA Binding Sites
http://www.genemania.org/
Before you design your own primers – don’t reinvent the wheel!
Essential Bioinformatics Resources for Designing PCR Primers for Various Applications: http://www.humgen.nl/primer_design.html
1. Use NCBI Gene or UCSC genome browser to find gene variants:
• Transcript variants
• Alternative isoforms
• Exon-intron boundaries
• Pseudogenes
2. Gene conservation considerations
3. SNPs: there are approximately 56 million SNPs in the human genome; 16 million are in gene introns and exons, and most are silent mutations. Are we aiming at these locations?
jPCR: http://primerdigital.com/tools/soft.html
Basic considerations before designing primers
Primer length determines the specificity and affects annealing to the template:
Short primer => low specificity, non-specific amplification
Long primer => decreased binding efficiency at normal annealing temperature (due to high probability of forming secondary structures such as hairpins).
Primer design and primer characteristics
• Primer length: 18-24 bp, with complete sequence identity to the template
• G/C content: 40-60%
• Avoid mismatches at the 3’ end
• The presence of G or C bases within the last five bases from the 3’ end of primers (GC clamp) helps promote specific binding at the 3’ end. Avoid 3 or more G or C at the 3’ end, because this raises the probability of primer-dimers
• Avoid a 3’ end T
• Always run a reference gene (GAPDH, actin, RPLP0 (large ribosomal protein)) alongside your query genes
• Optimal amplicon size: 100-1000 bp
http://www.sciencedirect.com/science/article/pii/S0888754311001066#
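The primer characteristics listed above can be turned into a quick screening function. This is a minimal sketch, not a replacement for a design tool such as Primer3: the function name and the exact wording of the warnings are illustrative, and "3 or more G or C at the 3' end" is read here as the last three bases all being G or C.

```python
def check_primer(primer: str) -> list[str]:
    """Return warnings for a primer written 5'->3' (empty list = no warnings)."""
    seq = primer.upper()
    warnings = []

    # Length: 18-24 bp
    if not 18 <= len(seq) <= 24:
        warnings.append(f"length {len(seq)} outside 18-24")

    # G/C content: 40-60%
    gc = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    if not 40.0 <= gc <= 60.0:
        warnings.append(f"GC content {gc:.0f}% outside 40-60%")

    # GC clamp: at least one G or C within the last five bases
    if not any(base in "GC" for base in seq[-5:]):
        warnings.append("no G/C in the last five bases (no GC clamp)")

    # Assumed reading: the last three bases should not all be G or C
    if all(base in "GC" for base in seq[-3:]):
        warnings.append("three G/C bases at the 3' end")

    # Avoid a 3' end T
    if seq.endswith("T"):
        warnings.append("3' end is T")

    return warnings


# Hypothetical candidate primer
print(check_primer("ATGACCATGACCCTGCACAC"))
```

Mismatches at the 3' end and amplicon size depend on the template, so they are outside the scope of this per-primer check.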
Primer design: Melting temperature (Tm)
Tm is the temperature at which 50% of the DNA duplex dissociates to become single stranded
Determined by primer length, base composition and concentration
Affected by the salt concentration of the PCR reaction mix
Optimal melting temperature: 52°C - 60°C
Tm above 65°C may cause secondary annealing; a higher Tm (75°C - 80°C) is recommended for amplifying high GC content targets
Primer pair Tm mismatch
Significant primer pair Tm mismatch can lead to poor amplification (desirable Tm difference < 5°C between the two primers of a pair)
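As a rough illustration of how base composition drives Tm, the sketch below uses the Wallace rule of thumb, Tm = 2 x (A+T) + 4 x (G+C). This rule is an assumption brought in for the example: it is not taken from the slides, it ignores salt and primer concentration, and it is only a coarse estimate for short oligos. Design tools normally use nearest-neighbor thermodynamics instead.

```python
def wallace_tm(primer: str) -> int:
    """Rough Tm estimate in °C using the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_mismatch_ok(forward: str, reverse: str, max_diff: float = 5.0) -> bool:
    """True if the two primers differ in estimated Tm by less than max_diff °C."""
    return abs(wallace_tm(forward) - wallace_tm(reverse)) < max_diff


# Hypothetical primer pair
fwd = "ATGACCATGACCCTGCACAC"
rev = "CACGGTGGCGGGGAAGCC"
print(wallace_tm(fwd), wallace_tm(rev), tm_mismatch_ok(fwd, rev))
```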
Primer design: Annealing temperature
Ta (annealing temperature) vs. Tm
Ta is determined by the Tm of both the primers and the amplicon:
optimal Ta = 0.3 x Tm(primer) + 0.7 x Tm(product) - 25
General rule: Ta is 5°C lower than Tm
Higher Ta enhances specific amplification but may lower yields
Crucial in detecting polymorphisms
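The annealing temperature formula can be evaluated directly. The sketch below uses the formula exactly as written on the slide, including its constant, next to the general 5°C rule; the function names and the input temperatures are hypothetical. When the two primers differ in Tm, the lower of the two is commonly used for Tm(primer), which is an assumption here.

```python
def optimal_ta(tm_primer: float, tm_product: float) -> float:
    """Optimal Ta (°C) using the formula as given on the slide."""
    return 0.3 * tm_primer + 0.7 * tm_product - 25.0


def rule_of_thumb_ta(tm_primer: float) -> float:
    """General rule from the slide: Ta is 5 °C below the primer Tm."""
    return tm_primer - 5.0


# Hypothetical inputs: primer Tm of 58 °C, product Tm of 85 °C
print(f"formula: {optimal_ta(58.0, 85.0):.1f} °C")
print(f"rule of thumb: {rule_of_thumb_ta(58.0):.1f} °C")
```

The two estimates need not agree; in practice Ta is usually refined empirically, for example with a temperature gradient.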
Primer design: Specificity and cross homology
Specificity: Determined primarily by primer length and sequence
Cross homology: May become a problem when the PCR template is DNA with highly repetitive sequences
Avoid non-specific amplification: BLAST PCR primers against the NCBI non-redundant sequence database
Primer design: Avoid secondary structures
Hairpins: formed via intra-molecular interactions; they negatively affect primer-template binding, leading to poor or no amplification
Self-Dimer (homodimer): formed by inter-molecular interactions between two copies of the same primer
Cross-Dimer (heterodimer): formed by inter-molecular interactions between the sense and antisense primers
Avoid Template Secondary Structure
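A crude way to flag dimer risk is to ask whether the 3' tail of one primer is exactly complementary to some stretch of the other. The sketch below does only that: the function names and the tail length of 4 are assumptions, it ignores mismatches and thermodynamics, and it does not detect hairpins. Dedicated oligo analysis tools give a far more reliable answer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_dimer_risk(primer_a: str, primer_b: str, n: int = 4) -> bool:
    """True if the last n bases of primer_a can pair perfectly with primer_b.

    The 3' tail of primer_a anneals antiparallel to primer_b wherever
    primer_b contains the reverse complement of that tail.
    """
    tail = primer_a.upper()[-n:]
    return reverse_complement(tail) in primer_b.upper()


# Self-dimer check: compare a primer with itself (hypothetical sequences)
print(three_prime_dimer_risk("AAAAGGCC", "AAAAGGCC"))
# Cross-dimer check: compare the sense and antisense primers
print(three_prime_dimer_risk("ACACACAC", "ACACACAC"))
```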
Web Site: http://bioinfo.ut.ee/primer3-0.4.0/primer3/input.htm
Web Site: http://primer3plus.com/cgi-bin/dev/primer3plus.cgi
Web Site: http://genepipe.ngc.sinica.edu.tw/primerz/beginDesign.do
SNP primers:
Design specific primers for each transcript:
http://www4a.biotec.or.th/rexprimer2/Genotyping
SNPs
Copy number variation and InDels
http://www4a.biotec.or.th/rexprimer2/OligoChecking
Primer Design Tools for Degenerate PCR– CODEHOP
Web Site:
http://blocks.fhcrc.org/codehop.html
More Info: http://www.hsls.pitt.edu/guides/genetics/obrc/dna/pcr_oligos/URL1118954832/info
Name: CODEHOP (COnsensus-DEgenerate Hybrid Oligonucleotide Primer) PCR primer design
Type: Web-based software
Key Functions: Design degenerate PCR primers based on multiple protein sequence alignments
Publication Info: Nucleic Acids Research 2003
Times Cited: 37
Pros: Widely cited with many successful applications; settings for genetic code and codon usage
Cons: Requires local multiple alignment as input, which must be in Blocks Database format
Note: In OBRC
YiBu’s Rating: 4 out of 5
Cross hybridization and specificity of primers
http://www.ncbi.nlm.nih.gov/tools/primer-blast/
Resources for PCR Primer Specificity Analysis: NCBI BLAST
Primer specificity and Mapping: The UCSC In-Silico PCR
http://genome.csdb.cn/cgi-bin/hgPcr
PCR reaction setup calculators
http://primerdigital.com/tools/ReactionMixture.html
http://www.ncbi.nlm.nih.gov/probe
ESR1 human
Public PCR Primers/Oligo Probes Repository: The NCBI Probe Database
Resources for real time PCR: RTPrimerDB
Web Site:
http://www.rtprimerdb.org/
Shows pre-calculated primers for all gene transcripts!
Web Site:
http://pga.mgh.harvard.edu/primerbank/index.html
More Info: http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=14654707
http://primerdepot.nci.nih.gov/
http://eu.idtdna.com/pages/scitools
Dilution Calculator: takes an oligo stock solution of higher concentration and determines the volume needed to dilute it to the desired lower final concentration. Entering the volumes of the stock solution (Start Volume) and the diluted solution (End Volume) is not required, but recommended.
http://eu.idtdna.com/calc/dilution/
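The arithmetic behind such a dilution is the standard relation C1 x V1 = C2 x V2. The sketch below is an illustration of that relation only, not the IDT calculator's implementation; the function name and the example values are hypothetical, and both concentrations must be given in the same unit.

```python
def stock_volume_needed(stock_conc: float, final_conc: float,
                        final_volume: float) -> float:
    """Volume of stock to take so that the diluted solution has final_conc.

    Uses C1 * V1 = C2 * V2; the result is in the same unit as final_volume.
    """
    if stock_conc <= 0 or final_conc <= 0 or final_volume <= 0:
        raise ValueError("concentrations and volume must be positive")
    if final_conc > stock_conc:
        raise ValueError("cannot dilute to a concentration above the stock")
    return final_conc * final_volume / stock_conc


# Hypothetical example: 100 µM stock, 50 µL of 10 µM working solution
stock_ul = stock_volume_needed(100.0, 10.0, 50.0)
diluent_ul = 50.0 - stock_ul
print(f"{stock_ul:.1f} µL stock + {diluent_ul:.1f} µL diluent")
```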
http://www.frontiersin.org/Journal/10.3389/fendo.2011.00008/full
http://gtbinf.wordpress.com/2012/11/29/exome-sequence-analysis-group-1/
Exome Analysis
Identify genetic disease causes: sequence the coding regions (1-2% of the human genome, ~30 Mb) of patient and healthy individuals to find the genomic cause of diseases.
http://www.ebi.ac.uk/Tools/st/emboss_backtranseq/
>A8KAF4_HUMAN A8KAF4 Estrogen receptor OS=Homo sapiens PE=2 SV=1 ATGACCATGACCCTGCACACCAAGGCCAGCGGCATGGCCCTGCTGCACCAGATCCAGGGC AACGAGCTGGAGCCCCTGAACAGGCCCCAGCTGAAGATCCCCCTGGAGAGGCCCCTGGGC GAGGTGTACCTGGACAGCAGCAAGCCCGCCGTGTACAACTACCCCGAGGGCGCCGCCTAC GAGTTCAACGCCGCCGCCGCCGCCAACGCCCAGGTGTACGGCCAGACCGGCCTGCCCTAC GGCCCCGGCAGCGAGGCCGCCGCCTTCGGCAGCAACGGCCTGGGCGGCTTCCCCCCCCTG AACAGCGTGAGCCCCAGCCCCCTGATGCTGCTGCACCCCCCCCCCCAGCTGAGCCCCTTC CTGCAGCCCCACGGCCAGCAGGTGCCCTACTACCTGGAGAACGAGCCCAGCGGCTACACC GTGAGGGAGGCCGGCCCCCCCGCCTTCTACAGGCCCAACAGCGACAACAGGAGGCAGGGC GGCAGGGAGAGGCTGGCCAGCACCAACGACAAGGGCAGCATGGCCATGGAGAGCGCCAAG GAGACCAGGTACTGCGCCGTGTGCAACGACTACGCCAGCGGCTACCACTACGGCGTGTGG AGCTGCGAGGGCTGCAAGGCCTTCTTCAAGAGGAGCATCCAGGGCCACAACGACTACATG TGCCCCGCCACCAACCAGTGCACCATCGACAAGAACAGGAGGAAGAGCTGCCAGGCCTGC AGGCTGAGGAAGTGCTACGAGGTGGGCATGATGAAGGGCATCAGGAAGGACAGGAGGGGC GGCAGGATGCTGAAGCACAAGAGGCAGAGGGACGACGGCGAGGGCAGGGGCGAGGTGGGC AGCGCCGGCGACATGAGGGCCGCCAACCTGTGGCCCAGCCCCCTGATGATCAAGAGGAGC AAGAAGAACAGCCTGGCCCTGAGCCTGACCGCCGACCAGATGGTGAGCGCCCTGCTGGAC GCCGAGCCCCCCATCCTGTACCCCGAGTACGACCCCACCAGGCCCTTCAGCGAGGCCAGC ATGATGGGCCTGCTGACCAACCTGGCCGACAGGGAGCTGGTGCACATGATCAACTGGGCC AAGAGGGTGCCCGGCTTCGTGGACCTGACCCTGCACGACCAGGTGCACCTGCTGGAGTGC GCCTGGCTGGAGATCCTGATGATCGGCCTGGTGTGGAGGAGCATGGAGCACCCCGGCAAG CTGCTGTTCGCCCCCAACCTGCTGCTGGACAGGAACCAGGGCAAGTGCGTGGAGGGCATG GTGGAGATCTTCGACATGCTGCTGGCCACCAGCAGCAGGTTCAGGATGATGAACCTGCAG GGCGAGGAGTTCGTGTGCCTGAAGAGCATCATCCTGCTGAACAGCGGCGTGTACACCTTC CTGAGCAGCACCCTGAAGAGCCTGGAGGAGAAGGACCACATCCACAGGGTGCTGGACAAG ATCACCGACACCCTGATCCACCTGATGGCCAAGGCCGGCCTGACCCTGCAGCAGCAGCAC CAGAGGCTGGCCCAGCTGCTGCTGATCCTGAGCCACATCAGGCACATGAGCAACAAGGGC ATGGAGCACCTGTACAGCATGAAGTGCAAGAACGTGGTGCCCCTGTACGACCTGCTGCTG GAGATGCTGGACGCCCACAGGCTGCACGCCCCCACCAGCAGGGGCGGCGCCAGCGTGGAG GAGACCGACCAGAGCCACCTGGCCACCGCCGGCAGCACCAGCAGCCACAGCCTGCAGAAG TACTACATCACCGGCGAGGCCGAGGGCTTCCCCGCCACCGTG
http://www.ebi.ac.uk/Tools/st/emboss_transeq/
6 frames translation
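Six-frame translation reads a DNA sequence in three offsets on the forward strand and three on the reverse complement. The sketch below uses the standard genetic code and illustrative function names; it is not the EMBOSS transeq implementation, and incomplete or ambiguous codons are simply reported as "X".

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate a DNA sequence from its first base; '*' marks stop codons."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Translations of the three forward and three reverse-strand frames."""
    forward = seq.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


# First 12 bases of the back-translated estrogen receptor sequence above
for name, protein in six_frames("ATGACCATGACC").items():
    print(name, protein)
```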
http://www.bioinformatics.org/sms2/index.html
Format Conversion tools:
• Reverse and/or Complement of DNA sequences (http://www.bioinformatics.org/sms2/rev_comp.html)
• Split FASTA: divides FASTA sequence records into smaller FASTA sequences of the size you specify (http://www.bioinformatics.org/sms2/split_fasta.html)
Sequence Analysis:
• DNA Pattern Find: accepts one or more sequences along with a search pattern and returns the number and positions of sites that match the pattern (http://www.bioinformatics.org/sms2/dna_pattern.html)
• PCR Primer Stats: accepts a list of PCR primer sequences and returns a report describing the properties of each primer, including melting temperature, percent GC content, and PCR suitability (http://www.bioinformatics.org/sms2/pcr_primer_stats.html)
• PCR Products: accepts one or more DNA sequence templates and two primer sequences. The program searches for perfectly matching primer annealing sites that can generate a PCR product. Any resulting products are sorted by size, and they are given a title specifying their length, their position in the original sequence, and the primers that produced them (http://www.bioinformatics.org/sms2/pcr_products.html)
• Reverse Translate (http://www.bioinformatics.org/sms2/rev_trans.html)
• Translate (http://www.bioinformatics.org/sms2/translate.html)
• Primer Map: accepts a DNA sequence and returns a textual map showing the annealing positions of PCR primers (http://www.bioinformatics.org/sms2/primer_map.html)
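The idea behind a PCR product search can be sketched in a few lines: find a perfect match of the forward primer, then a perfect match of the reverse complement of the reverse primer downstream of it. This is a simplified illustration with hypothetical names, not the Sequence Manipulation Suite code; it only searches the strand as given and does not allow mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_pcr_products(template: str, forward: str, reverse: str):
    """Return (start, end, product) tuples for perfect primer matches.

    The forward primer must match the template as given; the reverse primer
    must match the opposite strand downstream of it. Results are sorted by
    product length, shortest first.
    """
    template = template.upper()
    fwd_site = forward.upper()
    rev_site = reverse_complement(reverse)
    products = []
    start = template.find(fwd_site)
    while start != -1:
        hit = template.find(rev_site, start + len(fwd_site))
        while hit != -1:
            end = hit + len(rev_site)
            products.append((start, end, template[start:end]))
            hit = template.find(rev_site, hit + 1)
        start = template.find(fwd_site, start + 1)
    return sorted(products, key=lambda item: len(item[2]))


# Hypothetical template and primer pair
template = "TTTT" + "ACGTACG" + "A" * 8 + "GGCCTTAG" + "CTTTT"
for start, end, product_seq in find_pcr_products(template, "ACGTACG", "CTAAGGCC"):
    print(f"{len(product_seq)} bp product at {start}-{end}: {product_seq}")
```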
Resources for PCR Primer Mapping/Amplicon Size
http://www.cmbi.ru.nl/cdd/biovenn/
x total: 127    x only: 62     x-y total overlap
y total: 628    y only: 566    x-z total overlap
z total: 0      z only: 0      y-z total overlap
Comparing gene-lists
Venny
http://bioinfogp.cnb.csic.es/tools/venny/
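The comparison these Venn tools perform amounts to set operations on gene identifiers. A minimal sketch, assuming two lists of gene symbols (the function name and the example genes are illustrative; duplicates within a list are counted once):

```python
def compare_gene_lists(x, y):
    """Compare two gene lists and report unique and shared identifiers."""
    x_set, y_set = set(x), set(y)
    return {
        "x total": len(x_set),
        "y total": len(y_set),
        "x only": sorted(x_set - y_set),
        "y only": sorted(y_set - x_set),
        "x-y overlap": sorted(x_set & y_set),
    }


# Hypothetical gene lists
list_x = ["ESR1", "GAPDH", "ACTB"]
list_y = ["ESR1", "RPLP0", "GAPDH", "TP53"]
for key, value in compare_gene_lists(list_x, list_y).items():
    print(f"{key}: {value}")
```

Identifiers should be normalized (same naming scheme, same letter case) before comparison, otherwise the same gene can be counted as two different entries.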
Microarray Experiments
Probes for genes are located on the chip. Hybridization of mRNA to the probes on the chip is performed and results are recorded.
Various platforms!
Next generation sequencing bypasses the rate-limiting step of conventional DNA sequencing (separating randomly terminated DNA polymers by gel electrophoresis) by physically arraying DNA molecules on solid surfaces and determining the DNA sequence in situ, without the need for gel separation.
Anchor DNA single molecule to solid surface
Amplify template by in situ PCR
Add 4 color-labeled reversible terminators, polymerase, universal primer
Remove unincorporated nucleotides
Detect with laser
Reverse the termination and repeat 1…100 times; the number of cycles determines the length of the sequence read.
http://molonc.bccrc.ca/?page_id=191
Next Generation Sequencing
In both technologies, the great advantage is achieved by novel bio-technologies for producing high throughput data! However, both have pros and cons…
Microarray and Next Generation Sequencing Technologies
Arrays
Pros:
• relatively cheap
• mature biotechnology and analysis tools (since the late 90’s)
• fixed probes, no heterogeneity of coverage
• highly reproducible
Cons:
• detection of only known transcripts
• limited to sequenced organisms, no de-novo
• higher background
• low expressed genes are less accurately detected
NGS
Pros:
• very sensitive if sufficient sequence depth
• direct read-out of all transcripts
• paired-end reads, better accuracy
• de-novo sequencing, new genomes
• highly reproducible
• new and exciting
Cons:
• still expensive
• technical bias in mRNA library preparation and in transcripts of different length
• immature bioinformatics tools
• de-novo analysis is tricky, ambiguity in mapping reads to the genome
• very high coverage is needed for low expressed genes
• variable sequence coverage for different genomic regions
In both, consistent biological interpretation!
Marioni J C et al. Genome Res. 2008;18:1509-1517
http://cage.unl.edu/RNASEQ_Transcriptomics.pdf
Copyright © 2008, Cold Spring Harbor Laboratory Press
Consistent Biological Interpretation?
NGS is becoming the technology of choice for a wide range of applications, but the transition away from microarrays will still take a long time.
Different applications have different requirements, so researchers need to weigh their options carefully when choosing a platform.
http://www.genengnews.com/gen-articles/next-generation-sequencing-vs-microarrays/4689/
TAU Bioinformatics Unit: who are we and what do we do?
http://www.tau.ac.il/lifesci/bioinformatics.html
[email protected]: 03-6406992