single nucleotide polymorphisms and large scale variation
DESCRIPTION
How can we identify and study all the genetic changes that occur in so many different diseases? How can we explain why some people respond to treatment and not others?
TRANSCRIPT
Lecture-6
Single nucleotide polymorphisms and Large scale variation
Huseyin Tombuloglu, PhD
GBE423 Genomics & Proteomics
►How can we identify and study all the genetic changes that occur in so many different diseases?
►How can we explain why some people respond to treatment and not others?
‘SNP’ is the answer to these questions…
• So what exactly are SNPs?
• How are they involved in so many different aspects of health?
What is a SNP?
►A SNP is defined as a single base change in a DNA sequence that occurs in a significant proportion (more than 1 percent) of a large population.
Some Facts
• In human beings, 99.9 percent of bases are the same.
• The remaining 0.1 percent makes a person unique.
  – Different attributes / characteristics / traits
    • how a person looks,
    • diseases he or she develops.
• These variations can be:
  – Harmless (change in phenotype)
  – Harmful (diabetes, cancer, heart disease, Huntington's disease, and hemophilia)
  – Latent (variations found in coding and regulatory regions that are not harmful on their own; the change in each gene only becomes apparent under certain conditions, e.g. susceptibility to lung cancer)
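As a rough worked example of the 0.1 percent figure above, the sketch below estimates how many bases differ between two people. The genome size of roughly 3 billion bases is an assumption that does not appear in the slides, and the function name is made up for illustration.

```python
# Assumption (not from the slides): a haploid human genome of roughly 3 billion bases.
GENOME_SIZE_BP = 3_000_000_000
# The 0.1 percent of bases that differ between individuals, as quoted in the slides.
FRACTION_DIFFERENT = 0.001


def expected_differences(genome_size_bp, fraction_different):
    """Return the approximate number of bases that differ between two genomes."""
    return round(genome_size_bp * fraction_different)


if __name__ == "__main__":
    print(expected_differences(GENOME_SIZE_BP, FRACTION_DIFFERENT))
```

Under that assumption, 0.1 percent corresponds to a few million differing positions, which is why even a small percentage leaves room for a great deal of individual variation.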
SNP facts
►SNPs are found in coding and (mostly) noncoding regions.
►They occur at a very high frequency: estimates range from about 1 in 1,000 bases to 1 in every 100 to 300 bases.
►The abundance of SNPs and the ease with which they can be measured make these genetic variations significant.
►A SNP close to a particular gene acts as a marker for that gene.
►SNPs in coding regions may alter the structure of the protein encoded by that region.
SNPs may / may not alter protein structure
Types of genetic variation
• Substitutions
    ACTGACTGACTGACTGACTG
    ACTGACTGGCTGACTGACTG
  – Single Nucleotide Polymorphisms (SNPs)
  – Single Nucleotide Variations (SNVs)
• Insertions/deletions (INDELS)
    ACTGACTGACTGACTGACTG
    ACTGACTGACTGACTGACTGACTG
  – Copy Number Variants (CNVs)
    • Indels > 1 kb in size
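The substitution example above can be checked mechanically. The following is a minimal sketch that compares the two equal-length sequences from the slide and labels a variant by its size; the function names and the choice to reuse the 1 kb cut-off as a parameter are illustrative assumptions, and real variant callers work from read alignments rather than simple string comparison.

```python
def find_substitutions(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch.

    Only valid for equal-length sequences; indels require an alignment step.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have equal length")
    return [
        (index + 1, ref_base, alt_base)
        for index, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


def classify_by_length(ref_allele, alt_allele, cnv_threshold_bp=1000):
    """Label a variant by how much the allele length changes (illustrative rule)."""
    if ref_allele == alt_allele:
        return "no change"
    size_change = abs(len(ref_allele) - len(alt_allele))
    if size_change == 0:
        return "substitution"
    if size_change > cnv_threshold_bp:
        return "CNV"
    return "indel"


if __name__ == "__main__":
    reference = "ACTGACTGACTGACTGACTG"
    sample = "ACTGACTGGCTGACTGACTG"
    print(find_substitutions(reference, sample))             # [(9, 'A', 'G')]
    print(classify_by_length(reference, reference + "ACTG"))  # indel
```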
Variant Calling 10
SNPs vs. SNVs
• Really a matter of frequency of occurrence
• Both are concerned with aberrations at a single nucleotide
• SNP
  – Aberration expected at the position for any member in the species (well-characterized)
  – Occur in population at some frequency so expected at a given locus
  – Validated in population
  – Catalogued in dbSNP (http://www.ncbi.nlm.nih.gov/snp)
• SNV
  – Aberration seen in only one individual (not well characterized)
  – Occur at low frequency so not common
  – Not validated in population
9/12/2012
• Variation can have an effect on function
  – Non-synonymous substitutions can change the amino acid encoded by a codon or give rise to premature stop codons
  – Indels can cause frame-shifts
  – Mutations may affect splice sites or regulatory sequences outside of genes or within introns
Human genetic variation
Identifying a causative de novo mutation
Patient with idiopathic disorder
Veltman and colleagues - Nat Genet. 2010 Dec;42(12):1109-12
(1) Sequence genome: ~22,000 variants (exome re-sequencing)
(2) Select only coding mutations: ~5,640 coding variants
(3) Exclude known variants seen in healthy people: ~143 novel coding variants
(4) Sequence parents and exclude their private variants: ~5 de novo novel coding variants
(5) Look at affected gene function and mutational impact (figure example: MSGTCASTTR vs. MSGTNASTTR)
For 6/9 patients, they were able to identify a single likely-causative mutation
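The filtering steps (2) to (4) above can be sketched as successive list filters. This is only an illustration: the record fields (`is_coding`, `known_in_healthy`, `in_parent`) and the toy variants are hypothetical, and the published study's actual pipeline is not described in the slides.

```python
def filter_candidates(variants):
    """Apply the three exclusion steps in order and return each intermediate list."""
    # Step 2: keep only variants that fall in coding sequence.
    coding = [v for v in variants if v["is_coding"]]
    # Step 3: drop variants already catalogued in healthy people.
    novel = [v for v in coding if not v["known_in_healthy"]]
    # Step 4: drop variants also present in either parent.
    de_novo = [v for v in novel if not v["in_parent"]]
    return coding, novel, de_novo


if __name__ == "__main__":
    # Toy records with hypothetical field names, not real patient data.
    toy_variants = [
        {"id": "v1", "is_coding": False, "known_in_healthy": False, "in_parent": False},
        {"id": "v2", "is_coding": True, "known_in_healthy": True, "in_parent": True},
        {"id": "v3", "is_coding": True, "known_in_healthy": False, "in_parent": True},
        {"id": "v4", "is_coding": True, "known_in_healthy": False, "in_parent": False},
    ]
    coding, novel, de_novo = filter_candidates(toy_variants)
    print(len(toy_variants), len(coding), len(novel), len(de_novo))  # 4 3 2 1
    print([v["id"] for v in de_novo])                                # ['v4']
```

Each filter only shrinks the list, which mirrors how the counts on the slide fall from tens of thousands of variants to a handful of candidates for manual review in step (5).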
Catalogs of human genetic variation
• The 1000 Genomes Project
  – http://www.1000genomes.org/
  – SNPs and structural variants
  – genomes of about 2500 unidentified people from about 25 populations around the world will be sequenced using NGS technologies
• HapMap
  – http://hapmap.ncbi.nlm.nih.gov/
  – identify and catalog genetic similarities and differences
• dbSNP
  – http://www.ncbi.nlm.nih.gov/snp/
  – Database of SNPs and multiple small-scale variations that include indels, microsatellites, and non-polymorphic variants
• COSMIC
  – http://www.sanger.ac.uk/genetics/CGP/cosmic/
  – Catalog of Somatic Mutations in Cancer
SNP or Mutation?
Call it a SNP IF the single base change occurs in a population at a frequency of 1% or higher.
Call it a mutation IF the single base change occurs in less than 1% of a population.
A SNP is a polymorphic position where the point mutation has become established in the population (frequency of 1% or higher).
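The 1% rule above reduces to a single comparison. The sketch below assumes the frequency is estimated from allele counts in a sample; the function name and the example counts are illustrative, not taken from the slides.

```python
def classify_single_base_change(variant_alleles, total_alleles, threshold=0.01):
    """Label a single base change as 'SNP' or 'mutation' using the 1% rule."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    if not 0 <= variant_alleles <= total_alleles:
        raise ValueError("variant_alleles must be between 0 and total_alleles")
    frequency = variant_alleles / total_alleles
    # 1% or higher counts as a SNP; anything rarer is called a mutation.
    return "SNP" if frequency >= threshold else "mutation"


if __name__ == "__main__":
    # Hypothetical counts: 50 variant alleles among 1000 sampled chromosomes (5%).
    print(classify_single_base_change(50, 1000))  # SNP
    # Hypothetical counts: 3 variant alleles among 1000 sampled chromosomes (0.3%).
    print(classify_single_base_change(3, 1000))   # mutation
```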
From a Mutation to a SNP
SNPs Classification
SNPs can occur anywhere in a genome; they are classified based on their location.
• Intergenic region
• Gene region: can be further classified as promoter, exonic (coding), intronic, UTR, etc.
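Classification by location amounts to an interval lookup against a gene annotation. The sketch below uses a made-up gene model with hypothetical coordinates purely for illustration; real annotations come from resources such as those listed in this lecture and are considerably more complex (multiple transcripts, strands, overlapping genes).

```python
# Hypothetical gene model: (start, end, label), 1-based inclusive coordinates.
GENE_MODEL = [
    (1000, 1199, "promoter"),
    (1200, 1299, "5' UTR"),
    (1300, 1499, "exon"),
    (1500, 1799, "intron"),
    (1800, 1999, "exon"),
    (2000, 2099, "3' UTR"),
]


def classify_location(position, gene_model=GENE_MODEL):
    """Return the label of the region containing position, or 'intergenic'."""
    for start, end, label in gene_model:
        if start <= position <= end:
            return label
    return "intergenic"


if __name__ == "__main__":
    for snp_position in (500, 1100, 1350, 1600, 2050):
        print(snp_position, classify_location(snp_position))
```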
Coding Region SNPs
• Synonymous: do not result in a change of amino acid in the protein, but can still affect its function in other ways
• Non-Synonymous
  – Missense: changes one amino acid to another
  – Nonsense: changes an amino-acid codon to a stop codon
Geospiza Green Arrow™ tutorial by Sandra Porter, Ph.D.
Synonymous
An example is a seemingly silent mutation in the multidrug resistance gene 1 (MDR1), which codes for a cellular membrane pump that expels drugs from the cell. The mutation can slow down translation and allow the peptide chain to fold into an unusual conformation, making the mutant pump less functional.
Missense
e.g. the c.1580G>T SNP in the LMNA gene: at position 1580 (nt) of the DNA sequence (CGT codon), guanine is replaced with thymine, yielding a CTT codon. At the protein level this replaces arginine with leucine at position 527; at the phenotype level it manifests as overlapping mandibuloacral dysplasia and progeria syndrome.
Nonsense
e.g. cystic fibrosis caused by the G542X mutation in the cystic fibrosis transmembrane conductance regulator gene.
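The three categories can be told apart by translating the codon before and after the change. The sketch below builds the standard genetic code table and applies it to the CGT to CTT change from the LMNA example; the other two codon pairs are illustrative choices (an isoleucine-to-isoleucine change and a glycine-to-stop change), not codons quoted in the slides.

```python
from itertools import product

# Standard genetic code, DNA alphabet, in T, C, A, G order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_change(ref_codon, alt_codon):
    """Return 'synonymous', 'missense', or 'nonsense' for a single-codon change.

    A change from a stop codon to an amino acid (stop-loss) is not handled
    separately here and is reported as 'missense'.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


if __name__ == "__main__":
    print(classify_coding_change("CGT", "CTT"))  # missense (Arg to Leu)
    print(classify_coding_change("ATC", "ATT"))  # synonymous (Ile to Ile)
    print(classify_coding_change("GGA", "TGA"))  # nonsense (Gly to stop)
```

Note that the table alone cannot detect effects such as the altered folding described for MDR1; it only reports whether the encoded amino acid changes.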
The Consequences of SNPs
The phenotypic consequence of a SNP is significantly affected by the location where it occurs, as well as the nature of the mutation.
• No consequence
• Affect gene transcription quantitatively or qualitatively.
• Affect gene translation quantitatively or qualitatively.
• Change protein structure and functions.
• Change gene regulation at different steps.
Simple/Complex Genetic Diseases and SNPs
• Simple genetic diseases (Mendelian diseases) are often caused by mutations in a single gene. -- e.g. Huntington’s, cystic fibrosis, PKU, etc.
• Many complex diseases are the result of mutations in multiple genes, the interactions among them, and their interactions with environmental factors. -- e.g. cancers, heart diseases, Alzheimer's, diabetes, asthma, etc.
• The majority of SNPs may not directly cause any disease.
• SNPs are ideal genomic markers (dense and easy to assay) for locating disease loci in association studies.
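One common way an association study summarises a marker SNP is the allelic odds ratio between cases and controls. The slides do not specify a statistic, so this is offered only as one possible illustration; the function name and the allele counts are hypothetical.

```python
def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds of carrying the risk allele in cases divided by the odds in controls."""
    if case_other == 0 or control_risk == 0:
        raise ValueError("odds ratio is undefined when a denominator count is zero")
    return (case_risk * control_other) / (case_other * control_risk)


if __name__ == "__main__":
    # Hypothetical allele counts, not real study data.
    odds_ratio = allelic_odds_ratio(
        case_risk=60, case_other=140, control_risk=40, control_other=160
    )
    print(f"{odds_ratio:.2f}")  # 1.71
```

An odds ratio near 1 suggests no association, while values well above 1 suggest the allele is enriched in cases; a real study would also need a significance test and correction for multiple comparisons.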
A single base mutation in the APOE (apolipoprotein E) gene is associated with a higher risk for Alzheimer's disease.
A single SNP may cause a Mendelian disease. In complex diseases, however, SNPs do not usually function individually; rather, they work in coordination with other SNPs to manifest a disease condition, as has been seen in osteoporosis.
rs6311 and rs6313 are SNPs in the Serotonin 5-HT2A receptor gene on human chromosome 13.
rs3091244 is an example of a triallelic SNP in the CRP gene on human chromosome 1.
TAS2R38 codes for PTC tasting ability, and contains 6 annotated SNPs.
Main Genetic Variation Resources
NCBI dbSNP: http://www.ncbi.nlm.nih.gov/SNP/index.html
NCBI Online Mendelian Inheritance in Man (OMIM): http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM
International HapMap Project: http://www.hapmap.org/
Perlegen: http://genome.perlegen.com
Genome Variation Server (Seattle SNPs): http://gvs.gs.washington.edu/GVS/