Genome representation and variant identification
Deanna M. Church, NCBI
The Reference Assembly is NOT Static
NCBI35 (hg17)
NCBI36 (hg18)
GRCh37 (hg19)
GRCh37.p9
Image credit: http://www.tohlejokes.com
http://genomereference.org
Resolved: 716
Open: 697
http://www.ncbi.nlm.nih.gov/dbvar
Studies
Variant Regions
Variant Calls
Variant Region nsv531833 type: CNV
Variant Call: nssv577112
  Type: copy number gain
  Method: Oligo aCGH
  Analysis: Probe signal intensity
  Phenotype: Autism, etc.
  Clinical: Pathogenic
  Copy Number: 3
Variant Call: nssv580124
  Type: copy number loss
  Method: Oligo aCGH
  Analysis: Probe signal intensity
  Phenotype: Autism
  Clinical: Pathogenic
  Copy Number: 1
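The region and call records above can be sketched as a small data model. This is only an illustration: the class and field names are hypothetical rather than dbVar's actual schema, and the expected copy number of 2 is an assumption that holds only for diploid autosomal sequence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VariantCall:
    accession: str      # nssv accession from the slide
    copy_number: int
    method: str
    analysis: str
    clinical: str

    def call_type(self, expected_copies: int = 2) -> str:
        # Assumption: 2 copies expected (diploid autosome).
        if self.copy_number > expected_copies:
            return "copy number gain"
        if self.copy_number < expected_copies:
            return "copy number loss"
        return "no change"


@dataclass
class VariantRegion:
    accession: str      # nsv accession from the slide
    region_type: str
    calls: List[VariantCall] = field(default_factory=list)


region = VariantRegion(
    "nsv531833",
    "CNV",
    [
        VariantCall("nssv577112", 3, "Oligo aCGH", "Probe signal intensity", "Pathogenic"),
        VariantCall("nssv580124", 1, "Oligo aCGH", "Probe signal intensity", "Pathogenic"),
    ],
)

for call in region.calls:
    print(call.accession, call.call_type())
```

One region groups calls of different types, which is why the region is typed only as CNV while each call records a gain or a loss.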
Methods
Analysis
Publications
Samples
Submitted assembly
Variant Call Ambiguity
[Figure: probes with decreased signal intensity flanked by probes with expected signal intensity. Each breakpoint lies between an outer and an inner coordinate: outer start to inner start on one side, inner stop to outer stop on the other.]
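The inner and outer coordinates for an array-based loss can be derived roughly as follows. This is a minimal sketch under stated assumptions: the `Probe` type, the positions, and the single contiguous run of decreased probes are all hypothetical, and real calling algorithms segment noisy intensity data rather than reading a clean flag.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Probe:
    position: int     # genomic coordinate of the probe (made-up values below)
    decreased: bool   # True if signal intensity is below the expected level


Boundaries = Tuple[Optional[int], int, int, Optional[int]]


def call_boundaries(probes: List[Probe]) -> Optional[Boundaries]:
    """Return (outer_start, inner_start, inner_stop, outer_stop) for one loss.

    Assumes at most one contiguous run of decreased probes.
    """
    ordered = sorted(probes, key=lambda p: p.position)
    hits = [i for i, p in enumerate(ordered) if p.decreased]
    if not hits:
        return None
    first, last = hits[0], hits[-1]
    # Inner coordinates: the outermost probes that show the decreased signal.
    inner_start = ordered[first].position
    inner_stop = ordered[last].position
    # Outer coordinates: the nearest flanking probes with expected signal.
    outer_start = ordered[first - 1].position if first > 0 else None
    outer_stop = ordered[last + 1].position if last + 1 < len(ordered) else None
    return outer_start, inner_start, inner_stop, outer_stop


probes = [
    Probe(1000, False), Probe(2000, False), Probe(3000, True),
    Probe(4000, True), Probe(5000, True), Probe(6000, False),
]
print(call_boundaries(probes))  # (2000, 3000, 5000, 6000)
```

In this toy case the left breakpoint lies somewhere between 2000 and 3000 and the right breakpoint between 5000 and 6000; the array cannot place either one more precisely.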
Fosmid clone (40 Kb +/- 1 Kb)
20 Kb: Clone has an insertion relative to the genome
60 Kb: Clone has a deletion relative to the genome
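The fosmid end-mapping logic can be written down in a few lines. The insert size and tolerance come from the slide; the function name and the "concordant" label are assumptions for illustration, and production pipelines usually derive the cutoff from the observed insert-size distribution rather than a fixed window.

```python
EXPECTED_INSERT = 40_000   # fosmid insert size from the slide (40 Kb)
TOLERANCE = 1_000          # +/- 1 Kb from the slide


def classify_clone(mapped_span: int,
                   expected: int = EXPECTED_INSERT,
                   tolerance: int = TOLERANCE) -> str:
    """Classify a clone by the distance between its end placements on the reference."""
    if mapped_span < expected - tolerance:
        # Ends land closer together than the clone is long:
        # the clone carries sequence the reference lacks.
        return "insertion"
    if mapped_span > expected + tolerance:
        # Ends land farther apart than the clone is long:
        # the reference carries sequence the clone lacks.
        return "deletion"
    return "concordant"


for span in (20_000, 40_500, 60_000):
    print(span, classify_clone(span))
```

The labels describe the clone relative to the genome, so a mis-assembled reference produces the same signal as a true variant in the sample.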
Assembly, Mis-assembly, Biology and Variant Interpretation
BAC insert
BAC vector
Shotgun sequence
Assemble
GAPS
“finishers” go in to manually fill the gaps, often by PCR
NCBI36 (hg18)
GRCh37 (hg19)
NCBI35 (hg17)
GRCh37 (hg19)
AL139246.20
AL139246.21
Build sequence contigs based on contigs defined in TPF (Tiling Path File).
Check for orientation consistencies
Select switch points
Instantiate sequence for further analysis
Switch point
Consensus sequence
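The contig-building steps can be illustrated with a toy join of two overlapping components. This is a sketch, not the GRC pipeline: the helper names, the short sequences, and the fixed overlap length are hypothetical, and a real build would take overlaps and switch points from curated alignments.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def orient(seq: str, strand: str) -> str:
    """Return the component as it reads along the chromosome."""
    return seq if strand == "+" else reverse_complement(seq)


def join_at_switch_point(left: str, right: str, overlap: int, switch: int) -> str:
    """Join two oriented components whose ends overlap by `overlap` bases.

    `switch` is an offset into the overlap: bases before it come from
    `left`, bases from it onward come from `right`.
    """
    if overlap > min(len(left), len(right)):
        raise ValueError("overlap longer than a component")
    if not 0 <= switch <= overlap:
        raise ValueError("switch point must fall inside the overlap")
    left_end = len(left) - overlap + switch
    return left[:left_end] + right[switch:]


left = orient("AAACCCGGG", "+")
right = orient("AAACTCG", "-")   # stored on the opposite strand; reads CGAGTTT

# The 4-base overlap differs at one position (CGGG vs CGAG), so the
# switch point decides which component's base ends up in the contig.
print(join_at_switch_point(left, right, overlap=4, switch=0))  # AAACCCGAGTTT
print(join_at_switch_point(left, right, overlap=4, switch=4))  # AAACCCGGGTTT
```

When overlapping clones come from different haplotypes, the choice of switch point determines which haplotype the reference shows across the overlap.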
NCBI36
nsv832911 (nstd68) Submitted on NCBI35 (hg17)
NCBI35 (hg17) Tiling Path
GRCh37 (hg19) Tiling Path
Gap Inserted
Moved approximately 2 Mb distal on chr15
NC_000015.8 (chr15)
NC_000015.9 (chr15)
Removed from assembly
Added to assembly
HG-24
Sequences from haplotype 1
Sequences from haplotype 2
Old Assembly model: compress into a consensus
New Assembly model: represent both haplotypes
AC074378.4 AC079749.5
AC134921.2 AC147055.2
AC140484.1 AC019173.4
AC093720.2 AC021146.7
NCBI36 NC_000004.10 (chr4) Tiling Path
Xue Y et al, 2008
TMPRSS11E TMPRSS11E2
GRCh37 NC_000004.11 (chr4) Tiling Path
AC074378.4 AC079749.5
AC134921.1 AC147055.2
AC093720.2 AC021146.7
TMPRSS11E
GRCh37: NT_167250.1 (UGT2B17 alternate locus)
AC074378.4 AC140484.1
AC019173.4 AC226496.2
AC021146.7
TMPRSS11E2
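The change between the two chr4 tiling paths can be checked by comparing the component lists shown on these slides. The sketch below treats each path as an unordered set, because the order in this transcript may not match the true path order, and it ignores accession versions; the variable and function names are hypothetical.

```python
from typing import Iterable, Set


def accessions(path: Iterable[str]) -> Set[str]:
    """Strip the version suffix so AC134921.1 and AC134921.2 compare equal."""
    return {component.split(".")[0] for component in path}


# Component lists as they appear on the slides.
ncbi36_chr4 = [
    "AC074378.4", "AC079749.5", "AC134921.2", "AC147055.2",
    "AC140484.1", "AC019173.4", "AC093720.2", "AC021146.7",
]
grch37_chr4 = [
    "AC074378.4", "AC079749.5", "AC134921.1", "AC147055.2",
    "AC093720.2", "AC021146.7",
]
grch37_alt_locus = [
    "AC074378.4", "AC140484.1", "AC019173.4", "AC226496.2", "AC021146.7",
]

removed_from_chr4 = accessions(ncbi36_chr4) - accessions(grch37_chr4)
moved_to_alt = removed_from_chr4 & accessions(grch37_alt_locus)
new_in_alt = accessions(grch37_alt_locus) - accessions(ncbi36_chr4)

print(sorted(removed_from_chr4))  # ['AC019173', 'AC140484']
print(sorted(moved_to_alt))       # ['AC019173', 'AC140484']
print(sorted(new_in_alt))         # ['AC226496']
```

On these lists, the two components dropped from the GRCh37 chromosome path both reappear in the alternate locus NT_167250.1, alongside one component that was not in the NCBI36 path.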
nsv532126 (nstd37)
GRCh37
81 FIX Patches
71 NOVEL Patches
GRCh37.p9
Dennis et al., 2012
1q32 1q21 1p21
1p21 patch alignment to chromosome 1
Finding the data
How dbVar* manages data
*and most other NCBI databases too
Object    Method      Analysis                Clinical assertion  NCBI36 location  Etc…
nsv1000   Oligo aCGH  Probe signal intensity  None                Location         Etc…
nsv2000   Sequencing  Paired end analysis     None                Location         Etc…
nsv3000   Sequencing  Read Depth              Benign              Location         Etc…
…         …           …                       …                   …                …
Search Term
Variant submitted on NCBI35 (hg17)
Failed to remap to NCBI36 (hg18)
Successful remap to GRCh37 (hg19)
No results in ‘normal’ dbVar search
Genome Sensor predicts this is a location -> points to dbVar Genome Browser
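Why a variant can be missing from a location search on one assembly yet found on another can be shown with a toy index. Everything here is hypothetical: the record names, the coordinates, and the lookup function are invented for illustration and do not reflect how dbVar stores or searches placements.

```python
from typing import Dict, List, Optional, Tuple

Placement = Tuple[str, int, int]   # (chromosome, start, stop)

# One placement per assembly; None means the remap to that assembly failed.
records: Dict[str, Dict[str, Optional[Placement]]] = {
    "variant_A": {
        "NCBI35": ("chr15", 100, 200),    # submitted coordinates
        "NCBI36": None,                   # failed remap
        "GRCh37": ("chr15", 2100, 2200),  # successful remap
    },
    "variant_B": {
        "NCBI35": ("chr15", 150, 250),
        "NCBI36": ("chr15", 160, 260),
        "GRCh37": ("chr15", 2150, 2250),
    },
}


def search_by_location(assembly: str, chrom: str, start: int, stop: int) -> List[str]:
    """Return records whose placement on `assembly` overlaps the query interval."""
    hits = []
    for name, placements in records.items():
        placement = placements.get(assembly)
        if placement is None:
            continue   # no coordinates on this assembly, so nothing to match
        p_chrom, p_start, p_stop = placement
        if p_chrom == chrom and p_start <= stop and p_stop >= start:
            hits.append(name)
    return hits


print(search_by_location("NCBI36", "chr15", 100, 300))    # ['variant_B']
print(search_by_location("GRCh37", "chr15", 2000, 2300))  # ['variant_A', 'variant_B']
```

A record with no placement on the queried assembly never matches, however the interval is chosen, so the search has to be repeated on an assembly where the remap succeeded.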
Acknowledgements
dbVar
John Lopez, Tim Hefferon, John Garner, Chao Chen, George Zhou, Victor Ananiev
NCBI
Collaborators
DGVa, DGV
GRC
NCBI
Valerie Schneider, Nathan Bouk, Hsiu-Chuan Chen
Collaborators
TGI-WU, WTSI, EBI
ISCA
NCBI Genomes, Viewers and Variation groups