MadMapper And CheckMatrix: Python Scripts To Infer Orders Of Genetic Markers And For Visualization And Validation Of Genetic Maps And Haplotypes.
Alexander Kozik and Richard Michelmore.
The Genome Center, University of California Davis, CA 95616.
Matrix file (excerpt):
...................
GM01 GM07 0.36
GM01 GM08 0.40
GM01 GM09 0.48
GM01 GM10 0.52
GM01 GM11 0.60
GM01 GM12 0.68
GM02 GM01 0.04
GM02 GM02 0.00
GM02 GM03 0.08
GM02 GM04 0.16
GM02 GM05 0.20
GM02 GM06 0.24
...................
Contemporary molecular marker techniques can generate mapping data for thousands of molecular markers simultaneously. Construction and validation of high-density genetic maps is a challenge and requires robust, high-throughput approaches. As part of the Compositae Genome Project, we developed a suite of Python scripts for quality control of genetic markers, grouping, and inference of the linear order of markers in linkage groups. These scripts can be used in conjunction with other mapping programs or as a stand-alone package. The suite consists of three programs: MadMapper_RECBIT, MadMapper_XDELTA and CheckMatrix. MadMapper_RECBIT analyzes raw marker scores for recombinant inbred lines: it generates pairwise distance scores for all markers, clusters markers based on pairwise distances, identifies genetic bins, assigns new markers to known linkage groups, validates allele calls, and assigns quality classes to each marker based on several criteria and cutoff values. MadMapper_XDELTA uses a new algorithm, Minimum Entropy Approach and Best-Fit Extension, to infer the linear order of markers. It analyzes two-dimensional matrices of all pairwise scores and finds the map with the minimal total sum of differences between adjacent cells (the map with the lowest entropy). This approach scales well and can accommodate large numbers of markers, unlike some commonly used mapping programs. CheckMatrix serves as a visualization tool to validate constructed genetic maps. It generates graphical genotypes and two-dimensional heat plots of pairwise scores. Visualization of regions with positive and negative linkage, as well as of the allele fraction per marker, simplifies genetic map validation without applying statistical approaches. Scripts are freely available at http://cgpdb.ucdavis.edu/XLinkage/MadMapper/
BRIEF DESCRIPTION OF RIL MAPPING PIPELINE:
1. Processing of raw marker scores and grouping: MadMapper_RECBIT generates multiple text files for further analysis
2. Construction of genetic map (ordering of markers) per linkage group: MadMapper_XDELTA (or any other mapping program)
3. Visualization and validation of genetic maps: CheckMatrix generates heat plots of recombination scores and graphical genotyping
MadMapper and CheckMatrix are Python scripts and can be used on any computer platform: UNIX, Windows, Mac OS X. Grouping can be done on a set of ~2,000 markers; map construction works in a reasonable time frame with up to ~500 markers.
[Figure: three dot plots, one per mapping program: MadMapper_XDELTA, JoinMap, RECORD. Axis labels: physical coordinates of markers on Arabidopsis genome; inferred order of markers by three different approaches (mapping programs)]
Side-by-side comparison of the linear order of markers on the Arabidopsis genome inferred by three different approaches (mapping programs), and comparison with the physical order of markers (Col-0 genomic sequence): MadMapper_XDELTA (minimum entropy approach), JoinMap (maximum likelihood) and RECORD (minimum number of recombination events). [Diagonal dot-plot was created using GenoPix_2D_Plotter]
[Figure: 2-D diagonal CheckMatrix heat plot, all markers versus all markers; color gradient reflects linkage scores between markers. Annotations: main diagonal with linked markers; regions with negative linkage; regions with quasi linkage. Axis labels: Linkage group I to Linkage group V]
[Figure: CheckMatrix graphical genotyping for Linkage groups I to V. Haplotypes per RIL (inbred line); red: Columbia, blue: Landsberg erecta]
LINEAR ORDER OF MARKERS INFERRED BY THREE DIFFERENT METHODS:
REFERENCES AND DATA SOURCES:
1. Dean and Lister Arabidopsis Genetic Map and Raw Data: http://www.arabidopsis.info/new_ri_map.html
2. MadMapper: http://cgpdb.ucdavis.edu/XLinkage/MadMapper/
3. JoinMap: http://www.kyazma.nl/index.php/mc.JoinMap
4. RECORD: http://www.dpw.wau.nl/pv/pub/recORD/index.htm
5. GenoPix_2D_Plotter: http://www.atgc.org/GenoPix_2D_Plotter/
CREDITS: This work was funded by NSF grant #0421630 to the Compositae Genome Consortium http://compgenomics.ucdavis.edu/
PAG-14 POSTERS WITH EXAMPLES OF MADMAPPER USAGE:
#P751 High-Density Haplotyping With Microarray-Based Single Feature Polymorphism Markers In Arabidopsis
#P761 Gene Expression Markers: Using Transcript Levels Obtained From Microarrays To Genotype A Segregating Population
[Figure label: allele composition per marker]
MINIMUM ENTROPY APPROACH TO INFER LINEAR ORDER OF MARKERS:
CheckMatrix 2D plot:
[Figure panels: random order (high ‘entropy’); partially wrong order; right order (low ‘entropy’)]
Example of group analysis by MadMapper_RECBIT
[Figure labels: grouping cutoff stringency; distinct linkage group #4]
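To illustrate how a grouping cutoff can split or merge markers, here is a minimal sketch. It assumes single-linkage grouping (two markers share a group when a chain of pairwise scores at or below the cutoff connects them) and assumes the pairwise score is the fraction of lines with differing calls. The poster does not describe MadMapper_RECBIT's actual clustering procedure, so this is only an approximation, and `score` and `group_markers` are hypothetical names. The genotype rows are the first six markers of the locus-file example, written in run-length form.

```python
# Rows GM01..GM06 of the locus-file example (25 lines each), run-length encoded.
GENOTYPES = {
    "GM01": "A" * 16 + "B" * 9,
    "GM02": "A" * 15 + "B" * 10,
    "GM03": "A" * 13 + "B" * 12,
    "GM04": "A" * 11 + "B" * 14,
    "GM05": "A" * 10 + "B" * 15,
    "GM06": "A" * 9 + "B" * 16,
}


def score(a, b):
    """Assumed pairwise score: fraction of lines whose calls differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def group_markers(genotypes, cutoff):
    """Single-linkage grouping (an assumption): link pairs with score <= cutoff."""
    markers = list(genotypes)
    parent = {m: m for m in markers}

    def find(m):
        # Follow parent links to the group representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for i, a in enumerate(markers):
        for b in markers[i + 1:]:
            if score(genotypes[a], genotypes[b]) <= cutoff:
                parent[find(b)] = find(a)

    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())


print(group_markers(GENOTYPES, 0.05))  # stringent cutoff: several small groups
print(group_markers(GENOTYPES, 0.10))  # relaxed cutoff: one group
```

With these rows, a stringent cutoff of 0.05 splits the six markers into three groups, while 0.10 merges them into one.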
MadMapper_XDELTA analyzes two-dimensional matrices of all pairwise scores and finds the best map: the one with the minimal total sum of differences between adjacent cells (the map with the lowest ‘entropy’).
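The ‘entropy’ of a candidate order can be sketched as follows. This assumes the sum runs over absolute differences between horizontally and vertically adjacent cells of the reordered score matrix; the exact formula used by MadMapper_XDELTA may differ. The exhaustive search over permutations only stands in for the Best-Fit Extension step, which is not described here, and is practical only for a handful of markers. The names `score`, `entropy` and `best_order` are hypothetical; the genotype rows are four markers from the locus-file example.

```python
from itertools import permutations

# Rows GM01..GM04 of the locus-file example (25 lines each), run-length encoded.
GENOTYPES = {
    "GM01": "A" * 16 + "B" * 9,
    "GM02": "A" * 15 + "B" * 10,
    "GM03": "A" * 13 + "B" * 12,
    "GM04": "A" * 11 + "B" * 14,
}


def score(a, b):
    """Assumed pairwise score: fraction of lines whose calls differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def entropy(order, genotypes=GENOTYPES):
    """Total absolute difference between adjacent cells of the score matrix
    when rows and columns are both arranged in the given marker order."""
    m = [[score(genotypes[r], genotypes[c]) for c in order] for r in order]
    n = len(order)
    total = 0.0
    for i in range(n):
        for j in range(n - 1):
            total += abs(m[i][j] - m[i][j + 1])  # horizontal neighbours
            total += abs(m[j][i] - m[j + 1][i])  # vertical neighbours
    return total


def best_order(markers):
    """Brute-force stand-in for the search step: try every permutation."""
    return min(permutations(markers), key=entropy)


right = ("GM01", "GM02", "GM03", "GM04")
scrambled = ("GM03", "GM01", "GM04", "GM02")
print(entropy(right), entropy(scrambled))
print(best_order(scrambled))
```

For these four markers the right order scores lower than the scrambled one, and the search recovers the right order or its mirror image, since a map and its reverse have the same score.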
[Figure: two-dimensional matrix of recombination pairwise scores (numerical data generated by MadMapper), with adjacent cells (values) marked, shown next to the visualization of the same numerical data using CheckMatrix and the CheckMatrix color scheme]
VISUALIZATION OF ARABIDOPSIS GENETIC MAP (DEAN AND LISTER, http://www.arabidopsis.info/) USING CHECKMATRIX [MAP WAS RE-CONSTRUCTED USING MADMAPPER]
[Figure: CheckMatrix heat plot. Axis labels: Linkage group I to Linkage group V. Annotations: high density of markers; low density of markers]
[Figure panels: MadMapper, JoinMap, RECORD]
CHECKMATRIX USAGE:
Three input files are required:

Map file:
LG GM01 0
LG GM02 1
LG GM03 2
LG GM04 3
LG GM05 4
LG GM06 5
LG GM07 6
LG GM08 7
LG GM09 8
LG GM10 9
LG GM11 10
LG GM12 11

Matrix file: pairwise scores, one marker pair and its score per line (for example, GM01 GM07 0.36)

Locus file:
;    1                 10                  20        25
;    |                 |                   |         |
GM01 A A A A A A A A A A A A A A A A B B B B B B B B B
GM02 A A A A A A A A A A A A A A A B B B B B B B B B B
GM03 A A A A A A A A A A A A A B B B B B B B B B B B B
GM04 A A A A A A A A A A A B B B B B B B B B B B B B B
GM05 A A A A A A A A A A B B B B B B B B B B B B B B B
GM06 A A A A A A A A A B B B B B B B B B B B B B B B B
GM07 A A A A A A A A A B B B B B B B B B B B B B B A A
GM08 A A A A A A A A A B B B B B B B B B B B B B A A A
GM09 A A A A A A A A A B B B B B B B B B B B A A A A A
GM10 B A A A A A A A A A B B B B B B B B B A A A A A A
GM11 B B A A A A A A A A B B B B B B B B A A A A A A A
GM12 B B B A A A A A A A B B B B B B B A A A A A A A A
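The example files appear to be consistent with one another: the matrix scores match the fraction of lines in which two markers carry different calls in the locus file (GM01 and GM07 differ in 9 of 25 lines, giving 0.36). The sketch below produces matrix-style lines from locus rows under that assumption. It is not MadMapper_RECBIT's own code, `pairwise_score` and `matrix_lines` are hypothetical names, and missing or heterozygous calls would need extra handling.

```python
# Four rows of the locus-file example (25 lines each), run-length encoded.
LOCUS = {
    "GM01": "A" * 16 + "B" * 9,
    "GM02": "A" * 15 + "B" * 10,
    "GM07": "A" * 9 + "B" * 14 + "A" * 2,
    "GM10": "B" + "A" * 9 + "B" * 9 + "A" * 6,
}


def pairwise_score(a, b):
    """Assumed score: fraction of lines whose calls differ between two markers."""
    if len(a) != len(b):
        raise ValueError("genotype rows must have equal length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def matrix_lines(locus):
    """Yield 'marker marker score' lines for every ordered pair of markers."""
    for m1, g1 in locus.items():
        for m2, g2 in locus.items():
            yield f"{m1} {m2} {pairwise_score(g1, g2):.2f}"


for line in matrix_lines(LOCUS):
    print(line)
```

The printed lines include the pairs that also appear in the matrix-file excerpt, such as GM01 GM07 and GM02 GM01, with the same scores.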
Upon program execution, three output files are generated:
1. HEAT PLOT: helps to validate the quality of the constructed genetic map and to identify markers in the wrong position
2. GRAPHICAL GENOTYPING: visualization of haplotypes per recombinant line (suspicious double crossovers are highlighted)
3. CIRCULAR GRAPH: helps to validate the genetic map and to identify markers with spurious linkage
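One way a suspicious double crossover could be flagged in a haplotype is sketched below. It assumes that "suspicious" means a short run of one allele flanked on both sides by the other allele, with markers listed in map order; the criterion CheckMatrix actually applies is not stated here. The function name, the `max_run` parameter and the example haplotype string are hypothetical.

```python
from itertools import groupby


def suspicious_double_crossovers(haplotype, max_run=1):
    """Return (start, end) index pairs of interior runs no longer than max_run
    that are flanked on both sides by the same, different allele."""
    runs = []
    pos = 0
    for allele, grp in groupby(haplotype):
        length = len(list(grp))
        runs.append((allele, pos, pos + length - 1))
        pos += length

    hits = []
    for k in range(1, len(runs) - 1):
        allele, start, end = runs[k]
        same_flanks = runs[k - 1][0] == runs[k + 1][0]
        if same_flanks and end - start + 1 <= max_run:
            hits.append((start, end))
    return hits


# Hypothetical haplotype of one line across ten ordered markers.
print(suspicious_double_crossovers("AAABAAABBB"))
```

Raising `max_run` widens what counts as suspicious; short runs at either end of the haplotype are never flagged because they have only one flank.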