GMI proficiency testing: progress report 2016
9th GMI meeting, 23rd-25th May 2016
Rome, Italy
Presented by James Pettengill (US FDA) and Rene Hendriksen (DTU-Food)
Layout of the full roll-out
The PT consists of three components:
“Wet-lab”
• 1a) DNA extraction, purification, library preparation, and whole-genome sequencing of six bacterial cultures:
  – two Salmonella strains
  – two Escherichia coli strains (only one was included)
  – two Staphylococcus aureus strains
  Upload the reads to an FTP site. Optionally, identify the MLST types and resistance genes present in the strains.
• 1b) Whole-genome sequencing of pre-prepared DNA from the same six bacterial strains as in component 1a, for comparison with the DNA extraction and library preparation of component 1a
“Dry-lab”
• 2) Variant detection and phylogenetic/clustering analysis of three FASTQ datasets of approximately 20 genomes, from S. Typhimurium, E. coli and S. aureus
Updated draft Action Plan and Milestones for 2014/15 (“Full PT roll-out phase”)
Q3 2014 to Q2 2015:
• Preparation of reference material
• Dispatch of reference materials
• Adjust documentation, prenotification, invitation letter, instructions (SOP) and guidelines for PT
• Final analysis (report) of the pilot PT
• Invitation for the full PT roll-out
• Final analysis of the full PT roll-out
Q2 2015 to Q1 2016:
• Invitation for the full PT roll-out
• Adjust documentation, prenotification, invitation letter, instructions (SOP) and guidelines for PT
• Preparation of reference material
• Dispatch of reference materials
“Better late than never”
Participation in the PT
49 participating labs in total
5 May 2, 2023
Survey to capture technical and background information - component 2, wet-lab
Measured QC parameters – component 2, wet-lab
• Number of reads
• Average read length
• Number of reads mapped to:
  – reference DNA sequence
  – reference chromosome
  – reference plasmid #1
  – reference plasmid #2
  – reference plasmid #3
• Proportion of reads mapped to each of the above
• Depth of coverage of each of the above
• Size of assembled genome
• Size of assembled genome per total size of DNA sequence (%)
• Total number of contigs
• Number of contigs > 200 bp
• N50
• NG50
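Depth of coverage can be approximated from quantities already in this list. The sketch below assumes mean depth ≈ mapped reads × average read length ÷ reference length; a pipeline that computes per-position depth from an alignment may report somewhat different values. The function name and the example counts are hypothetical.

```python
def approx_depth(mapped_reads: int, avg_read_length: float, reference_length: int) -> float:
    """Approximate mean depth of coverage over one reference sequence.

    Assumption: every mapped read contributes its full average length to the
    reference; soft-clipping, indels and per-position variation are ignored.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return mapped_reads * avg_read_length / reference_length


if __name__ == "__main__":
    # Hypothetical counts (mapped reads, reference length in bp) for 150 bp reads
    references = {
        "reference chromosome": (1_800_000, 4_500_000),
        "reference plasmid #1": (30_000, 90_000),
    }
    for name, (mapped, length) in references.items():
        print(f"{name}: {approx_depth(mapped, 150, length):.1f}x")
```

Because the estimate ignores where reads land, it can hide uneven coverage; it is only a quick sanity check next to alignment-based depth.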
Size of assembled genome per total size of DNA sequence (%)
The proportion of the assembled contigs that map directly to the closed genome of the same strain. This cannot exceed 100%.
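One way to get a percentage that by construction cannot exceed 100% is to merge the reference regions covered by contig alignments before summing them, so overlapping contigs are not counted twice. This is a minimal sketch, assuming the contig-to-reference alignments are already available as half-open (start, end) coordinates on the closed genome; the slides do not say which aligner or exact formula the PT used, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the closed reference genome


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so each reference base is counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def percent_reference_covered(alignments: Iterable[Interval], reference_length: int) -> float:
    """Percentage of the closed genome covered by aligned contigs (at most 100%)."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    covered = sum(end - start for start, end in merge_intervals(alignments))
    return 100.0 * covered / reference_length


if __name__ == "__main__":
    # Hypothetical alignments; the first two contigs overlap on the reference
    hits = [(0, 2_000_000), (1_500_000, 3_000_000), (3_200_000, 4_500_000)]
    print(percent_reference_covered(hits, 5_000_000))
```

Summing raw contig lengths instead would count the overlapping 500 kb twice, which is how a naive ratio can exceed 100%.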
N50
The N50 length is defined as the length for which the collection of all contigs of that length or longer contains at least half of the sum of the lengths of all contigs, and for which the collection of all contigs of that length or shorter also contains at least half of the sum of the lengths of all contigs. An N50 above 15,000 bp normally indicates good quality.
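The definition above is usually computed by sorting contigs from longest to shortest and walking down the list until the running total reaches half the assembly size. NG50, also in the list of QC parameters, is the same walk but with half the expected genome size as the target instead of half the assembly size. A minimal sketch; the function names and example lengths are hypothetical.

```python
from typing import Iterable, Optional


def nx50(contig_lengths: Iterable[int], target_total: Optional[int] = None) -> int:
    """Length L such that contigs of length >= L sum to at least half the target.

    target_total=None gives N50 (target = assembly size);
    passing the expected genome size gives NG50.
    Returns 0 if the contigs never reach half the target.
    """
    lengths = sorted((length for length in contig_lengths if length > 0), reverse=True)
    total = sum(lengths) if target_total is None else target_total
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding half of an odd total
            return length
    return 0


def n50(contig_lengths: Iterable[int]) -> int:
    return nx50(contig_lengths)


def ng50(contig_lengths: Iterable[int], genome_size: int) -> int:
    return nx50(contig_lengths, genome_size)


if __name__ == "__main__":
    contigs = [80_000, 70_000, 50_000, 40_000, 30_000, 20_000]  # hypothetical, 290 kb in total
    print(n50(contigs), ng50(contigs, 400_000))
```

Here N50 is 70,000 bp (80 + 70 kb already covers half of 290 kb), while NG50 against a hypothetical 400 kb genome drops to 50,000 bp, which is why NG50 is the fairer measure when assemblies of very different total size are compared.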
Total number of contigs
The total number of contigs assembled. Fewer than 1,000 contigs normally indicates good quality.
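The two rules of thumb quoted on these slides (N50 above 15,000 bp, fewer than 1,000 contigs) can be expressed as a simple check alongside the contig counts from the parameter list. A minimal sketch; the function names are hypothetical, and the constants are the slides' indicative values, not the tentative thresholds the PT itself aims to derive.

```python
from typing import Dict, List

# Indicative values from the slides; not validated PT acceptance criteria.
N50_MIN_BP = 15_000
MAX_CONTIGS = 1_000


def contig_summary(contig_lengths: List[int], min_len: int = 200) -> Dict[str, int]:
    """Total contigs, contigs longer than min_len bp, and assembled size in bp."""
    return {
        "total_contigs": len(contig_lengths),
        "contigs_over_min": sum(1 for length in contig_lengths if length > min_len),
        "assembly_size": sum(contig_lengths),
    }


def quality_warnings(n50: int, total_contigs: int) -> List[str]:
    """Return human-readable warnings; an empty list means both rules of thumb pass."""
    warnings = []
    if n50 <= N50_MIN_BP:
        warnings.append(f"N50 of {n50} bp is not above {N50_MIN_BP} bp")
    if total_contigs >= MAX_CONTIGS:
        warnings.append(f"{total_contigs} contigs is not below {MAX_CONTIGS}")
    return warnings


if __name__ == "__main__":
    print(contig_summary([150, 250, 5_000, 120_000]))
    print(quality_warnings(70_000, 4))
    print(quality_warnings(9_000, 1_500))
```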
Proportion of reads mapped to reference DNA sequence (%)
The proportion of the reads produced that map directly to the closed genome of the same strain. This cannot exceed 100%.
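If the reads have been aligned to the closed genome, this proportion can be read off the SAM FLAG field: bit 0x4 marks an unmapped read, and secondary (0x100) and supplementary (0x800) records must be skipped so that each read is counted once. A minimal sketch, assuming plain-text SAM input (for example piped from `samtools view -h`); it only counts reads that were passed to the aligner, and the function name is hypothetical.

```python
from typing import Iterable, Tuple

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_rate(sam_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (mapped reads, total reads, percent mapped) from SAM text lines."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignments of a read already counted via its primary record
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    percent = 100.0 * mapped / total if total else 0.0
    return mapped, total, percent


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.6\tSO:unsorted",
        "r1\t0\tchr\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t16\tchr\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
        "r4\t0\tchr\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r1\t256\tchr\t900\t0\t4M\t*\t0\t0\t*\t*",
        "r2\t2048\tchr\t950\t60\t2S2M\t*\t0\t0\tACGT\tIIII",
    ]
    print(mapping_rate(example))
```

Passing an open file handle works the same way, since each line still splits on tabs and header lines still start with `@`.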
SNP analysis of the different QC strains
Participant feedback - component 2, wet-lab
Analysis of component 2, wet-lab
• All outliers will be removed from the final analysis before tentative QC thresholds are suggested
• To suggest tentative QC thresholds, QC data will be related to the MLST and AMR “ref.” data analysis, the SNP analysis, and the technical/background data
• Submitted MLST and AMR data will be analyzed against the “ref.” data to evaluate the performance of the bioinformatic tools used in the PT
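The slides do not say how outliers will be defined. As a stand-in, the sketch below uses Tukey's fences (values outside Q1 − 1.5·IQR to Q3 + 1.5·IQR are flagged) on one QC parameter at a time; it does not attempt the step of relating QC data to MLST, AMR and SNP results. The function name and example depths are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def tukey_filter(values: Sequence[float], k: float = 1.5) -> Tuple[List[float], List[float]]:
    """Split values into (kept, outliers) using Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(values) < 4:
        return list(values), []  # too few points to estimate quartiles meaningfully
    q1, _, q3 = statistics.quantiles(values, n=4)  # Python 3.8+
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


if __name__ == "__main__":
    # Hypothetical depth-of-coverage values submitted by nine labs
    depths = [48, 52, 55, 60, 61, 58, 3, 57, 250]
    kept, outliers = tukey_filter(depths)
    print("outliers:", outliers)
    print("retained range:", min(kept), "to", max(kept))
```

The retained range is only a starting point for a tentative threshold; a robust rule like this is preferable to mean ± SD here because the outliers themselves would inflate the SD.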
Dry-lab
The objective:
• Assess the differences that exist in the detection of variants (e.g., single nucleotide polymorphisms (SNPs)) from the analysis of whole-genome sequence data.
• Participants were provided with three datasets to analyze with the protocol currently implemented in their lab.
Submitted by participants:
1. Answers to an online survey
2. A FASTA-formatted matrix of variants
3. A Newick-formatted tree (phylogeny) file
Formatting issues
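A FASTA-formatted variant matrix holds one aligned sequence of variant positions per sample, so pairwise SNP distances follow from counting mismatching columns. A minimal sketch; it assumes every sequence has the same length and, as one possible choice, ignores columns where either sample has a gap or ambiguous base (labs handled such characters differently). The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

BASES = set("ACGT")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text (sequences may span several lines) into {name: sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def snp_distance(a: str, b: str) -> int:
    """Count columns where both samples have an unambiguous base and the bases differ."""
    if len(a) != len(b):
        raise ValueError("sequences in a variant matrix must have equal length")
    return sum(1 for x, y in zip(a, b) if x in BASES and y in BASES and x != y)


def pairwise_distances(seqs: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """SNP distance for every unordered pair, keyed by sorted sample names."""
    return {(p, q): snp_distance(seqs[p], seqs[q]) for p, q in combinations(sorted(seqs), 2)}


if __name__ == "__main__":
    matrix = ">s1\nACGTAC\n>s2\nACGTTC\n>s3\nAGGTTN\n"  # hypothetical 6-position matrix
    print(pairwise_distances(read_fasta(matrix)))
```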
Dry-lab participation/result files
Dry-lab
Survey results: a diversity of methods and decisions
Dry-lab results of the analysis of FASTA matrices
(Figure: labs congruent vs. labs incongruent)
Dry-lab results of the analysis of Newick tree files: the trees are pretty different
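A standard way to quantify how different two trees are is the Robinson-Foulds (RF) distance: the number of bipartitions (splits of the leaf set induced by internal branches) found in one tree but not the other. The slides do not state which tree comparison was used, so this is a minimal sketch of the RF idea for unrooted trees, with a deliberately small Newick parser that assumes unquoted labels without whitespace or comments and discards branch lengths and support values. All names are hypothetical.

```python
from typing import FrozenSet, List, Set, Union

Tree = Union[str, List["Tree"]]  # a leaf name, or a list of child subtrees


def parse_newick(text: str) -> Tree:
    """Minimal Newick parser keeping only topology and leaf names."""
    s = "".join(text.split()).rstrip(";")
    pos = 0

    def peek() -> str:
        return s[pos] if pos < len(s) else ""

    def read_label() -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",();":
            pos += 1
        return s[start:pos].split(":")[0]  # drop any branch length

    def parse_node() -> Tree:
        nonlocal pos
        if peek() != "(":
            return read_label()
        pos += 1  # consume "("
        children = [parse_node()]
        while peek() == ",":
            pos += 1
            children.append(parse_node())
        if peek() != ")":
            raise ValueError(f"expected ')' at position {pos}")
        pos += 1
        read_label()  # internal label or support value is ignored
        return children

    tree = parse_node()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def leaf_set(tree: Tree) -> FrozenSet[str]:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def nontrivial_splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Splits from internal edges, each stored as the side without a fixed anchor leaf."""
    leaves = leaf_set(tree)
    anchor = min(leaves)
    splits: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = leaves - below if anchor in below else below
        if 1 < len(side) < len(leaves) - 1:  # skip trivial single-leaf splits
            splits.add(side)
        return below

    walk(tree)
    return splits


def robinson_foulds(newick_a: str, newick_b: str) -> int:
    a, b = parse_newick(newick_a), parse_newick(newick_b)
    if leaf_set(a) != leaf_set(b):
        raise ValueError("trees must contain the same set of leaves")
    return len(nontrivial_splits(a) ^ nontrivial_splits(b))


if __name__ == "__main__":
    t1 = "((A:0.1,B:0.2):0.05,(C,D)90,E);"
    t2 = "((B,A),(D,C),E);"
    t3 = "((A,C),(B,D),E);"
    print(robinson_foulds(t1, t2), robinson_foulds(t1, t3))
```

For fully resolved unrooted trees with n leaves the maximum RF distance is 2(n − 3), so the second comparison (4 for five leaves) is as different as two such trees can be. RF ignores branch lengths, which matters for traceback questions about how close isolates are.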
Dry-lab summary and key findings:
• A total of 190 result files were submitted, with a relatively even distribution across the three taxonomic groups and the two file types (FASTA matrix or Newick tree)
• Not surprisingly, a diversity of algorithms is being employed (e.g., for mapping reads and inferring a phylogeny)
• Participants also differed in the choices they made with respect to quality filtering and contamination checking
• The number of positions in the different FASTA matrices differed greatly, but the matrices seem to carry similar information in terms of the relative magnitude of the differences between samples
• The trees differed greatly
• What does this mean for traceback?
• Should we capture different information to compare results (e.g., positions within the reference)?
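One way to test the observation that matrices of very different size still rank sample differences similarly is a Spearman rank correlation between two labs' pairwise SNP distances over the sample pairs they share. This is an illustrative sketch, not the analysis reported in the PT; it assumes each lab's distances are a dict keyed by sorted (sample, sample) tuples, and all names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def spearman_between_labs(d1: Dict[Pair, float], d2: Dict[Pair, float]) -> float:
    """Spearman correlation of two labs' distances over their shared sample pairs."""
    shared = sorted(set(d1) & set(d2))
    if len(shared) < 3:
        raise ValueError("need at least three shared sample pairs")
    return pearson(ranks([d1[p] for p in shared]), ranks([d2[p] for p in shared]))


if __name__ == "__main__":
    # Hypothetical: lab B reports far more SNPs than lab A, but in the same order
    lab_a = {("s1", "s2"): 2, ("s1", "s3"): 10, ("s2", "s3"): 9}
    lab_b = {("s1", "s2"): 40, ("s1", "s3"): 300, ("s2", "s3"): 250}
    print(spearman_between_labs(lab_a, lab_b))
```

A rank correlation is used rather than raw differences because absolute SNP counts depend on each lab's filtering choices, while the ordering of sample pairs is what matters for clustering. Pairwise distances are not independent, so a permutation-based approach such as a Mantel test would be needed before attaching significance to the value.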
Acknowledgements and support
• Susanne Karlsmose (DTU Food)
• Oksana Lukjancenko (DTU Food)
• Charlotte Ingvorsen (DTU Food)
• Pimlapas Leekitcharoenphon (DTU Food)
• Rolf Sommer Kaas (DTU Food)
• Inge Marianne Hansen (DTU Food)
• Jose Luis Bellod Cisneros (DTU Systems Biology)
• Anthony Underwood (PHE)
• Division of Microbiology (CFSAN/FDA)
• Brian Beck (Microbiologics)
• Isabel Cuesta de la plaza (ISCIII)
• Angel Zaballos (ISCIII)
• Jorge De La Barrera Martinez (ISCIII)
… and the rest of WG 4 (“advisory group”)
DTU Food, Technical University of Denmark
Thank you for your attention