
PROTOCOL FOR HEPATITIS C VIRUS GENOTYPING/SUBTYPING TOOL May 4, 2018

1. Background

Hepatitis C viruses (HCV) have diversified into seven major genotypes (1-7) over time. Each major genotype is further classified into subtypes, e.g., 1a, 1b, 1c, etc. A list of current genotype and subtype assignments is maintained by the Flaviviridae Study Group of the International Committee on Taxonomy of Viruses (ICTV) (https://talk.ictvonline.org/ictv_wikis/flaviviridae/w/sg_flavi/56/hcv-classification). As of June 2017, the number of confirmed genotypes/subtypes had increased to 86 (ICTV, 2017). To help researchers assign new HCV sequences to the current genotype/subtype classification, the ViPR team has developed an HCV Genotyping/Subtyping Tool. This document describes the HCV genotyping/subtyping tool in ViPR.

2. Method Description

An automated pipeline was developed for assigning a genotype/subtype to un-genotyped HCV sequences, whereby:

2.1 A reference alignment is constructed following the steps below:

2.1.1 The reference alignment published by the ICTV on June 8, 2017 (Updated alignment (FASTA) of HCV genotypes and subtypes 1.6.17.FST; https://talk.ictvonline.org/ictv_wikis/flaviviridae/w/sg_flavi/57/hcv-reference-sequence-alignments) was trimmed to the CDS region only.

2.1.2 Additional sequences with confirmed subtype (provided by Dr. Donald Smith) were added to the above alignment using MAFFT (mafft --addfragments).

2.1.3 One insertion introduced by the new sequences was manually adjusted to keep the reference alignment intact.

2.1.4 The resulting alignment contains 231 HCV reference sequences. The reference alignment can be downloaded from the ViPR site: https://www.viprbrc.org/brc/workbenchSequenceSearch.spg?uploadedFileId=20272&decorator=flavi&method=SubmitForm

2.2 A reference tree is computed following the steps below:

2.2.1 The multiple sequence alignment described above was input to RAxML (version 7.2.6) with the GTR model of nucleotide substitution and a discrete gamma model of rate heterogeneity with 4 categories.

2.2.2 The output best tree (RAxML_bestTree) is then midpoint rooted using Archaeopteryx.

2.2.3 The resulting midpoint-rooted tree is used as the reference tree in the HCV typing tool. It can be viewed or downloaded from the ViPR site: https://www.viprbrc.org/brc/uploadedFileDetail.spg?method=SharedFileDetail&uploadedFileId=20275&decorator=flavi

2.3 A query sequence is checked for sequence type and sequence length. The minimum length requirement is 400 bp.

2.4 The query sequence is aligned against the reference alignment using MAFFT (mafft --keeplength --add).

2.5 The query sequence is placed into the reference tree using pplacer (Matsen, 2010), with the reference tree serving as a “scaffold” onto which the query sequence is placed.

2.6 The pplacer output is parsed by guppy.

2.7 The guppy output is analyzed by cladinator (https://sites.google.com/site/cmzmasek/home/software/forester/cladinator).
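Steps 2.3 to 2.5 can be sketched in Python as shown below. This is only an illustration under stated assumptions, not the ViPR implementation: the function names, file names, and the nucleotide alphabet used for the sequence-type check are hypothetical, and the pplacer options shown (-t for the reference tree, -s for the RAxML statistics file, -o for the output file) are one plausible way to call it. The guppy and cladinator invocations are left out because the protocol does not state them.

```python
MIN_QUERY_LENGTH = 400  # minimum query length in bp (step 2.3)

# Assumption: "sequence type" means nucleotide; IUPAC nucleotide codes are accepted.
NUCLEOTIDE_CODES = set("ACGTUNRYSWKMBDHV")


def validate_query(sequence):
    """Return the cleaned query, or raise ValueError if it fails the step 2.3 checks."""
    # Drop whitespace and gap characters, then normalise case.
    cleaned = "".join(sequence.split()).upper().replace("-", "")
    if not cleaned or set(cleaned) - NUCLEOTIDE_CODES:
        raise ValueError("query does not look like a nucleotide sequence")
    if len(cleaned) < MIN_QUERY_LENGTH:
        raise ValueError(
            "query is %d bp; at least %d bp is required" % (len(cleaned), MIN_QUERY_LENGTH)
        )
    return cleaned


def mafft_add_command(query_fasta, reference_alignment):
    """Step 2.4: add the query to the reference alignment without changing its length.

    MAFFT writes the combined alignment to standard output.
    """
    return ["mafft", "--keeplength", "--add", query_fasta, reference_alignment]


def pplacer_command(reference_tree, stats_file, combined_alignment, out_jplace):
    """Step 2.5: place the query on the fixed reference tree (hypothetical file names)."""
    return [
        "pplacer",
        "-t", reference_tree,   # midpoint-rooted reference tree
        "-s", stats_file,       # RAxML info file describing the fitted model
        "-o", out_jplace,       # placement output
        combined_alignment,     # reference alignment plus aligned query
    ]


if __name__ == "__main__":
    validate_query("ACGT" * 100)
    print(" ".join(mafft_add_command("query.fasta", "hcv_reference.fasta")))
    print(" ".join(pplacer_command("hcv_reference.tre", "RAxML_info.hcv",
                                   "combined.fasta", "query.jplace")))
```

The commands are returned as argument lists so that they can be passed to subprocess.run without shell quoting; the sketch itself only prints them.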


2.7.1 Background of cladinator logic

cladinator assigns a genotype/subtype to the query sequence based on its placement in the phylogeny:

● When a query sequence is placed unequivocally within the bounds of a single defined type, this type name is assigned to the query (Figure 1A).

● When a query sequence is bracketed by two different types and these two types share a common parent type in their type names, the parent type name is assigned to the query (Figure 1B).

● When a query sequence is bracketed by two different types whose type names share no common parent type, the query sequence is of unknown type (Figure 1C).
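The three rules above amount to taking the deepest type shared by the two bracketing type names. The sketch below illustrates that idea in Python; it is not cladinator's actual code. It assumes dot-separated hierarchical names as used in the artificial trees (A, A.1, A.1.1), and the function name common_parent_type is hypothetical. How HCV names such as 1a and 1b are encoded hierarchically in the ViPR reference tree is not stated in this protocol.

```python
def common_parent_type(first_type, second_type, separator="."):
    """Return the deepest type shared by two bracketing type names, or "?" if none.

    Names are compared level by level, so "A.1" and "A.2" share "A",
    while "A" and "B" share nothing and the query is of unknown type.
    """
    shared_levels = []
    for level_a, level_b in zip(first_type.split(separator), second_type.split(separator)):
        if level_a != level_b:
            break
        shared_levels.append(level_a)
    return separator.join(shared_levels) if shared_levels else "?"


if __name__ == "__main__":
    for pair in [("A", "A"), ("A.1", "A.2"), ("A", "B")]:
        print(pair, "->", common_parent_type(*pair))
```

Comparing whole levels rather than raw string prefixes avoids treating names such as A.1 and A.10 as sharing the type A.1.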

Figure 1. cladinator analysis of query placements in hierarchically annotated artificial trees. (A) Query is A-type (bracketed by A and A). (B) Query is A-type (bracketed by A.1 and A.2). (C) Query is of unknown type (bracketed by A and B). In reference to Q, A is called “down-tree”, while B is called “up-tree”. Naïvely, it looks like Q might be of A-type, but we do not know at which point along the branch from the AB-ancestor to A the type changes from AB-ancestor-type to A-type. Therefore, Q is of unknown type.

2.7.2 cladinator analysis of a single query placement

For each query placement, cladinator reports the typing assignment in the following fields:

• Matching Clade
• Matching Down-tree Bracketing Clade
• Matching Up-tree Bracketing Clade

Example placements along with cladinator reports are provided in Figure 2.


cladinator report for Figure 2A:
  Query: Q
  Matching Clade(s): A.1.1:1.0
  Matching Down-tree Bracketing Clade(s): A.1.1.1:1.0
  Matching Up-tree Bracketing Clade(s): A.1.1.2:1.0

cladinator report for Figure 2B:
  Query: Q
  Matching Clade(s): A.1:1.0
  Matching Down-tree Bracketing Clade(s): A.1:1.0
  Matching Up-tree Bracketing Clade(s): A.1.3.1:1.0

cladinator report for Figure 2C:
  Query: Q
  Matching Clade(s): ?:1.0
  Matching Down-tree Bracketing Clade(s): A:1.0
  Matching Up-tree Bracketing Clade(s): B:1.0

Figure 2. cladinator analysis of single query placements in a hierarchically annotated artificial tree (panels A to C). Query placements are in red. cladinator output is to the right of the tree.

2.7.3 cladinator analysis of multiple query placements

When a query has multiple placements, cladinator summarizes the results if possible. Specifically, when two or more placements are assigned the same type (e.g., A.1.1 in Figure 3), the Specific-hit field reports the shared type with a probability score equal to the sum of the individual placements' probability scores.

cladinator report for Figure 3A:
  Query: Q
  Matching Clade(s): A.1:1.0
  Specific-hit(s): A.1.1:0.95
  Matching Clade(s) with Specific-hit(s): A.1:1.0 A.1.1:0.95
  Matching Down-tree Bracketing Clade(s): A.1.1:1.0
  Matching Up-tree Bracketing Clade(s): A.1:1.0

cladinator report for Figure 3B:
  Query: Q
  Matching Clade(s): B:0.9 C:0.1
  Matching Down-tree Bracketing Clade(s): B:0.9 C.1:0.1
  Matching Up-tree Bracketing Clade(s): B:0.9 C.2:0.1

Figure 3. cladinator analysis of multiple query placements in a hierarchically annotated artificial tree (panels A and B). Query placements are in red. cladinator output is to the right of the tree.
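The summing rule of section 2.7.3 can be illustrated with a short Python sketch. It assumes each placement has already been reduced to a (type, probability) pair; the function name summarize_placements and the example values are hypothetical and are not taken from cladinator or from the figures.

```python
from collections import defaultdict


def summarize_placements(placements):
    """Sum the probability scores of placements that were assigned the same type.

    placements: iterable of (assigned_type, probability) pairs for one query.
    Returns a list of (assigned_type, summed_probability), highest score first.
    """
    totals = defaultdict(float)
    for assigned_type, probability in placements:
        totals[assigned_type] += probability
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical placements: two fall within A.1.1, one only within A.1.
    example = [("A.1.1", 0.60), ("A.1.1", 0.35), ("A.1", 0.05)]
    for assigned_type, score in summarize_placements(example):
        print("%s:%.2f" % (assigned_type, score))
```

Scores are formatted to two decimals when printed because sums of floating-point probabilities may not be exact.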


3. Access to the tool

The HCV genotyping/subtyping tool is accessible from Virus Pathogen Resource > Hepatitis C Virus or Flaviviridae > Analyze & Visualize > Genotype-Recombination Detection (https://www.viprbrc.org/brc/genotypeRecombination.spg?method=ShowCleanInputPage&decorator=flavi_hcv). Input sequences can be provided by choosing a working set saved in the Workbench, uploading a file, pasting in FASTA-formatted sequences, or selecting a sequence file previously uploaded to the Workbench (Figure 4). The analysis report provides the full report from cladinator and the alignment and tree used to type the input sequence (Figure 5).

Figure 4. The HCV genotyping tool landing page

Figure 5. An example of the HCV genotyping report. On this page, users can download: (a) the input alignment, which is an alignment of the query sequence with the reference alignment, (b) the output tree with the query sequence placed in the tree, and (c) the subtype assignment file, which includes the displayed table and additional information from the typing tool. The “phylogenetic tree – View” hyperlinks open the output tree in the Archaeopteryx.js tree viewer, as shown in Figure 6.

Text of the landing page shown in Figure 4 (page captured 11/28/2017; ViPR release date Nov 27, 2017):

HCV Genotyping/Subtyping Tool (Beta)

The HCV Genotype Determination tool classifies the genotype/subtype of HCV viruses, based on the HCV genotype/subtype assignments maintained by the International Committee on Taxonomy of Viruses (ICTV). This tool infers the genotype/subtype for a query sequence from its position within a reference tree. The tool requires input sequences to be at least 400 bp. Short sequences will yield unpredictable and likely incorrect results.

Source of sequences to be analyzed:
• Analyze sequences saved in working sets
• Analyze my custom sequences only: upload a file containing my sequences in FASTA format
• Paste sequences in FASTA format
• Analyze my custom sequences and associated metadata with ViPR sequences

Sequences can also be selected from search results or a working set in your workbench.


Figure 6. (A) An example HCV typing output tree visualized in Archaeopteryx.js tree viewer. The query sequence is highlighted in green by default. Users can adjust the look of the tree by using various visualization options in the left panel. (B) The same tree shown in (A) is zoomed in on the Y-axis. Hidden sequence labels in (A) are displayed.

Callouts in Figure 6:
• Dynamically hide sequence labels when space is limited
• Expand the tree on the Y-axis. Hidden labels will be displayed when there is enough space
• Adjust sequence label font size
• Reset search
• Search in sequence labels and highlight matched nodes (searching for ‘#’ highlights the query sequence)
• Query sequence is highlighted in green by default


References

International Committee on Taxonomy of Viruses (ICTV). HCV Classification. https://talk.ictvonline.org/ictv_wikis/flaviviridae/w/sg_flavi/56/hcv-classification

Matsen FA, et al. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010, 11:538. PMID: 21034504.

Zmasek CM. cladinator. https://sites.google.com/site/cmzmasek/home/software/forester/cladinator

Zmasek CM. Archaeopteryx.js. https://sites.google.com/site/cmzmasek/home/software/archaeopteryx-js