CRAVAT and MuPIT Interactive: Web Tools for Cancer Mutation Analysis. Rachel Karchin, Ph.D., Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University. The Cancer Genome Atlas' 2nd Annual Scientific Symposium, November 27-28, 2012.


Page 1: CRAVAT and muPIT: Web Services for High-Throughput Analysis of Sequence Variation in Cancer (Author: Rachel Karchin)

CRAVAT and MuPIT Interactive: Web Tools for Cancer Mutation Analysis

Rachel Karchin, Ph.D.
Department of Biomedical Engineering
Institute for Computational Medicine
Johns Hopkins University

The Cancer Genome Atlas' 2nd Annual Scientific Symposium
November 27-28, 2012

Page 2

Need for computational tools to analyze large-scale cancer mutation data

Page 3

Goal is to provide an end-to-end mutation analysis workflow

•  List of mutations from tumor sequencing
•  Map to transcripts
•  Identify type of change (missense, nonsense, silent)
•  Identify known variants and mutations
•  Analysis
–  Predict driver vs. random mutations
–  Predict functional impact of mutations
–  Find significantly mutated genes and pathways
•  Visualize mutations on tertiary structure

Page 4

The majority of somatic mutations in tumor exomes are missense

[Figure: pie charts of mutation type (missense, nonsense, silent, splice, other), one per study]

Studies shown: Stephens et al. 2005; Davies et al. 2005; Hunter et al. 2005; Sjoblom et al. 2006 (colorectal); Sjoblom et al. 2006 (breast); Greenman et al. 2007; Jones et al. 2008; Parsons et al. 2008; McLendon et al. 2008; Ding et al. 2008; Jones et al. 2010; Parsons et al. 2011; TCGA 2010; Stransky et al. 2011; Li et al. 2011

Page 5

Tools for evaluating missense mutations (http://www.cravat.us)
–  CHASM: cancer driver analysis
–  VEST: functional effect analysis
–  Annotations (1000g, ESP6500, COSMIC, GeneCards, PubMed)

Interactive visualization on 3D protein structure (http://mupit.icm.jhu.edu)
–  Automatic mapping onto available structures
–  Simple interactive interface
–  UniProtKB feature table annotations provided
–  Publication-quality figures

Page 6

Outline

•  Introduction to CRAVAT
•  Introduction to MuPIT
•  Future plans
•  Your input

Page 7

CRAVAT Web Server  http://www.cravat.us

[Workflow diagram, repeated from Page 3]
•  List of mutations from tumor sequencing
•  Map to transcripts
•  Identify type of change (missense, nonsense, silent)
•  Identify known variants and mutations
•  Analysis
–  Predict driver vs. random mutations
–  Predict functional impact of mutations
–  Find significantly mutated genes and pathways
•  Visualize mutations on tertiary structure

Page 8

Input: list of mutations from tumor sequencing

•  Paste into text box or upload text file

•  Genomic Coordinates
–  Unique id for variant
–  Chromosome
–  0-based start position
–  1-based end position
–  Strand on which bases are reported
–  Reference base
–  Alternate base
–  Sample ID (optional)

•  Transcript Coordinates
–  Unique id for variant
–  Transcript
–  Amino acid substitution
–  Sample ID (optional)

RECOMMENDED
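The genomic-coordinate fields listed on this slide can be illustrated with a small parser. This is only a sketch: the slide does not state the delimiter or the strand encoding, so whitespace-separated fields and `+`/`-` strands are assumptions, and the names `GenomicVariant` and `parse_genomic_line` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenomicVariant:
    uid: str                         # unique id for the variant
    chrom: str                       # chromosome
    start0: int                      # 0-based start position
    end1: int                        # 1-based end position
    strand: str                      # strand on which bases are reported
    ref: str                         # reference base
    alt: str                         # alternate base
    sample_id: Optional[str] = None  # optional sample ID


def parse_genomic_line(line: str) -> GenomicVariant:
    """Parse one input line (assumed whitespace-delimited, 7 or 8 fields)."""
    fields = line.split()
    if len(fields) not in (7, 8):
        raise ValueError(f"expected 7 or 8 fields, got {len(fields)}")
    uid, chrom, start, end, strand, ref, alt = fields[:7]
    sample_id = fields[7] if len(fields) == 8 else None
    start0, end1 = int(start), int(end)
    # With a 0-based start and a 1-based end, end1 - start0 is the number
    # of reference bases covered, so it must be at least 1.
    if end1 <= start0:
        raise ValueError("end position must be greater than start position")
    if strand not in ("+", "-"):  # assumed encoding
        raise ValueError(f"unexpected strand: {strand!r}")
    return GenomicVariant(uid, chrom, start0, end1, strand, ref, alt, sample_id)


variant = parse_genomic_line("var1 chr7 140453135 140453136 + A T sampleA")
```

Under this convention a single-base substitution has `end1 - start0 == 1`, which is a convenient sanity check before submitting a file.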

Page 9

Map to transcripts

•  Transcripts are scored by coverage of coding bases and agreement between RefSeq and Ensembl transcript definitions.

•  A greedy algorithm selects the "best transcript" for each mutated position.

[Figure: candidate transcripts annotated with scores such as 0.557, 0.378, 0.731, and 0.685]

Douville et al. CRAVAT: Cancer-Related Analysis of VAriants Toolkit. 2012 (submitted)
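The slide does not spell out the greedy selection, so the following is a simplified sketch rather than the CRAVAT implementation: it assumes each mutated position comes with a list of candidate transcripts and precomputed scores, and it simply keeps the highest-scoring one. The function name, the position keys, and the transcript names are hypothetical; the score values are the ones shown in the figure.

```python
from typing import Dict, List, Tuple


def pick_best_transcripts(
    candidates: Dict[str, List[Tuple[str, float]]]
) -> Dict[str, str]:
    """For each mutated position, keep the candidate with the highest score.

    `candidates` maps a position key to (transcript name, score) pairs.
    Ties go to the first candidate listed.
    """
    best = {}
    for position, scored in candidates.items():
        if not scored:
            continue  # position overlaps no transcript
        transcript, _score = max(scored, key=lambda pair: pair[1])
        best[position] = transcript
    return best


example = {
    "chr1:100": [("TX_A", 0.685), ("TX_B", 0.731)],
    "chr2:500": [("TX_C", 0.557), ("TX_D", 0.378)],
}
chosen = pick_best_transcripts(example)
```

Because the choice is made independently per position, two mutations in the same gene can be assigned to different transcripts, which is one reason exhaustive mapping to all transcripts is a natural extension.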

Page 10

Identify known variants and mutations

Sources: COSMIC, ESP6500, 1000 Genomes, dbSNP

Columns: HUGO symbol | Transcript | Amino acid position | Amino acid change | dbSNP | 1000 Genomes allele frequency | ESP6500 allele frequency (European American) | ESP6500 allele frequency (African American) | Occurrences in COSMIC [exact nucleotide change] | Occurrences in COSMIC by primary sites [exact nucleotide change]

RALYL | NM_173848.5 | 270 | M270I | | 0 | 0 | 0 | 1 | upper_aerodigestive_tract(1)
TRIM29 | NM_012101.3 | 216 | P216H | | 0 | 0 | 0 | 1 | breast(1)
ZNF317 | NM_001190791.1 | 280 | G280V | | 0 | 0 | 0 | 1 | large_intestine(1)
CD248 | NM_020404.2 | 115 | T115N | rs140616335 | 0 | 0 | 0.0007 | 0 |
SRPX2 | NM_014467.2 | 393 | R393Q | | 0 | 0 | 0.0007 | 0 |
KDR | NM_002253.2 | 539 | G539R | rs55716939 | 0 | 0.0022 | 0.0005 | 0 |
LGI2 | NM_018176.3 | 322 | R322H | rs149130054 | 0.0005 | 0.0001 | 0 | 0 |
LILRB1 | NM_006669.3 | 189 | P189L | | 0 | 0.0001 | 0 | 0 |

Page 11

Analysis: Predict driver vs. random mutations

Supervised machine learning

Training Set

[Image credit: http://euvolution.com/futurist-transhuman-news-blog/]

Carter et al. Cancer-specific high-throughput annotation of somatic mutations. Cancer Research. 2009

Page 12

Where do the decision rules come from?

•  Amino acid substitution properties
•  Human polymorphism properties
•  Vertebrate ortholog conservation (genome alignments)
•  Superfamily homolog conservation (deep protein alignments)
•  Predicted local protein structure
•  Regional amino acid composition
•  Amino acid compatibility with orthologs and homologs
•  UniProtKB features

86 features pre-computed exome-wide in SNVBox

Wong et al. CHASM and SNVBox toolkit. Bioinformatics. 2011

[Image credit: http://euvolution.com/futurist-transhuman-news-blog/]

Page 13

Where does the training set come from?

Driver missense mutations:
•  Gene harbors at least 5 mutations
•  Oncogene: >15% of NS mutations occur at the same position
•  Tumor suppressor: >15% of NS mutations are inactivating
•  All missense mutations in these genes are included

Bozic et al. PNAS 2010; Jones et al. Science 2010; Vogelstein et al. submitted

Random passenger missense mutations:
•  Generated in silico with a generative model
•  Designed to match tumor tissue di-nucleotide mutation spectrum
•  In CHASM feature space, they do not look like high allele frequency SNPs
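The driver-gene criteria on this slide can be read as a small rule set. The sketch below applies the thresholds literally; it is an illustration, not the authors' pipeline. Assumptions: the input is the list of a gene's NS mutations as (position, is_inactivating) pairs, the oncogene rule is checked before the tumor suppressor rule, and the function name `classify_gene` is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def classify_gene(
    mutations: List[Tuple[int, bool]],
    min_mutations: int = 5,
    threshold: float = 0.15,
) -> str:
    """Label a gene from its NS mutations, given as (position, is_inactivating)."""
    n = len(mutations)
    if n < min_mutations:
        return "unclassified"  # gene must harbor at least 5 mutations
    position_counts = Counter(pos for pos, _ in mutations)
    # Fraction of mutations that fall on the single most frequently hit position.
    top_fraction = max(position_counts.values()) / n
    inactivating_fraction = sum(1 for _, inact in mutations if inact) / n
    # Caveat: with very few mutations a single hit already exceeds 15%;
    # the slide does not say how such cases are handled.
    if top_fraction > threshold:
        return "oncogene"
    if inactivating_fraction > threshold:
        return "tumor_suppressor"
    return "unclassified"


hotspot_gene = [(600, False)] * 4 + [(p, False) for p in range(1, 7)]
scattered_gene = [(p, p <= 3) for p in range(1, 11)]
labels = (classify_gene(hotspot_gene), classify_gene(scattered_gene))
```

In the example, the first gene has 4 of 10 mutations at one position and is labeled an oncogene; the second has 10 distinct positions with 3 inactivating mutations and is labeled a tumor suppressor.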

Page 14

Tissue-specific classifiers in CRAVAT

•  Select from pull-down list

•  Pre-constructed classifiers for tumor types under analysis by TCGA and ICGC

•  'Other' offers a generic classifier if the tissue under study is not yet supported

Page 15

Analysis: Predict functional impact of mutations

•  Variant Effect Scoring Tool (VEST)
–  Identify functional mutations
–  Same supervised learning algorithm and features as CHASM
–  Different training set
    •  47000 disease missense mutations from HGMD [1] 2012v2
    •  45000 neutral missense variants from the ESP6500 [2] (AF>1%)

[1] Stenson P, Mort M, Ball E, Howells K, Phillips A, Thomas N, Cooper D, et al.: The human gene mutation database: 2008 update. Genome Med 2009, 1:13.
[2] Exome Variant Server, NHLBI GO Exome Sequencing Project (ESP), Seattle, WA [http://evs.gs.washington.edu/EVS/].

Page 16

Why VEST?

[Figure: nested sets]
All somatic mutations
Functional somatic mutations
Driver somatic mutations

Page 17

[Figure: the nested sets shown twice, side by side]
All somatic mutations
Functional somatic mutations
Driver somatic mutations

Functional (Damaging)

Benign

Page 18

Why VEST?

Functional (Damaging)

Benign

Driver vs. Passenger and Damaging vs. Benign classifiers provide complementary benefits for mutation analysis.

Page 19

Analysis: Find significantly mutated genes and pathways

•  Gene annotations
–  PubMed
–  GeneCards

•  Gene scores
–  Combined P-values of VEST or CHASM scores (Stouffer's)
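Stouffer's method, named on this slide, combines per-mutation P-values into one gene-level P-value by converting each to a Z-score, summing, and dividing by the square root of the count. The sketch below is the standard unweighted, one-sided form; whether CRAVAT uses weights, and how it derives per-mutation P-values from VEST or CHASM scores, is not stated here, so both are assumptions. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Sequence

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def stouffer_combined_p(p_values: Sequence[float], eps: float = 1e-12) -> float:
    """Combine one-sided P-values with unweighted Stouffer's Z-score method."""
    if not p_values:
        raise ValueError("need at least one P-value")
    z_scores = []
    for p in p_values:
        # Clamp away from 0 and 1, where the inverse CDF is undefined.
        p = min(max(p, eps), 1.0 - eps)
        z_scores.append(_STD_NORMAL.inv_cdf(1.0 - p))
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - _STD_NORMAL.cdf(combined_z)


gene_p = stouffer_combined_p([0.05, 0.05])
```

Two mutations that are each borderline at 0.05 combine to a gene-level P-value below 0.05, which is the behavior that makes the method useful for scoring genes with several moderately scored mutations.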

Page 20

Submit  

Page 21

Job completed: Email Notification

Page 22

CRAVAT Results

If MuPIT Interactive input file is requested
•  File with mutations formatted for input to MuPIT Interactive

Page 23

MuPIT Interactive  mupit.icm.jhu.edu

Input genomic coordinates of mutations of interest

Page 24

Table of protein structures onto which your mutations can be mapped

Position-specific annotations available for each structure can be used to select the most interesting structures.

A subset of mutations did not map onto protein structures

Page 25

Selecting a structure brings you to a details page for interactive viewing

•  Applet supports rotation and zoom
•  Easy-to-use interface for customized figures
•  Your mutations
•  Annotations from UniProt feature table

Page 26

Firehose breast cancer mutations

Five mutations cluster together on a structure of RUNX1 (in complex with CBFbeta)

Page 27

Zoom-in for view of the mutations (green) and a functionally annotated region (red)

Displayed Controls

Page 28

Functionally annotated region with text display

Page 29

Page 30

Future Functionality

•  Integration of CRAVAT and MuPIT into a single end-to-end service

•  GANT - a new statistic to find significantly mutated genes and pathways

•  Exhaustive mapping of genomic coordinates onto all protein coding transcripts defined at a locus

 

Page 31

External integrations

Coming in 2013

Page 32

Thank you for listening!

Please visit Poster #75 to talk with us about CRAVAT and MuPIT

Please visit Poster #80 to learn about the Intogen-SM analysis pipeline from Lopez-Bigas Lab (Spain)

Page 33

Acknowledgments

NIH R21 CA135866
NIH R21 CA152432
NIH 3U24CA143858-2S1
NSF CAREER DBI 0845275