Phytome: A Data Analysis Pipeline, presented by Jason Phillips

Posted on 19-Dec-2015

Page 1: Phytome A Data Analysis Pipeline presented by Jason Phillips

Phytome

A Data Analysis Pipeline
presented by
Jason Phillips

High Level Flow Chart

Retrieve Unigenes → Translate Unigenes → Families

Main Outline

● Unigenes (Where'd they come from, where'd they go?)

● Translation (methods and procedures)

● Building Families (the power of together-ness)

phytome » Unigene

● What are?

● Where from?

● Nine Species

● Arabidopsis, a special case

● Storage

phytome » Unigene » What Are?

Overlapping ESTs (expressed sequence tags) combined into one sequence

phytome » Unigene » Where From?

● TIGR

● Other sources?

phytome » Unigene » Nine Species

phytome » Unigene » Arabidopsis

Highly annotated...
Highly sequenced...
Highly translated...

phytome » Unigene » Storage

species   count
-------------------
ghir      24350
mcry      8455
osat      60778
hann      20520
mtru      36976
lesc      31012
ljap      11025
lsat      21960
atha      27170
-------------------
total:    242246

phytome » Translation

● Methods
– Estwise
– Estscan
– FrameFinder

● Procedure

● Numbers

phytome » Translation » methods

EST-WISE: HOMOLOGIES via BLAST (sprot + trembl)

ESTSCAN, FRAMEFINDER: AB INITIO

phytome » Translation » procedure

● EST-WISE (Mac OS X cluster)
– blast swiss prot: 10.3 hours, 35 nodes (~15 days of single-node time)
– blast trembl: 35.7 hours, 35 nodes (~52 days of single-node time)

● ESTSCAN (Mustard)

● FrameFinder (Mustard)
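The day figures in parentheses appear to be wall-clock time multiplied by node count. A minimal sketch of that arithmetic follows; the function name `node_days` is made up for illustration, and it assumes every node is busy for the whole run.

```python
# Wall-clock hours and node counts as quoted on the slide.
RUNS = {
    "blast swiss prot": (10.3, 35),
    "blast trembl": (35.7, 35),
}


def node_days(wall_hours: float, nodes: int) -> float:
    """Total compute in days on a single node, assuming all nodes stay busy."""
    return wall_hours * nodes / 24.0


for name, (hours, nodes) in RUNS.items():
    print(f"{name}: about {node_days(hours, nodes):.0f} days on one node")
```

Under that assumption the two runs come to roughly 15 and 52 days, which matches the slide.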

phytome » Translation » numbers

242,246 Unigenes

method        translated   not translated
-----------   ----------   --------------
ESTWISE       151,830      90,416
FRAMEFINDER   242,242      4
ESTSCAN       226,988      15,258
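As a quick consistency check on these numbers, the sketch below recomputes the untranslated count and the translated fraction for each method. The counts come from the slide; the names `TRANSLATED`, `untranslated`, and `coverage` are illustrative only.

```python
TOTAL_UNIGENES = 242_246

# Number of unigenes each method translated, as given on the slide.
TRANSLATED = {
    "ESTWISE": 151_830,
    "FRAMEFINDER": 242_242,
    "ESTSCAN": 226_988,
}


def untranslated(method: str) -> int:
    """Unigenes for which the method produced no translation."""
    return TOTAL_UNIGENES - TRANSLATED[method]


def coverage(method: str) -> float:
    """Fraction of all unigenes the method translated."""
    return TRANSLATED[method] / TOTAL_UNIGENES


for method in TRANSLATED:
    print(f"{method}: {untranslated(method)} untranslated, "
          f"{coverage(method):.1%} translated")
```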

phytome » Families

● Relationships

● Clustering

● Numbers

phytome » Families » Relationships

BLAST everything against everything

sequences (queries) against a blastable db of the same sequences

query       sbjct       e-value
---------   ---------   -------
mtru302     ljap4523    1e-29
mtru302     lesc25072   1e-26
mtru302     hann20270   5e-24
osat59606   osat59606   1e-157
osat59606   osat4002    1e-96
osat59606   atha25166   1e-88
...         ...         ...
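A minimal sketch of how a hit table like this might be read into memory before clustering. It assumes the two numbers in each e-value column on the slide are a mantissa and a negative exponent (so "1 29" means 1e-29), and it drops self-hits such as osat59606 against itself. The helper names `parse_hits` and `best_hits` are hypothetical, not part of the Phytome pipeline.

```python
HITS_TEXT = """\
mtru302 ljap4523 1e-29
mtru302 lesc25072 1e-26
mtru302 hann20270 5e-24
osat59606 osat59606 1e-157
osat59606 osat4002 1e-96
osat59606 atha25166 1e-88
"""


def parse_hits(text):
    """Return (query, sbjct, evalue) tuples, skipping self-hits."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        query, sbjct, evalue = fields[0], fields[1], float(fields[2])
        if query == sbjct:
            continue  # a sequence always matches itself
        hits.append((query, sbjct, evalue))
    return hits


def best_hits(hits):
    """Map each query to its lowest e-value (subject, evalue) pair."""
    best = {}
    for query, sbjct, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (sbjct, evalue)
    return best


hits = parse_hits(HITS_TEXT)
for query, (sbjct, evalue) in best_hits(hits).items():
    print(query, sbjct, evalue)
```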

phytome » Families » Relationships

But we have 4 sets of sequences!

data set      program   sequences
-----------   -------   ---------
nucleotides   tblastx   242,246
estwise       blastp    151,830
estscan       blastp    226,988
framefinder   blastp    242,242

Which method do we trust?

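phytome » Families » Relationships

4 data sets... 4 family interpretations

(tb = tblastx on nucleotides, ew = estwise, es = estscan, ff = framefinder)

set   wall time   nodes   single-node time
---   ---------   -----   ----------------
tb    ~3 days     28      ~84 days
ew    ~1/4 day    21      ~5 days
es    ~1/4 day    21      ~5 days
ff    ~1/4 day    21      ~5 days

BLAST OFF!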

phytome » Families » Relationships

Method   size     no blast   no trans   attrition
------   ------   --------   --------   ---------
tb       242246   153        0          153
ew       151830   22         90416      90438
ff       242242   24563      4          24567
es       226988   1345       15258      16603

BLAST RESULTS
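Reading the table, attrition looks like the sum of the "no blast" and "no trans" columns, and size plus "no trans" gives back the 242,246 unigenes. The sketch below checks both relations; that reading of the columns is an assumption, and the names `ROWS` and `attrition` are illustrative.

```python
TOTAL_UNIGENES = 242_246

# method: (size, no_blast, no_trans, attrition) as printed on the slide.
ROWS = {
    "tb": (242_246, 153, 0, 153),
    "ew": (151_830, 22, 90_416, 90_438),
    "ff": (242_242, 24_563, 4, 24_567),
    "es": (226_988, 1_345, 15_258, 16_603),
}


def attrition(method: str) -> int:
    """Sequences lost to a missing translation or a missing BLAST hit."""
    _size, no_blast, no_trans, _reported = ROWS[method]
    return no_blast + no_trans


for method, (size, _no_blast, no_trans, reported) in ROWS.items():
    assert size + no_trans == TOTAL_UNIGENES
    assert attrition(method) == reported
    remaining = TOTAL_UNIGENES - attrition(method)
    print(f"{method}: {remaining} of {TOTAL_UNIGENES} unigenes have a hit")
```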

phytome » Families » Clustering

[Figure: TRIBE MCL diagram; labels: gene, evalue]

phytome » Families » Clustering

query       sbjct       evalue
---------   ---------   ------
atha7499    atha8483    6e-78
atha7499    atha7503    4e-90
osat23081   atha10704   8e-78
osat23081   osat36667   8e-78
atha1072    atha5059    2e-68
atha1072    lsat15421   2e-60
atha1072    lsat21190   1e-102
atha1072    atha5059    9e-54
...         ...         ...

tribe mcl

fam id   member
------   ---------
....     ......
4035     atha7499
4035     atha7503
4035     atha8483
4036     atha10704
4036     osat23081
4036     osat36667
4037     atha1072
4037     atha5059
4037     lsat15421
4037     lsat21190
....     ......
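To illustrate how pairwise hits turn into family membership, here is a deliberately simplified stand-in: it groups sequences into connected components with union-find. This is not the TRIBE MCL algorithm, which uses Markov clustering and can split a loosely connected group where this sketch cannot. The e-values assume the slide's two-number columns mean mantissa and negative exponent, and the `max_evalue` cutoff is a hypothetical parameter.

```python
from collections import defaultdict

# Hits from the slide as (query, sbjct, evalue).
HITS = [
    ("atha7499", "atha8483", 6e-78),
    ("atha7499", "atha7503", 4e-90),
    ("osat23081", "atha10704", 8e-78),
    ("osat23081", "osat36667", 8e-78),
    ("atha1072", "atha5059", 2e-68),
    ("atha1072", "lsat15421", 2e-60),
    ("atha1072", "lsat21190", 1e-102),
    ("atha1072", "atha5059", 9e-54),
]


def cluster(hits, max_evalue=1e-10):
    """Group sequences linked by any hit at or below max_evalue."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, sbjct, evalue in hits:
        root_q, root_s = find(query), find(sbjct)
        if evalue <= max_evalue and root_q != root_s:
            parent[root_s] = root_q  # merge the two groups

    families = defaultdict(list)
    for seq in list(parent):
        families[find(seq)].append(seq)
    return [sorted(members) for members in families.values()]


for members in cluster(HITS):
    print(members)
```

On these ten sequences the components coincide with families 4035, 4036, and 4037 from the slide, but only because each family here is fully disconnected from the others.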

phytome » Families » Clustering

[Figure: blast results (tb, ff, es, ew) → TRIBE MCL → families (tb, ff, es, ew)]

phytome » Families » Clustering

Let's look at some histograms!

What should we do next round?