Phytome: A Data Analysis Pipeline, presented by Jason Phillips
Post on 19-Dec-2015
Phytome
A Data Analysis Pipeline
presented by Jason Phillips
High Level Flow Chart
Retrieve Unigenes
Translate Unigenes
Families
Main Outline
● Unigenes (Where'd they come from, where'd they go?)
● Translation (methods and procedures)
● Building Families (the power of together-ness)
phytome » Unigene
● What are?
● Where from?
● Nine Species
● Arabidopsis, a special case
● Storage
phytome » Unigene » What Are?
Combined ESTs that overlap
phytome » Unigene » Where From?
● TIGR
● Other sources?
phytome » Unigene » Nine Species
phytome » Unigene » Arabidopsis
Highly annotated...
Highly sequenced...
Highly translated...
phytome » Unigene » Storage
species    count
-------------------
ghir       24350
mcry        8455
osat       60778
hann       20520
mtru       36976
lesc       31012
ljap       11025
lsat       21960
atha       27170
-------------------
total:    242246
phytome » Translation
● Methods
– Estwise
– Estscan
– FrameFinder
● Procedure
● Numbers
phytome » Translation » methods
● EST-WISE: homologies via BLAST (sprot + trembl)
● ESTSCAN, FRAMEFINDER: ab initio
phytome » Translation » procedure
● EST-WISE (Mac OSX Cluster)
– blast swiss prot: 10.3 hours, 35 nodes (~15 days)
– blast trembl: 35.7 hours, 35 nodes (~52 days)
● ESTSCAN (Mustard)
● FrameFinder (Mustard)
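The figures in parentheses appear to be single-node equivalents: wall-clock hours multiplied by the number of nodes, expressed in days. That reading is an assumption, but it reproduces both numbers. The helper name `cpu_days` is hypothetical, used only for this sketch.

```python
def cpu_days(wall_hours: float, nodes: int) -> float:
    """Convert wall-clock hours on a cluster into single-node days.

    Assumes every node was busy for the whole run.
    """
    return wall_hours * nodes / 24.0


# Wall-clock hours and node counts as given on the slide.
runs = {
    "blast swiss prot": (10.3, 35),
    "blast trembl": (35.7, 35),
}

for name, (hours, nodes) in runs.items():
    print(f"{name}: {hours} h x {nodes} nodes = ~{cpu_days(hours, nodes):.0f} days")
```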
phytome » Translation » numbers
242,246 Unigenes

method        translated   not translated
-----------   ----------   --------------
ESTWISE          151,830           90,416
ESTSCAN          226,988           15,258
FRAMEFINDER      242,242                4
phytome » Families
● Relationships
● Clustering
● Numbers
phytome » Families » Relationships
Blast everything against everything
sequences vs. blastable db of sequences

query      sbjct      e-value
---------  ---------  -------
mtru302    ljap4523   1 29
mtru302    lesc25072  1 26
mtru302    hann20270  5 24
osat59606  osat59606  1 157
osat59606  osat4002   1 96
osat59606  atha25166  1 88
...        ...        ...
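The e-value column holds two integers per hit. A minimal sketch of reading such rows follows, assuming (this is not stated on the slide) that the pair is a mantissa and a negative base-10 exponent, so that `1 29` means 1e-29. The names `Hit` and `parse_hit` are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    sbjct: str
    evalue: float


def parse_hit(line: str) -> Hit:
    """Parse 'query sbjct mantissa exponent' into a Hit.

    ASSUMPTION: the last two fields encode mantissa * 10**(-exponent).
    """
    query, sbjct, mantissa, exponent = line.split()
    return Hit(query, sbjct, float(f"{mantissa}e-{exponent}"))


# Rows copied from the slide.
rows = [
    "mtru302 ljap4523 1 29",
    "mtru302 lesc25072 1 26",
    "mtru302 hann20270 5 24",
    "osat59606 osat59606 1 157",
    "osat59606 osat4002 1 96",
    "osat59606 atha25166 1 88",
]

hits = [parse_hit(row) for row in rows]
best = min(hits, key=lambda h: h.evalue)  # smallest e-value = strongest hit
print(best)
```

Under that assumption the strongest hit in the sample is the self-hit of `osat59606`, which is what an all-against-all search should produce.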
phytome » Families » Relationships
But we have 4 sets of sequences!

program   sequences   data set
-------   ---------   -----------
tblastx     242,246   nucleotides
blastp      151,830   estwise
blastp      226,988   estscan
blastp      242,242   framefinder
Which method do we trust?
phytome » Families » Relationships
4 data sets... 4 family interpretations

method   run time
------   ------------------------------
tb       ~3 days, 28 nodes (~84 days)
ew       ~1/4 day, 21 nodes (~5 days)
es       ~1/4 day, 21 nodes (~5 days)
ff       ~1/4 day, 21 nodes (~5 days)
BLAST OFF!
phytome » Families » Relationships
BLAST RESULTS

Method   size     no blast   no trans   attrition
------   ------   --------   --------   ---------
tb       242246        153          0         153
ew       151830         22      90416       90438
ff       242242      24563          4       24567
es       226988       1345      15258       16603
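The columns of the results table are related by simple arithmetic: "no trans" is the unigene total minus the data set size, and "attrition" is "no trans" plus "no blast". A minimal check of that reading against the slide's numbers is sketched below; the function name `attrition` is hypothetical.

```python
TOTAL_UNIGENES = 242246

# method -> (size, no blast), as given in the table.
results = {
    "tb": (242246, 153),
    "ew": (151830, 22),
    "ff": (242242, 24563),
    "es": (226988, 1345),
}


def attrition(size: int, no_blast: int, total: int = TOTAL_UNIGENES) -> tuple[int, int]:
    """Return (no_trans, attrition) for one method."""
    no_trans = total - size          # unigenes with no translation
    return no_trans, no_trans + no_blast


for method, (size, no_blast) in results.items():
    no_trans, lost = attrition(size, no_blast)
    print(f"{method}: no trans = {no_trans}, attrition = {lost}")
```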
phytome » Families » Clustering
TRIBE MCL
[figure; labels: evalue, gene]
phytome » Families » Clustering
blast results:

query       sbjct       evalue
--------    --------    ------
atha7499    atha8483    6 78
atha7499    atha7503    4 90
osat23081   atha10704   8 78
osat23081   osat36667   8 78
atha1072    atha5059    2 68
atha1072    lsat15421   2 60
atha1072    lsat21190   1 102
atha1072    atha5059    9 54
...         ...         ...

tribe mcl

families:

fam id   member
------   ------
....     ......
4035     atha7499
4035     atha7503
4035     atha8483
4036     atha10704
4036     osat23081
4036     osat36667
4037     atha1072
4037     atha5059
4037     lsat15421
4037     lsat21190
....     ......
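To make the step from hit pairs to families concrete, here is a deliberately simplified stand-in. It is not the Markov clustering that TRIBE MCL performs: it ignores e-values and only groups sequences connected by any hit (single linkage, via union-find). On the sample rows from the slide this happens to give the same three families; on real data MCL would generally split such groups further. The function name `group_families` is hypothetical.

```python
def group_families(pairs):
    """Group sequence ids into connected components of the hit graph.

    Simplified stand-in for clustering: NOT the MCL algorithm.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    families = {}
    for gene in list(parent):
        families.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in families.values())


# (query, sbjct) pairs copied from the slide's blast results.
pairs = [
    ("atha7499", "atha8483"), ("atha7499", "atha7503"),
    ("osat23081", "atha10704"), ("osat23081", "osat36667"),
    ("atha1072", "atha5059"), ("atha1072", "lsat15421"),
    ("atha1072", "lsat21190"), ("atha1072", "atha5059"),
]

fams = group_families(pairs)
for members in fams:
    print(members)
```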
phytome » Families » Clustering
blast results (tb, ff, es, ew) → TRIBE MCL → families (tb, ff, es, ew)
phytome » Families » Clustering
Let's look at some histograms!
What should we do next round?