robots and automatic genome annotation

Robots and Automatic Genome Annotation Ross D. King Department of Computer Science University of Wales, Aberystwyth

Author: kaethe

Post on 13-Jan-2016


  • Robots and Automatic Genome Annotation

    Ross D. King, Department of Computer Science, University of Wales, Aberystwyth

  • Talk Plan

    Data Mining based gene function prediction

    The Robot Scientist

    Automating annotation and experimentation

  • Data Mining Prediction

    We have developed a method for predicting the functional class of gene products based on data mining.

    The idea is to learn a reliable predictive function on the examples of genes with products of known function. Then apply this function to genes where the functional class is unknown.

    Applied to: E. coli, M. tuberculosis, S. cerevisiae, A. thaliana.

    We call this approach: Data Mining Prediction (DMP).

  • Classification schemes (MIPS/GO)

    Hierarchy of classes:

    1,0,0,0    "METABOLISM"
    1,1,0,0    "amino acid metabolism"
    1,1,1,0    "amino acid biosynthesis"
    1,1,4,0    "regulation of amino acid metabolism"
    1,1,7,0    "amino acid transport"
    1,1,10,0   "amino acid degradation (catabolism)"
    1,1,99,0   "other amino acid metabolism activities"
    1,2,0,0    "nitrogen and sulfur metabolism"
    1,3,0,0    "nucleotide metabolism"
    1,4,0,0    "phosphate metabolism"
    1,5,0,0    "C-compound and carbohydrate metabolism"
    1,6,0,0    "lipid, fatty-acid and isoprenoid metabolism"
    1,7,0,0    "metabolism of vitamins, cofactors, and prosthetic groups"
    1,20,0,0   "secondary metabolism"

    ... and ORFs may have multiple functions too!
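    The comma-separated class codes can be read as paths in the class hierarchy. The following is a minimal Python sketch of that reading; the function names and the interpretation of trailing zeros as "no further subdivision" are assumptions made for illustration, not part of the talk.

```python
def parse_class(code: str) -> tuple[int, ...]:
    """Parse a class code such as '1,1,10,0' into its non-zero levels."""
    parts = [int(p) for p in code.split(",")]
    # Assumption: trailing zeros mean the class is not subdivided further.
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def level(code: str) -> int:
    """Depth of a class in the hierarchy (1 = most general)."""
    return len(parse_class(code))


def ancestors(code: str) -> list[str]:
    """Codes of all enclosing classes, most general first."""
    levels = parse_class(code)
    width = len(code.split(","))
    result = []
    for depth in range(1, len(levels)):
        padded = list(levels[:depth]) + [0] * (width - depth)
        result.append(",".join(str(p) for p in padded))
    return result


# An ORF may carry several labels, so a set of codes per ORF is natural.
# The ORF name and its labels here are hypothetical.
orf_labels = {"ORF_X": {"1,1,10,0", "1,3,0,0"}}
```

    With this reading, "1,1,10,0" sits at level 3 and its enclosing classes are "1,0,0,0" and "1,1,0,0".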

  • Sequence Data

    478 attributes in total

    field             description                                     type
    aa_rat_X          % of amino acid X in the protein                real
    seq_len           length of the protein sequence                  int
    aa_rat_pair_X_Y   % of the amino acids X and Y consecutively      real
    mol_wt            molecular weight of the protein                 int
    theo_pI           theoretical pI (isoelectric point)              real
    atomic_comp_X     atomic composition of X (C,H,N,O,S)             real
    aliphatic_index   aliphatic index                                 real
    hydro             grand average of hydropathy                     real
    strand            the DNA strand                                  'w' or 'c'
    position          the number of exons (no. of start positions)    int
    cai               codon adaptation index                          real
    motifs            number of PROSITE motifs                        int
    tmSpans           number of transmembrane spans                   int
    chromosome        chromosome number                               1..16,mit
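    As an illustration of how the simplest of these attributes could be derived, here is a short Python sketch computing seq_len, aa_rat_X and aa_rat_pair_X_Y from a protein sequence. The function name and the choice of denominators (sequence length for single residues, number of adjacent pairs for pairs) are assumptions; the talk does not say how the percentages were normalised.

```python
from collections import Counter


def sequence_attributes(seq: str) -> dict[str, float]:
    """Compute a few composition attributes for one protein sequence."""
    seq = seq.upper()
    n = len(seq)
    attrs: dict[str, float] = {"seq_len": n}
    if n == 0:
        return attrs
    # Percentage of each amino acid in the protein.
    for aa, count in Counter(seq).items():
        attrs[f"aa_rat_{aa}"] = 100.0 * count / n
    # Percentage of each ordered pair of consecutive amino acids.
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    for pair, count in pairs.items():
        attrs[f"aa_rat_pair_{pair[0]}_{pair[1]}"] = 100.0 * count / (n - 1)
    return attrs


# Prefix of the YAL001C sequence quoted in the homology slide.
attrs = sequence_attributes("mvltiypdelvqivsdkiasnkgkitlnqlwdisgkyfdlsdk")
```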

  • Homology data

    YAL001C: mvltiypdelvqivsdkiasnkgkitlnqlwdisgkyfdlsdk....

    sfc3: keyword(membrane) length(358) dbref(prosite) dbref(embl)

    We look up the associated information from SwissProt.

  • Predicted Secondary Structure Data

    mvltiypdelvqivsdkiasnkgkitlnqlwdisgkyfdlsdkkvk...
    cbbbbccaaaaaaaaaaaacccccbbbbaaaaaacccbbccccccb...

    We record length and relative positions of the secondary structure elements.

    This is relational data.
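    Recording the length and position of each secondary structure element amounts to run-length encoding the prediction string. The Python sketch below shows one way to do it, assuming the usual convention that a, b and c stand for alpha-helix, beta-strand and coil; the function names and the (state, start, length) tuple layout are hypothetical.

```python
from itertools import groupby


def structure_elements(pred: str) -> list[tuple[str, int, int]]:
    """Turn a per-residue prediction string into (state, start, length) runs."""
    elements = []
    start = 0
    for state, run in groupby(pred):
        length = len(list(run))
        elements.append((state, start, length))
        start += length
    return elements


def has_element(pred: str, state: str, length: int) -> bool:
    """True if the prediction contains a run of `state` with exactly `length` residues."""
    return any(s == state and n == length for s, _, n in structure_elements(pred))


# Prefix of the prediction string shown on the slide.
elements = structure_elements("cbbbbccaaaaaaaaaaaaccccc")
```

    Each run becomes one fact about the ORF, which is why the data is relational: an ORF has a variable number of elements rather than a fixed set of attributes.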

  • Expression Data

    Spellman et al. (1998), Roth et al. (1998), DeRisi et al. (1997), Eisen et al. (1998), Gasch et al. (2000, 2001), Chu et al. (1998)

    Microarray experiments to measure expression changes in yeast under a variety of conditions, including cell cycle, heat shock, diauxic shift.

    Short time series data, numerical-valued.

  • Phenotype Data

    Data from knockout gene growth experiments.
    Many missing data.
    Data taken from 3 sources (TRIPLES, MIPS, EUROFAN).

    s = sensitive (less growth)
    w = wild-type (no observable effect)
    r = resistant (more growth)
    n = no data

    growth medium       deleted ORF
                        YAL001C   YAL019W   YAL021C   YAL029C
    calcofluor white    w         n         n         n
    sorbitol            n         s         n         w
    benomyl             n         w         n         w
    H2O2                w         w         n         r
    ...

  • What are the Machine Learning Issues?

    Large volume of data
    Missing data
    Accurate results required
    Intelligible results required
    Class hierarchy
    Multiple labels
    Relational data

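  • Data Mining Prediction (DMP)

    [Flow diagram. The entire database is split 2/3 : 1/3 into data for rule creation and test data. The data for rule creation is split 2/3 : 1/3 into training data and validation data. Rule generation (PolyFARM, C4.5) on the training data gives all rules; the best rules are selected using the validation data; rule accuracy is measured on the test data to give the results.]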
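    The nested 2/3 : 1/3 split and the rule selection step can be sketched in Python as follows. This is only an illustration under stated assumptions: the rule representation (a dict with a `condition` function and a `label`), the accuracy threshold and all function names are hypothetical, and the rule learner itself (PolyFARM and C4.5 in the talk) is not reproduced.

```python
import random


def three_way_split(items, seed=0):
    """Split into training, validation and test data using nested 2/3 : 1/3 cuts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = (2 * len(items)) // 3
    rule_data, test = items[:cut], items[cut:]
    cut2 = (2 * len(rule_data)) // 3
    return rule_data[:cut2], rule_data[cut2:], test


def rule_accuracy(rule, examples):
    """Accuracy of a rule on the examples it covers, or None if it covers none."""
    covered = [(x, y) for x, y in examples if rule["condition"](x)]
    if not covered:
        return None
    return sum(y == rule["label"] for _, y in covered) / len(covered)


def select_best(rules, validation, threshold=0.5):
    """Keep rules whose validation accuracy reaches a (hypothetical) threshold."""
    best = []
    for rule in rules:
        acc = rule_accuracy(rule, validation)
        if acc is not None and acc >= threshold:
            best.append(rule)
    return best
```

    The selected rules would then be scored once more with `rule_accuracy` on the held-out test data, so that the reported accuracy is not biased by the selection step.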

  • Application to Bacterial Genomes

    Successful for both M. tuberculosis and E. coli.

    Of the ORFs with no assigned function, >40% were predicted to have a function at one or more levels of the class hierarchy.

    It was found that many of the predictive rules were more general than possible using sequence homology.

    References:
    King et al. (2000) KDD 2000
    King et al. (2000) Yeast (Comparative and Functional Genomics)
    King et al. (2001) Bioinformatics

  • Summary Results (Bacteria)

    Using voting (2 or more rules agree on a prediction):
    Level 2: 128 ORFs predicted - 87.5% accuracy
    Level 3: 23 ORFs predicted - 91.3% accuracy

    All predictions:
    Level 2: 335 ORFs predicted - 64.5% accuracy
    Level 3: 204 ORFs predicted - 44.6% accuracy
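    The voting scheme, in which a prediction is kept only when two or more rules agree, can be sketched in a few lines of Python. The function name and the tie-breaking behaviour (the label seen first wins) are assumptions for illustration.

```python
from collections import Counter


def vote(predictions, min_votes=2):
    """Return the most-predicted class if at least `min_votes` rules agree, else None.

    `predictions` holds one class label per rule that fired for an ORF.
    """
    if not predictions:
        return None
    label, count = Counter(predictions).most_common(1)[0]
    return label if count >= min_votes else None
```

    Requiring agreement trades coverage for accuracy, which matches the pattern in the figures above: fewer ORFs receive a prediction, but the predictions are more often correct.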

  • Example Rule (level 2 E. coli)

    If the ORF is not predicted to have a β-strand of length 3
    and a homologous protein from class Chytridiomycetes was found,
    then its functional class is Cell processes, Transport/binding proteins.

    12/13 (86%) correct on Test Set - probability of this result occurring by chance is estimated at 4 × 10^-7. 24 ORFs of unknown function are predicted by the rule.

    16 ORFs now with putative or confirmed function - 93.8% accurate predictions
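    The talk does not say how the chance probability of a rule's test-set result was estimated. One common way, shown here purely as an assumption, is a binomial tail: if a fraction `class_prior` of ORFs belong to the predicted class, how likely is it that a random guess gets at least `correct` out of `total` right? The function name and the prior used in the example are hypothetical.

```python
from math import comb


def chance_probability(correct: int, total: int, class_prior: float) -> float:
    """P(at least `correct` hits in `total` trials) when each hit has probability `class_prior`."""
    return sum(
        comb(total, k) * class_prior**k * (1 - class_prior) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical example: 12 of 13 correct for a class holding 20% of ORFs.
p = chance_probability(12, 13, 0.20)
```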

  • Experimental Confirmation

    The original bacterial ORF predictions were made over three years ago. In the intervening time many more ORFs have been sequenced, making traditional homology-based prediction methods more accurate and sensitive, and the function of some ORFs has been determined by wet biology.

    The E. coli genome has recently been re-annotated by Monica Riley's group.

  • Wet Biology Confirmation

    A number of predictions have been confirmed or falsified by new wet experimental data.

    This new data is biased towards hard classes. Despite this, the results are still good:
    Level 2: 23 predictions - 47.8% accuracy
    Level 3: 23 predictions - 43.4% accuracy

    This is very much better than random as there are many classes.

  • Confirmation of Wet Predictions

    ORF | Rule | Predicted Class | Confirmed Function | Result
    b0805 | 8 | Cell envelope | Outer membrane protein | C
    b1519 | 15 | Degradation of small molecules | Trans-aconitate methyltransferase | C
    b1533 | 43 | Transport/binding proteins | Cysteine pathway metabolite transport | C
    b1981 | 42 | Transport/binding proteins | Shikimate and dehydroshikimate transport protein | C
    b1981 | 56 | Transport/binding proteins | Shikimate and dehydroshikimate transport protein | C
    b2210 | 15 | Degradation of small molecules | Malate:quinone oxidoreductase | C
    b2392 | 43a | Transport/binding proteins | High-affinity manganese transporter | C
    b2392 | 43b | Transport/binding proteins | High-affinity manganese transporter | C
    b2392 | 54 | Transport/binding proteins | High-affinity manganese transporter | C
    b2924 | 45 | Transport/binding proteins | Component of the MscS mechanosensitive channel new gene family | C
    b3839 | 43 | Transport/binding proteins | Essential component of translocase | C
    b0103 | 42 | Transport/binding proteins | dephospho-CoA kinase | W
    b0103 | 41 | Transport/binding proteins | dephospho-CoA kinase | W
    b0103 | 43 | Transport/binding proteins | dephospho-CoA kinase | W
    b1822 | 15 | Degradation of small molecules | 23S rRNA m1G745 methyltransferase | W
    b2530 | 35 | Global regulatory functions | cysteine desulfurase | W
    b2392 | 14 | Degradation of small molecules | High-affinity manganese transporter | W
    b2889 | 50 | Energy metabolism carbon | Isopentenyl diphosphate isomerase | W
    b3222 | 54 | Transport/binding proteins | ManNAc kinase | W
    b3223 | 39 | Ribosome constituents | ManNAc epimerase | W
    b3337 | 28 | Laterally acquired elements | regulatory or redox component | W
    b3338 | 39 | Ribosome constituents | Periplasmic endochitinase | W
    b3569 | 32 | Laterally acquired elements | transcriptional regulator of xylose utilization | W
    b3955 | 8 | Cell envelope | Required for invasion of brain microvascular endothelial cells | EF
    b3955 | 18 | Energy metabolism carbon | Required for invasion of brain microvascular endothelial cells | EA
    b3955 | 20 | Energy metabolism carbon | Required for invasion of brain microvascular endothelial cells | EA

  • Results (Yeast)

    Many rules from each data type
    Rules at each level of hierarchy
    Some classes are much easier to predict than others (for example "protein synthesis" at 71-93%, "energy" at 20-47%)
    Good levels of accuracy on held out test data
    Many predictions for ORFs of unknown function (some function at some level is predicted for 96% of the ORFs of unknown function)
    Some rules explainable by biology -> scientific knowledge discovery

    Clare & King (2003) Bioinformatics suppl. 2, 42-49

  • Accuracy Table

    Datatype   Level 1   Level 2   Level 3   Level 4   all
    Seq        55        55        33        0         71
    Struc      49        43        0         0         58
    Hom        65        38        69        20        55
    Expr       42        37        35        0         75
    Phen       75        40        7         0         68

  • Extension to Arabidopsis Genome

    Collaborative project with the Institute of Grassland and Environmental Research and the University of Nottingham.

    Large increase in data: 6,000 -> 25,000 ORFs.
    Large amount of micro-array data from the Nottingham Arabidopsis Stock Centre.
    250 million Prolog facts, 200,000 attributes, file sizes almost 2Gb.

    7,964 gene function predictions with an expected accuracy >70%; 2,974 with an expected accuracy >90%.

    We are currently growing 14 knockout varieties of Arabidopsis to test a sample of these predictions.

  • Availability

    All rules and data available at http://www.aber.ac.uk/compsci/Research/bio/dss/

    All predictions available at http://www.genepredictions.org

  • The Robot Scientist

  • The Robot Scientist Concept

    [Diagram labels: Background Knowledge, Machine Learning, Analysis, Consistent Hypothesis, Final Theory, Experiment(s) selection, Robot, Experiment(s), Results]

    The robot scientist project aims to develop a computer system that is capable of originating its own experiments, physically doing them, interpreting the results, and then repeating the cycle.

  • Motivation: Technological

    In many areas of science our ability to generate data is outstripping our ability to analyse the data.

    One scientific area where this is true is functional genomics, where data is now being generated on an industrial scale.

    The analysis of scientific data needs to become as industrialised as its generation.

  • The Application Domain

    Functional genomics.

    In yeast (S. cerevisiae) ~30% of the 6,000 genes still have no known function.

    EUROFAN 2 has knocked out each of the 6,000 genes in mutant strains.

    Task: to determine the function of the gene by auxotrophic growth experiments comparing mutants and wild type.

  • Logical Cell Model

    We have built a logical model of the known metabolic pathways (coded in Prolog), taken from KEGG and other bioinformatic sources. This is essentially a directed graph, with metabolites as nodes and enzymes as arcs.

    If a path can be found from cell inputs (metabolites in the growth medium) to all the cell outputs (essential compounds), then the cell can grow.
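    The growth test on this graph is a reachability check. The talk's model is written in Prolog and is not shown; the Python sketch below is a simplified stand-in that assumes each arc has a single substrate and a single product, and it uses hypothetical gene labels (g1, g2, g3) on a three-step toy pathway.

```python
def can_grow(reactions, medium, essential, knocked_out=frozenset()):
    """True if every essential compound is reachable from the growth medium.

    `reactions` is a list of (gene, substrate, product) arcs; arcs whose
    gene is knocked out are ignored.
    """
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for gene, substrate, product in reactions:
            if gene in knocked_out:
                continue
            if substrate in available and product not in available:
                available.add(product)
                changed = True
    return set(essential) <= available


# Toy pathway; the gene labels are placeholders, not real ORF assignments.
TOY_REACTIONS = [
    ("g1", "chorismate", "prephenate"),
    ("g2", "prephenate", "phenylpyruvate"),
    ("g3", "phenylpyruvate", "phenylalanine"),
]
```

    In this toy model, knocking out g1 stops growth on a medium that supplies only chorismate, and adding prephenate to the medium restores it, which is the pattern an auxotrophic growth experiment looks for.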

  • AAA Model System

    We started using the aromatic amino-acid (AAA) pathway in yeast as a model system to prove the principle of the Robot Scientist.

    9 metabolites can be used off the shelf.
    15 knockout mutants from EUROFAN.

    The mutant can grow iff all three aromatic amino-acids can be synthesised (tyrosine, phenylalanine, tryptophan). Based on a pathway from glycerate-2-phosphate.

  • [Figure: Phenylalanine, Tyrosine, and Tryptophan Pathways for S. cerevisiae]

    Metabolites shown, from the growth medium and metabolite import onwards: Glycerate-2-Phosphate, Phosphoenolpyruvate, D-Erythrose-4-Phosphate, 3-deoxy-D-arabino-heptulosonate-7-phosphate, 3-Dehydroquinate, 3-Dehydroshikimate, 5-Dehydroshikimate, Shikimate, Shikimate 3-phosphate, 5-O-1-carboxyvinyl-3-phosphoshikimate, Chorismate, Prephenate, p-Hydroxyphenylpyruvate, TYROSINE, Phenylpyruvate, PHENYLALANINE, Anthranilate, N-5-Phospho-β-D-ribosylanthranilate, 1-(2-Carboxylphenylamino)-1-deoxy-D-ribulose-5-phosphate, (3-Indolyl)-glycerolphosphate, Indole, TRYPTOPHAN.

    Metabolites are labelled with KEGG compound IDs (C00631, C00074, C00279, C04961, C00944, C02637, C02652, C00493, C03175, C01269, C00251, C00254, C01179, C00166, C03506, C01302, C00108, C04302, C00463, C00078, C00079, C00082).

    Reactions are labelled with ORFs: YBR249C, YDR035W, YGR254W, YHR174W, YMR323W, YDR127W, YPR060C, YBR166C, YHR137W, YGL202W, YNL316C, YGL148W, YDR354W, YDR007W, YKL211C, YGL026C, YER090W.

  • Experimental Methodology

    Experiments consist of making particular growth media and testing if the mutants can grow (add metabolites to a basic defined medium).

    A mutant is auxotrophic if it cannot grow on a defined medium that the wild type can grow on.

    By observing the pattern of chemicals that recover growth the function of the knocked out mutant can be inferred.

  • Inferring Hypotheses

    In the philosophy of science it has often been argued that only humans can make the leaps of imagination necessary to form hypotheses.

    We use Abductive Logic Programming to infer missing arcs/labels in our metabolic graph. With these missing arcs/labels filled in we can explain (deductively) all the experimental results.

    Reiser et al. (2001) ETAI 5, 233-244.

  • The Form of the Hypotheses

    The form of the hypotheses we can infer is currently quite simple. Each hypothesis binds a particular gene to an enzyme that catalyses the reaction.

    A correct hypothesis would be that YDR060C codes for the enzyme for the reaction chorismate → prephenate.

    An incorrect hypothesis would be that it codes for the enzyme for the reaction chorismate → anthranilate.

    We have also demonstrated how more complex abductive hypotheses could be formed.

  • A Discriminating Experiment

    Hypothesis 1: YDR060C codes for the enzyme for the reaction chorismate → prephenate.

    Hypothesis 2: YDR060C codes for the enzyme for the reaction chorismate → anthranilate.

    These can be distinguished by growing the knockout YDR060C on prephenate or anthranilate. Note that these two experiments will have differing monetary cost.

  • [Figure: Phenylalanine, Tyrosine, and Tryptophan Pathways for S. cerevisiae, shown again]

  • Inferring Experiments

    Given a set of hypotheses we wish to infer an experiment that will efficiently discriminate between them.

    Assume:
    Every experiment has an associated cost.
    Each hypothesis has a probability of being correct.

    The task:
    To choose a series of experiments which minimises the expected cost of eliminating all but one hypothesis.
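    To make the task concrete, here is a Python sketch of a simple greedy heuristic: pick the experiment with the lowest cost per unit of probability mass it is expected to eliminate. This is not the talk's expected-cost function, which is not given here; the scoring rule, the function name, the experiment names and the costs are all assumptions for illustration.

```python
def choose_experiment(hypotheses, experiments):
    """Greedy choice of the next experiment.

    `hypotheses` maps a hypothesis name to its probability.
    `experiments` maps an experiment name to (cost, predictions), where
    predictions[h] is True if hypothesis h predicts growth.
    """
    total = sum(hypotheses.values())
    best, best_score = None, float("inf")
    for name, (cost, predicts_growth) in experiments.items():
        # Probability that growth is observed, given current beliefs.
        p = sum(pr for h, pr in hypotheses.items() if predicts_growth[h]) / total
        # Growth (prob p) rules out mass 1 - p; no growth (prob 1 - p) rules out mass p.
        expected_eliminated = 2 * p * (1 - p)
        if expected_eliminated == 0:
            continue  # every hypothesis predicts the same outcome
        score = cost / expected_eliminated
        if score < best_score:
            best, best_score = name, score
    return best


# Hypothetical costs and predictions for two competing hypotheses.
hyps = {"H1": 0.5, "H2": 0.5}
exps = {
    "add_prephenate": (10, {"H1": True, "H2": False}),
    "add_anthranilate": (40, {"H1": False, "H2": True}),
    "basic_medium_only": (1, {"H1": False, "H2": False}),
}
```

    Here both metabolite experiments separate the hypotheses equally well, so the cheaper one is chosen, while the cheapest experiment overall is skipped because it cannot discriminate at all. A purely cost-driven strategy would pick that uninformative experiment first.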

  • Comparison of different experimental strategies

    ASE - Expected cost minimization.

    Naive - Choose cheapest experiment.

    Random - Randomly choose experiments.

    The cost of a series of experiments is a function of the time taken and money spent. Time is Money.

  • The Robot

    Biomek 200

  • Closing the Loop

    We have physically implemented all aspects of the Robot Scientist system.

    To the best of our knowledge this is the first active learning system that both explicitly forms hypotheses and experiments, and physically does real experiments.

  • Accuracy v Time

    At the end of the 5th iteration: ASE 80.1%, Naive 74.0%, Random 72.2%. ASE was significantly more accurate than either Naive (p < 0.05) or Random (p < 0.07) using a paired t-test.

    [Chart: RS Accuracy. Classification Accuracy (%) against Iterations for ase, random and naive.]

    [Chart: RS cost vs accuracy. Classification Accuracy (%) against Log 10 Cost for ase, random and naive.]

    Average accuracy (%) and average cost by iteration:

    Iteration   ase acc.   ase cost   random acc.   random cost   naive acc.   naive cost
    0           57.36      0          57.36         0             57.36        0
    1           67.17      10         67.35         805.47        67.27        10
    2           76.14      57.69      72.59         3607.88       68.55        39
    3           79.54      184.03     71.58         5641.66       73.83        78
    4           80.47      255.63     73.03         7381.94       73.89        130
    5           80.11      313.81     72.16         9021.94       73.94        182

    The t-tests use sqrt(8) = 2.828 and a confidence figure of 1.895 for 7 degrees of freedom.

    YER090W3ase427059.7531YER090W3random41037347.1111YER090W3naive413059.7531YNL316C16.1598-1.2646256.419751.555550YNL316C21.4142750000YNL316C16.1598-1.2646250-3.7345750

    YGL026C3ase427079.2593YGL026C3random4162781.2963YGL026C3naive413079.2593

    YKL211C3ase427074.321YKL211C3random437177.2222YKL211C3naive413074.321Average for all genesAverage for all genesAverage for all genes

    YNL316C3ase427069.3827YNL316C3random41945678.5185YNL316C3naive413069.3827All9.81806258.9641218753.408756253.005403125-0.364584375All9.9951093755.236490625-1.0077251.454678125-0.868053125All9.91181251.2850843755.277768750.0582750.048428125

    YBR166C4ase4218100YBR166C4random4344100YBR166C4naive413054.4444

    YDR007W4ase4280100YDR007W4random498446.6667YDR007W4naive413074.321Comparison of change in accuracy between techniques (AccChangeTech1 - AccChangeTech2)Comparison of change in accuracy between techniques (AccChangeTech1 - AccChangeTech2)Comparison of change in accuracy between techniques (AccChangeTech1 - AccChangeTech2)

    YDR035W4ase413055YDR035W4random41028833.3333YDR035W4naive413055Ase-RandomAse -NaiveRandom -Naive

    YDR354W4ase4280100YDR354W4random41463100YDR354W4naive413088.3333

    YER090W4ase427059.7531YER090W4random41991785.3704YER090W4naive413059.7531Gene(1 - 0)(2 - 1)(3 - 2)(4 - 3)(5 - 4)Gene(1 - 0)(2 - 1)(3 - 2)(4 - 3)(5 - 4)Gene(1 - 0)(2 - 1)(3 - 2)(4 - 3)(5 - 4)

    YGL026C4ase427079.2593YGL026C4random4117974.8485YGL026C4naive413079.2593YBR166C-2.855525-8.8636259.6296258.9876750YBR166C009.62962512.722250YBR166C2.8555258.86362503.7345750

    YKL211C4ase427074.321YKL211C4random41071495.3333YKL211C4naive413074.321YDR007W-3.615318.64767512.5139253.0833250YDR007W015.92595-3.672853.0833250YDR007W3.6153-2.721725-16.18677500

    YNL316C4ase413042.7778YNL316C4random41103864.5556YNL316C4naive413042.7778YDR035W2.22222513.4152-0.555525-16.8874250YDR035W-0.755.13595-5.307-6.02485-0.387425YDR035W-2.972225-8.27925-4.75147510.862575-0.387425

    YBR166C1ase545963.5556YBR166C1random51030381.3333YBR166C1naive518269.3827YDR354W-1.797121.23466.3032564.02775YDR354W021.2346-2.561750.69135-2.916675YDR354W1.79710-8.865-5.30865-6.944425

    YDR007W1ase5319100YDR007W1random51062393.1481YDR007W1naive518288.3333YER090W-0.598575-1.98785-210YER090W010.12345-17.067910YER090W0.59857512.1113-15.067900

    YDR035W1ase531946.6667YDR035W1random5728100YDR035W1naive518255YGL026C5.918525-2.380602.6666750YGL026C03.703700.8148250YGL026C-5.9185256.08430-1.851850

    YDR354W1ase5319100YDR354W1random567046.6667YDR354W1naive518295.5556YKL211C4.56385-8.9797253.02082560YKL211C05.30865-2.39197560YKL211C-4.5638514.288375-5.412800

    YER090W1ase531984.4444YER090W1random51054584.4444YER090W1naive518296.1111YNL316C-5.254475-1.2646256.419751.555550YNL316C006.419755.2901250YNL316C5.2544751.26462503.7345750

    YGL026C1ase547986.6667YGL026C1random51004981.1111YGL026C1naive518286.6667

    YKL211C1ase518288.3333YKL211C1random51042595.5556YKL211C1naive518288.3333Sum-1.41637529.8210535.3318512.40584.02775Sum-0.7561.4323-14.952123.577025-3.3041Sum0.66637531.61125-50.2839511.171225-7.33185

    YNL316C1ase527082.2222YNL316C1random51113669.3827YNL316C1naive518269.3827

    YBR166C2ase5218100YBR166C2random5237878.5185YBR166C2naive518269.3827Mean-0.1770468753.727631254.416481251.5507250.50346875Mean-0.093757.6790375-1.86901252.947128125-0.4130125Mean0.0832968753.95140625-6.285493751.396403125-0.91648125

    YDR007W2ase518288.3333YDR007W2random51172387.2222YDR007W2naive518288.3333

    YDR035W2ase518255YDR035W2random51035947.2222YDR035W2naive518255StDev4.017230198312.17648273645.17775028127.91709926361.424024669StDev0.26516504297.59289673668.02895296415.40243555231.0206760074StDev4.08932520767.73736717476.58414747034.80923540462.4394271476

    YDR354W2ase531988.3333YDR354W2random51979646.6667YDR354W2naive518288.3333

    YER090W2ase547980YER090W2random5986985.3704YER090W2naive518296.1111sqrt(8)2.8284271247Ttest-0.12465409220.86587675332.41257200230.55400500951Ttest-12.8605153883-0.65841283111.5429591058-1.1445118229Ttest0.05761320721.4444531798-2.70013104890.8212582965-1.0626267029

    YGL026C2ase547986.6667YGL026C2random51011480.4444YGL026C2naive518279.2593

    YKL211C2ase5319100YKL211C2random52900574.4444YKL211C2naive518288.3333

    YNL316C2ase527082.2222YNL316C2random5167564.5455YNL316C2naive518254.4444confidence fig for 7 degrees freedom1.895confidence fig for 7 degrees freedom1.895confidence fig for 7 degrees freedom1.895

    YBR166C3ase527082.2222YBR166C3random51005464.5556YBR166C3naive518269.3827

    YDR007W3ase5319100YDR007W3random532851.1111YDR007W3naive518288.3333

    YDR035W3ase518255YDR035W3random5983117.3333YDR035W3naive518255

    YDR354W3ase5319100YDR354W3random5103775.9259YDR354W3naive518274.321

    YER090W3ase537459.7531YER090W3random51042547.1111YER090W3naive518259.7531

    YGL026C3ase537479.2593YGL026C3random51101281.2963YGL026C3naive518279.2593

    YKL211C3ase537474.321YKL211C3random556077.2222YKL211C3naive518274.321

    YNL316C3ase537469.3827YNL316C3random51948578.5185YNL316C3naive518269.3827

    YBR166C4ase5218100YBR166C4random5344100YBR166C4naive518254.4444

    YDR007W4ase5319100YDR007W4random598446.6667YDR007W4naive518274.321

    YDR035W4ase518255YDR035W4random51028833.3333YDR035W4naive518255

    YDR354W4ase5319100YDR354W4random51710100YDR354W4naive518288.3333

    YER090W4ase537459.7531YER090W4random52057985.3704YER090W4naive518259.7531

    YGL026C4ase537479.2593YGL026C4random51060674.8485YGL026C4naive518279.2593

    YKL211C4ase537474.321YKL211C4random51077695.3333YKL211C4naive518274.321

    YNL316C4ase518242.7778YNL316C4random51128564.5556YNL316C4naive518242.7778

  • Accuracy v Money

    Given a spend of 10^2.26 (about 182 cost units), the classification accuracies were ASE 79.5%, Naive 73.9% and Random 57.4%. ASE was significantly more accurate than either Naive (p < 0.05) or Random (p < 0.001).

    [Figure: "RS Accuracy". Classification accuracy (%) against iterations, for ASE, Random and Naive.]

    [Figure: "RS cost vs accuracy". Classification accuracy (%) against log10 cost, for ASE, Random and Naive.]

    Average values plotted in the two figures (eight genes: YBR166C, YDR007W, YDR035W, YDR354W, YER090W, YGL026C, YKL211C, YNL316C; values rounded to two decimals):

    Iteration  ASE acc. (%)  ASE cost  Random acc. (%)  Random cost  Naive acc. (%)  Naive cost
    0          57.36         0         57.36            0            57.36           0
    1          67.17         10        67.35            805.47       67.27           10
    2          76.14         57.69     72.59            3607.88      68.55           39
    3          79.54         184.03    71.58            5641.66      73.83           78
    4          80.47         255.63    73.03            7381.94      73.89           130
    5          80.11         313.81    72.16            9021.94      73.94           182

    Paired t-tests over the eight genes (7 degrees of freedom, critical value 1.895), comparing ASE at iteration 3, Random at iteration 0 and Naive at iteration 5: ASE vs Random t = 4.84; ASE vs Naive t = 2.04; Random vs Naive t = -6.32.
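
    The significance claims on this slide can be checked with a paired t-test across the eight genes. The sketch below is illustrative only: it uses the per-gene accuracy differences from the spreadsheet behind the chart, and assumes a one-tailed test with critical values taken from standard t tables. The function and constant names are hypothetical.

```python
import math
import statistics

# Per-gene accuracy differences (percentage points) from the spreadsheet
# behind the chart: ASE at iteration 3 minus Random at iteration 0, and
# ASE at iteration 3 minus Naive at iteration 5.
# Gene order: YBR166C, YDR007W, YDR035W, YDR354W,
#             YER090W, YGL026C, YKL211C, YNL316C.
ASE_MINUS_RANDOM = [32.813525, 33.340625, -0.533625, 36.2576,
                    7.82975, 26.0039, 20.501125, 21.314925]
ASE_MINUS_NAIVE = [13.3642, 12.2531, -2.083325, 13.3642,
                   -6.94445, 1.85185, 2.916675, 10.154325]

# One-tailed critical values of Student's t for 7 degrees of freedom
# (standard table values; 1.895 is the figure quoted in the spreadsheet).
T_CRIT_P05 = 1.895
T_CRIT_P001 = 4.785


def paired_t(differences):
    """t statistic for paired differences: mean / (sample sd / sqrt(n))."""
    n = len(differences)
    mean = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample standard deviation (n - 1)
    return mean / (sd / math.sqrt(n))


t_ase_random = paired_t(ASE_MINUS_RANDOM)
t_ase_naive = paired_t(ASE_MINUS_NAIVE)

print(f"ASE vs Random: t = {t_ase_random:.3f}, exceeds p<0.001 cutoff: {t_ase_random > T_CRIT_P001}")
print(f"ASE vs Naive:  t = {t_ase_naive:.3f}, exceeds p<0.05 cutoff: {t_ase_naive > T_CRIT_P05}")
```

    With only eight genes the test has 7 degrees of freedom, so the ASE vs Naive comparison clears the p < 0.05 threshold by a small margin.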

  • Time and Money

    Cost is an increasing function of both time and money. ASE dominates on each, so ASE dominates under any reasonable cost function.

    For example, to reach an accuracy of ~70%, ASE requires fewer trial iterations than Random at a hundredth of the price, and almost half the iterations of Naive at a third of the price.

    King et al. (2004) Nature 427, 247-252.
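
    One way to read the ~70% comparison off the averaged curves is to interpolate between the measured iterations. The sketch below uses the per-iteration averages from the spreadsheet behind the charts. The interpolation scheme (linear in iteration, linear in log10 cost) is an assumption made here for illustration and is not necessarily how the figures on the slide were derived. The names `CURVES` and `crossing` are hypothetical.

```python
import math

# (iteration, mean accuracy %, mean cost) per strategy, taken from the
# spreadsheet behind the talk's charts.
CURVES = {
    "ase": [(0, 57.3581, 0.0), (1, 67.171875, 10.0),
            (2, 76.135996875, 57.6875), (3, 79.544753125, 184.03125),
            (4, 80.47376875, 255.625), (5, 80.109184375, 313.8125)],
    "random": [(0, 57.3581, 0.0), (1, 67.348921875, 805.46875),
               (2, 72.5854125, 3607.875), (3, 71.5776875, 5641.65625),
               (4, 73.032365625, 7381.9375), (5, 72.1643125, 9021.9375)],
    "naive": [(0, 57.3581, 0.0), (1, 67.265625, 10.0),
              (2, 68.550709375, 39.0), (3, 73.828478125, 78.0),
              (4, 73.886753125, 130.0), (5, 73.93518125, 182.0)],
}


def crossing(points, target):
    """Return (iteration, cost) where accuracy first reaches target, else None."""
    for (i0, a0, c0), (i1, a1, c1) in zip(points, points[1:]):
        if a0 < target <= a1:
            frac = (target - a0) / (a1 - a0)
            iteration = i0 + frac * (i1 - i0)
            if c0 > 0:
                # Assumption: interpolate on log10 cost, as on the chart axis.
                log_cost = math.log10(c0) + frac * (math.log10(c1) - math.log10(c0))
                cost = 10 ** log_cost
            else:
                # log10(0) is undefined, so fall back to linear interpolation.
                cost = c0 + frac * (c1 - c0)
            return iteration, cost
    return None


results = {name: crossing(points, 70.0) for name, points in CURVES.items()}
for name, (iteration, cost) in results.items():
    print(f"{name}: reaches 70% at iteration {iteration:.2f}, cost {cost:.1f}")
```

    Under these assumptions the interpolated ASE cost comes out at roughly a hundredth of Random's and a little over a third of Naive's, in line with the slide. The exact ratios depend on the interpolation chosen.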

  • Human Comparisons

    We were interested in comparing the performance of the Robot Scientist with that of humans.

    We adapted the simulator to allow humans to choose experiments and interpret the results over cycles of experimentation.

    We compared the Robot Scientist with nine graduate computer scientists and biologists.

    There was no significant difference between the best humans and the Robot.

  • Robotic Annotation

  • New Biological Knowledge

    So far with the Robot Scientist we have only shown that we can automatically rediscover known biological knowledge.

    We wish to extend this result to the discovery of new biological knowledge.

    To do this we need to combine the Robot Scientist with conventional genome annotation bioinformatics and DMP.

  • Robotic Annotation

    One way of thinking about genome annotation is as a hypothesis formation process. Hypothesis formation is perhaps the hardest part of automating science.

    Our idea is to incorporate bioinformatic annotation methods with genome annotation. The bioinformatic methods will generate the hypotheses which the robot scientist will experimentally test.

  • Genome Scale Model of Yeast Metabolism

    We have extended our model of aromatic amino acid metabolism to cover most of what is known about yeast metabolism.

    The model includes 1,166 ORFs (940 known, 226 inferred).

    Growth is predicted if there is a path from the growth medium to the defined end-points.

    Accuracy is 83% (based on 914 strain/medium predictions).
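
    The growth criterion can be read as a reachability test over reactions. The following is a minimal sketch, assuming each reaction is a pair of substrate and product sets. The toy network collapses whole pathways into single steps and is hypothetical; it is not the model described in the talk.

```python
def reachable_metabolites(medium, reactions):
    """Forward-chain: fire any reaction whose substrates are all available."""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def predicts_growth(medium, reactions, end_points):
    """Growth is predicted when every end-point metabolite is reachable."""
    return set(end_points) <= reachable_metabolites(medium, reactions)


# Toy network (hypothetical, heavily simplified).
reactions = [
    (frozenset({"glucose"}), frozenset({"chorismate"})),
    (frozenset({"chorismate"}), frozenset({"prephenate"})),
    (frozenset({"prephenate"}), frozenset({"phenylalanine"})),
]
end_points = {"phenylalanine"}

wild_type = predicts_growth({"glucose"}, reactions, end_points)
# Knocking out the gene for the last reaction removes that reaction.
knockout = predicts_growth({"glucose"}, reactions[:2], end_points)
# Supplementing the medium with the missing metabolite restores the path.
rescued = predicts_growth({"glucose", "phenylalanine"}, reactions[:2], end_points)
```

    In this toy case the knockout strain is predicted not to grow on the minimal medium but to grow once the end-point metabolite is supplied, which mirrors the logic of an auxotrophic growth experiment.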

  • The Model is Incomplete

    It is not possible to find a path from the inputs (growth medium) to all the end-point metabolites using only reactions encoded by known genes.

    This suggests automated strategies for determining the identity of the missing genes, which would be new biological knowledge.

    One strategy uses the EC enzyme class of each missing reaction: identify genes that code for this EC class in other organisms, then find homologous genes in yeast.

    The predictions can be tested automatically by robot.

  • Confirmation of DMP Yeast Predictions

    The yeast gene YBR147W is currently of unknown function.

    It is predicted to have a function in metabolism by two DMP rules with expected accuracies of >80%.

    It is predicted to have a function in amino-acid metabolism by two rules with expected accuracies of 50% and 60% respectively.

    Using our Robot Scientist auxotrophic methodology we have recovered growth of the knockout with: aspartic acid, tyrosine, leucine, valine, phenylalanine, cystine, arginine.

  • Conclusions

    Machine learning can be used to accurately predict gene function.

    Simple forms of scientific reasoning and experimentation can be fully automated.

    Developing robotic systems capable of generating new biological knowledge will require a synthesis of traditional genome annotation techniques, machine learning, and a Robot Scientist-like methodology.

  • The Three Objects of the Intellect

    The True
    The Beautiful
    The Beneficial

  • Acknowledgements

    DMP: Andreas Karwath, Aberystwyth; Amanda Clare, Aberystwyth; Paul Wise, Aberystwyth; Luc Dehaspe, Leuven

    Robot Scientist: Ken Whelan, Aberystwyth; Philip Reiser, Aberystwyth; Ffion Jones, Aberystwyth; Ugis Sarkans, Aberystwyth (EBI); Douglas Kell, Manchester (Aberystwyth); Steve Oliver, Manchester; Stephen Muggleton, Imperial College (York); Chris Bryant, Robert Gordons (York); David Page, Wisconsin

    BBSRC, EPSRC

    PharmDM - Commercial Support

  • Relational vs Propositional

    Propositional: single table, fixed number of columns/attributes

    Relational: multiple tables, multiple values
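
    The contrast between the two representations can be illustrated with plain data structures. This is a sketch only: the ORF identifiers, values, and the `homologs` relation are hypothetical, and counting related rows is just one simple way to flatten a relation into a fixed column.

```python
# Propositional: one row per ORF, a fixed set of attribute columns.
propositional = {
    "ORF_A": {"seq_len": 310, "mol_wt": 34000},
    "ORF_B": {"seq_len": 300, "mol_wt": 33000},
}

# Relational: several tables; one ORF may have any number of related rows.
orfs = ["ORF_A", "ORF_B"]
homologs = [
    ("ORF_A", "hom_1"),
    ("ORF_A", "hom_2"),
    ("ORF_A", "hom_3"),
    ("ORF_B", "hom_4"),
]


def propositionalise(orfs, homologs):
    """Flatten the relation into one fixed column: homolog count per ORF.

    This is lossy: the identities of the homologs are discarded.
    """
    counts = {orf: 0 for orf in orfs}
    for orf, _homolog in homologs:
        counts[orf] += 1
    return counts


homolog_counts = propositionalise(orfs, homologs)
```

    The flattening step shows why relational data does not fit a single table directly: a variable number of related values has to be summarised before it can become a fixed attribute.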

  • Expression Data Rule

    If in the micro-array experiment (sorbitol incubation) the ORF expression is > -0.25
    and in the micro-array experiment (nitrogen depletion) the ORF expression is -1.06
    then the function of this ORF is "pheromone response, mating type determination, sex-specific proteins"

    Accuracy on training data: 11/12 (92%)
    Accuracy on the test data: 3/4 (75%)
    21 predictions made
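
    A rule of this kind is a conjunction of threshold tests, which can be sketched as a predicate. The comparison operator for the -1.06 threshold is missing from the slide text, so the sketch takes it as a parameter instead of guessing. The function and argument names are assumptions of this sketch.

```python
import operator


def expression_rule(sorbitol, nitrogen_depletion, second_cmp):
    """Return True if an ORF's expression values match the rule.

    second_cmp is the comparison applied to the -1.06 threshold. The operator
    is not recoverable from the slide, so the caller must supply it.
    """
    return sorbitol > -0.25 and second_cmp(nitrogen_depletion, -1.06)


def accuracy(correct, total):
    """Fraction of rule predictions that were correct."""
    return correct / total


train_acc = accuracy(11, 12)  # counts quoted on the slide
test_acc = accuracy(3, 4)     # counts quoted on the slide

# Example call with an assumed operator (operator.le is only an illustration).
matches = expression_rule(0.1, -2.0, operator.le)
```

    Keeping the unknown operator as an argument makes the assumption visible at every call site.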

  • Structure Rule

    80% accurate on test data.

    Most matching ORFs belong to the Mitochondrial Carrier Family.

    These have 6 long transmembrane alpha-helices of about 20-30 amino acids.

    Why do we notice alpha-helices of length 10-14?

  • Alignment

    YJL133W -------NEYNPLIHCLC----GSISGSTCAAITTPLDCIKTVLQIRG------------ 251
    YKR052C -------NSYNPLIHCLC----GGISGATCAALTTPLDCIKTVLQVRG------------ 241
    YIL006W ----NNTNSINLQRLIMA----SSVSKMIASAVTYPHEILRTRMQLKS------------ 310
    YBR104W ----LTRNEIPPWKLCLF----GAFSGTMLWLTVYPLDVVKSIIQNDD------------ 271
    YGR096W ----KTTAAHKKWELATLNHSAGTIGGVIAKIITFPLETIRRRMQFMNSKHLEK------ 250
    YJR095W -----QMDVLPSWETSCI----GLISGAIGPFSNAPLDTIKTRLQKDK------------ 246
    YKL120W -----LMKDGPALHLTAS-----TISGLGVAVVMNPWDVILTRIYNQK------------ 261
    YLR348C -----FDASKNYTHLTAS-----LLAGLVATTVCSPADVMKTRIMNGS------------ 239
    YMR166C ----DGRDGELSIPNEILT---GACAGGLAGIITTPMDVVKTRVQTQQPPSQSNKSYSVT 300
    YDL198C ------DYSQATWSQNFIS---SIVGACSSLIVSAPLDVIKTRIQNRN------------ 242
    YGR257C ----RFASKDANWVHFINSFASGCISGMIAAICTHPFDVGKTRWQISMMN---------- 302
    YDL119C FIHYNPEGGFTTYTSTTVNTTSAVLSASLATTVTAPFDTIKTRMQLEP------------ 255

    YJL133W -SQTVSLEIMRKADTFSKAASAIYQVYGWKGFWRGWKPRIVANMPATAISWTAYECAKHF 310
    YKR052C -SETVSIEIMKDANTFGRASRAILEVHGWKGFWRGLKPRIVANIPATAISWTAYECAKHF 300
    YIL006W -DIPDSIQRR-----LFPLIKATYAQEGLKGFYSGFTTNLVRTIPASAITLVSFEYFRNR 364
    YBR104W -LRKPKYKNS-----ISYVAKTIYAKEGIRAFFKGFGPTMVRSAPVNGATFLTFELVMRF 325
    YGR096W FSRHSSVYGSYKGYGFARIGLQILKQEGVSSLYRGILVALSKTIPTTFVSFWGYETAIHY 310
    YJR095W ---SISLEKQSGMKKIITIGAQLLKEEGFRALYKGITPRVMRVAPGQAVTFTVYEYVREH 303
    YKL120W ----GDLYKG-----PIDCLVKTVRIEGVTALYKGFAAQVFRIAPHTIMCLTFMEQTMKL 312
    YLR348C ----GDHQP------ALKILADAVRKEGPSFMFRGWLPSFTRLGPFTMLIFFAIEQLKKH 289
    YMR166C HPHVTNGRPAALSNSISLSLRTVYQSEGVLGFFSGVGPRFVWTSVQSSIMLLLYQMTLRG 360
    YDL198C ---FDNPESG------LRIVKNTLKNEGVTAFFKGLTPKLLTTGPKLVFSFALAQSLIPR 293
    YGR257C ---NSDPKGGNRSRNMFKFLETIWRTEGLAALYTGLAARVIKIRPSCAIMISSYEISKKV 359
    YDL119C ----SKFTNS------FNTFTSIVKNENVLKLFSGLSMRLARKAFSAGIAWGIYEELVKR 305

  • Alignment

    YJL133W -------cccccaaaaaa----aaaaaaaaaaacccaaaaaaaaaacc------------ 251
    YKR052C -------cccccaaaaaa----aaaaaaaaaaacccaaaaaaaaaacc------------ 241
    YIL006W ----ccccccccaaaaaa----aaaaaaaaaaacccaaaaaaaaaacc------------ 310
    YBR104W ----ccccccccaaaaaa----aaaaaaaaaaacccaaaaaaaaaacc------------ 271
    YGR096W ----cccccccccccccbaaaaaaaaaaaaaaacccaaaaaaaaaacccccccc------ 250
    YJR095W -----cccccccaaaaaa----aaaaaaaaaaacccaaaaaaaaaccc------------ 246
    YKL120W -----ccccccaaaaaaa-----aaaaaaaaaacccaaaaaaaaaacc------------ 261
    YLR348C -----ccccccaaaaaaa-----aaaaaaaaaacccaaaaaaaaaacc------------ 239
    YMR166C ----cccccccccaaaaaa---aaaaaaaaaaacccaaaaaaaaaacccccccccccccc 300
    YDL198C ------cccccccaaaaaa---aaaaaaaaaaacccaaaaaaaaaacc------------ 242
    YGR257C ----ccccccccccccaaaaaaaaaaaaaaaaacccaaaaaaaaaacccc---------- 302
    YDL119C ccccccccccccccaaaaaaaaaaaaaaaaaaacccaaaaaaaaaacc------------ 255

    YJL133W -ccccccccccccccaaaaaaaaaaaccccaaaaccaaaaaaacaaaaaaaaaaaaaaaa 310
    YKR052C -ccccccccccccccaaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 300
    YIL006W -ccccccccc-----aaaaaaaaaaaccccaaacccaaaaaaaccaaaaaaaaaaaaaaa 364
    YBR104W -ccccccccc-----aaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 325
    YGR096W cccccccccccccccaaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 310
    YJR095W ---ccccccccccccaaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 303
    YKL120W ----cccccc-----aaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 312
    YLR348C ----ccccc------aaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 289
    YMR166C cccccccccccccccaaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 360
    YDL198C ---cccccca------aaaaaaaaaacccaaaaacccaaaaaaaaaaaaaaaaaaaaaaa 293
    YGR257C ---ccccccccccccaaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 359
    YDL119C ----ccccca------aaaaaaaaaacccaaaaacccaaaaaaccaaaaaaaaaaaaaaa 305
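
    Helix lengths can be read off predicted-structure strings like those above by measuring runs of the helix symbol. This sketch assumes that `a` marks alpha-helix, `c` coil, `b` strand, and `-` an alignment gap; that legend is an assumption, since the slides do not state it. The example string is made up for illustration, not copied from the alignment.

```python
from itertools import groupby


def helix_lengths(structure, helix_symbol="a", gap_symbol="-"):
    """Lengths of consecutive helix runs, ignoring alignment gaps."""
    ungapped = structure.replace(gap_symbol, "")
    return [
        len(list(run))
        for symbol, run in groupby(ungapped)
        if symbol == helix_symbol
    ]


def helices_in_range(structure, low=10, high=14):
    """Helix lengths that fall inside the inclusive range [low, high]."""
    return [n for n in helix_lengths(structure) if low <= n <= high]


# Made-up structure string: a 12-residue helix, then a helix split by a gap.
toy = "cc" + "a" * 12 + "ccc" + "a" * 6 + "--" + "a" * 5 + "c"
toy_lengths = helix_lengths(toy)
```

    Gaps are removed before counting because they belong to the alignment, not to the protein, so a helix that spans a gap is still one helix in the underlying sequence.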

  • Types of Logic

    Deduction
    Rule: If a cell grows, then it can synthesise tryptophan.
    Fact: Cell cannot synthesise tryptophan.
    ∴ Cell cannot grow.
    Given the rule P → Q, and the fact ¬Q, infer the fact ¬P (modus tollens).

    Abduction
    Rule: If a cell grows, then it can synthesise tryptophan.
    Fact: Cell cannot grow.
    ∴ Cell cannot synthesise tryptophan.
    Given the rule P → Q, and the fact ¬P, infer the fact ¬Q.
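
    The two inference patterns can be contrasted in a small sketch. Deduction by modus tollens is sound. Abduction only proposes a hypothesis that would explain an observation. The slide's abduction example is modelled here as abduction over the contrapositive of the rule; that reading, and all class and function names, are assumptions of this sketch.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    name: str
    positive: bool = True

    def negate(self):
        return Literal(self.name, not self.positive)


class Rule(NamedTuple):
    antecedent: Literal
    consequent: Literal

    def contrapositive(self):
        # P -> Q is logically equivalent to (not Q) -> (not P).
        return Rule(self.consequent.negate(), self.antecedent.negate())


def modus_tollens(rule, fact):
    """Deduction (sound): from P -> Q and not Q, conclude not P."""
    if fact == rule.consequent.negate():
        return rule.antecedent.negate()
    return None


def abduce(rule, observation):
    """Abduction (a hypothesis, not a proof): from A -> B and B, propose A."""
    if observation == rule.consequent:
        return rule.antecedent
    return None


grows = Literal("grows")
synthesises = Literal("synthesises_tryptophan")
rule = Rule(grows, synthesises)

# Deduction: the cell cannot synthesise tryptophan, so it cannot grow.
deduced = modus_tollens(rule, synthesises.negate())

# Abduction: the cell cannot grow; one explanation is that it cannot
# synthesise tryptophan.
hypothesis = abduce(rule.contrapositive(), grows.negate())
```

    The abduced hypothesis is only a candidate explanation, which is why it needs an experiment to test it.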