RNA-seq: quantification and models for assessing differential expression (at least for some approaches) Ian Dworkin NGS2016 @IanDworkin


Post on 02-May-2018


Page 1: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In

RNA-seq: quantification and models for assessing differential expression (at least for some approaches)

Ian Dworkin NGS2016

@IanDworkin

What we will cover today

• Absolute fundamentals of experimental design.
• Why we use count data as input.
• Introducing a bit of probability to explain why many RNA-seq differential expression tools use a negative binomial.
• Why we care about variance/over-dispersion so much.
• How we estimate over-dispersion with small sample sizes (and why edgeR and DGE give different results).
• A bit about dealing with multiple comparisons (if we have time).

Goals

I am not planning on trying to provide any sort of overview of statistical methods for genomic data. Instead I am going to provide a few short ideas to think about.

Statistics (like bioinformatics) is a rapidly developing area, in particular with respect to genomics. Rarely is it clear what the “right way” to analyze your data is.

Instead I hope to aid you in using some common sense when thinking about your experiments for using high throughput sequencing.

Caveats

• There are whole courses on proper experimental design and statistics. Great books too. This material in Bio720 is not enough!
• For experimental design I highly recommend:
  – Quinn & Keough: Experimental Design and Data Analysis for Biologists.
    http://www.amazon.com/Experimental-Design-Data-Analysis-Biologists/dp/0521009766/

The basics of experimental design

• There are a few basic points to always keep in mind:
  – Biological replication (as much as you can afford) is extremely important. Robustly identifying differentially expressed (DE) genes requires statistical power.
    • (note: this is not how many reads you have for a gene within a sample, but how many biologically/statistically independent samples per treatment).
  – Technical replication does not help with statistical power (i.e. don’t split a single sample and run it as two libraries).

Biological replication gives far more statistical power than increased sequencing depth within a biological sample!!!!

• Sequencing (and library prep) costs are still sufficiently expensive that most experiments use small numbers of biological replicates.
• Given the additional cost of library preparation (~$225/sample at our facility), many folks go for increased depth instead of more samples.
• For a given level of sequencing depth (total) for a treatment, it is far better to go for more biological replicates, each at lower sequencing depth (rather than fewer replicates at higher sequencing depth).

Biological replication gives far more statistical power than increased sequencing depth within a biological sample!!!!

Robles et al. 2012

How do the methods compare in simulation?

Kvam et al. 2012

The basics of experimental design

• There are a few basic points to always keep in mind:
  – Biological replication.
  – Design your experiment to avoid confounding your different treatments (sex, nutrition) with each other or with technical variables (lane within a flow cell, between flow cell variation).
    • Make diagrams/tables of your experimental design, or use a randomized design.

The basics of experimental design

• There are a few basic points to always keep in mind:
  – Biological replication.
  – Design your experiment to avoid confounding variables.
  – Sample individuals (within treatment) randomly!

Useful references

Auer, P. L., & Doerge, R. W. (2010). Statistical Design and Analysis of RNA-Seq Data. Genetics. doi:10.1534/genetics.110.114983 PMID: 20439781

Bullard, J. H., Purdom, E., Hansen, K. D., & Dudoit, S. (2010). Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics, 11, 94. doi:10.1186/1471-2105-11-94

Designing your experiment before you start.

Sampling
Replication
Blocking
Randomization

Overall we are going to be thinking about how to avoid confounding sources of variation in the data.

All of these are larger topics that are part of Experimental Design.

Sampling

Sampling design is all about making sure that when you “pick” (sample) observations, you do so in a random and unbiased manner.

Proper sampling aims to control for unknown sources of variation that influence the outcome of your experiments.

This seems reasonable, and often intuitive to most experimental biologists, but it can be very insidious. Whiteboard…

Biological replicates. Not technical ones.

• There is little purpose in using technical replication (i.e. same sample, multiple library preps) from a given biological sample UNLESS part of your question revolves around it.
• Focus on biological variability. While you are confounding some sources of technical and biological variability, we already know a lot about the former, and little about the latter (in particular for your system).

Replication

Imagine you have an experiment with one factor (sex), with two treatment levels (males and females).

You want to look for sex specific differences in the brains of your critters based on transcriptional profiling, so you decide to use RNA-seq.

Perhaps you have a limited budget so you decide to run one sample of male brains, and one sample of female brains, each in one lane of a flow cell.

What (useful) information can you get out of this?

Not much (but there may be some). Why?

Replication

Why?

No replication. How will you know if the differences you observe are due to differences in males and females, random (biological) differences between individuals, or technical variation due to RNA extraction, processing or running the samples on different lanes?

All of these sources of variation are confounded, and there are no particularly good ways of separating them out.

But there are lots of sources of variation, so how do we account for these?

Replication

To date, several studies have suggested that “technical” replicates for RNA-seq show very little variation/high correlation.

Mortazavi et al. 2008

How might such a statement be misleading about variation?

Replication

This study looked at a single source of technical variation.

Running exactly the same sample on two different lanes on a flow cell.

This completely ignores other sources of “technical variation”:
  – variation due to RNA purification
  – variation due to fragmentation, labeling, etc.
  – lane to lane variation
  – flow cell to flow cell variation

All of these may be important (although unlikely interesting) sources of variation…

However…

Replication

Many studies have ignored the BIOLOGICAL SOURCES of VARIATION between replicates. In most cases biological variation between samples (from the same treatment) is far greater than technical sources of variation.

While it would be nice to be able to partition various sources of technical variation (such as labeling, RNA extraction), it is often too expensive to perform such a design (see whiteboard).

IF you have limited resources, it is generally far better to have biological replication (independent biological samples for a given treatment) than technical replication.

Does this lead to confounded sources of variation?

Blocking

Blocks in experimental design represent some factor (usually something not of major interest) that can strongly influence your outcomes. More importantly it is a factor which you can use to group other factors that you are interested in.

For instance in agriculture there is often plot to plot variation. You may not be interested in the plots themselves but in the variety of crops you are growing.

But what would happen if you grew all of strain 1 on plot 1 and all of strain 2 on plot 2?

Whiteboard.

These plots would represent blocking levels.
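The plot example can be made concrete with a small check. This is only a sketch: `is_confounded`, the strain and plot labels, and the two layouts are hypothetical names invented for illustration. It flags a layout in which every block holds a single treatment, which is the situation where strain and plot effects cannot be separated.

```python
from collections import defaultdict

def is_confounded(design):
    """Return True if every block contains only one treatment level.

    `design` is a list of (treatment, block) pairs, one per experimental unit.
    """
    treatments_per_block = defaultdict(set)
    for treatment, block in design:
        treatments_per_block[block].add(treatment)
    return all(len(levels) == 1 for levels in treatments_per_block.values())

# All of strain 1 on plot 1 and all of strain 2 on plot 2:
# strain and plot effects cannot be told apart.
bad_design = [("strain1", "plot1")] * 4 + [("strain2", "plot2")] * 4

# Each strain grown on both plots: plot can be handled as a block.
blocked_design = [
    ("strain1", "plot1"), ("strain2", "plot1"),
    ("strain1", "plot2"), ("strain2", "plot2"),
] * 2

print(is_confounded(bad_design))      # True
print(is_confounded(blocked_design))  # False
```

The same check applies if "plot" is read as lane, flow cell, or library prep batch.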

Blocking

In genomic studies the major blocking levels are often the slide/chip for microarrays (i.e. two samples/slide for 2 color arrays, 16 arrays/slide for Illumina arrays).

For GAII/HiSeq RNA-seq data the major blocking effect is the flow cell itself and lanes within the flow cell.

Auer and Doerge 2010

Blocking

Incorporating lanes as a blocking effect

Auer and Doerge 2010

Blocking designs

Balanced Incomplete Blocking Design (BIBD)

Let’s dissect these subscripts.

Balanced for treatments across flow cells. Randomized for location.

Auer and Doerge 2010
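As a rough illustration of "balanced across flow cells, randomized for location", the sketch below builds a simpler randomized complete block layout. It is not the BIBD from Auer and Doerge: here every flow cell receives one library from every treatment, and lane order is shuffled within each flow cell. The function name, treatment labels, and seed are assumptions made for the example.

```python
import random

def blocked_layout(treatments, n_flowcells, seed=None):
    """Place one library of each treatment on every flow cell, in random lane order."""
    rng = random.Random(seed)
    layout = {}
    for cell in range(1, n_flowcells + 1):
        lanes = list(treatments)
        rng.shuffle(lanes)  # randomize lane position within this flow cell
        layout[f"flowcell_{cell}"] = lanes
    return layout

layout = blocked_layout(["A", "B", "C", "D"], n_flowcells=3, seed=1)
for cell, lanes in layout.items():
    print(cell, lanes)
```

When there are more treatments than lanes per flow cell, a complete block no longer fits and an incomplete design such as the BIBD becomes necessary.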

What standard technical issues should you consider for blocking:

• Flow cell
• Lane
• Adaptors
• Library prep
• Same instrument
• People!
• RNA extraction/purification

What happens when you fail to block (or replicate)?

Yue F, Cheng Y, Breschi A, et al.: A comparative encyclopedia of DNA elements in the mouse genome. Nature. 2014; 515(7527): 355–364

Lin S, Lin Y, Nery JR, et al.: Comparison of the transcriptional landscapes between human and mouse tissues. Proc Natl Acad Sci USA. 2014; 111(48): 17224–17229

In a recent analysis of the mouse ENCODE data, the RNA-seq data suggested that samples clustered (by gene expression) more by species than by tissue. This was an unusual finding.

Gilad Y and Mizrahi-Man O. A reanalysis of mouse ENCODE comparative gene expression data [v1; ref status: indexed, http://f1000r.es/5ez] F1000Research 2015, 4:121 (doi:10.12688/f1000research.6536.1)

A new re-analysis demonstrated some potentially serious issues with the experimental design.

Figure 1. Study design for:

Yue F, Cheng Y, Breschi A, et al.: A comparative encyclopedia of DNA elements in the mouse genome. Nature. 2014; 515(7527): 355–364

Lin S, Lin Y, Nery JR, et al.: Comparison of the transcriptional landscapes between human and mouse tissues. Proc Natl Acad Sci USA. 2014; 111(48): 17224–17229

Gilad Y and Mizrahi-Man O 2015 [v1; ref status: awaiting peer review, http://f1000r.es/5ez] F1000Research 2015, 4:121 (doi:10.12688/f1000research.6536.1)

Differential expression

• Probably the single most common use of RNA-Seq data is to examine differential expression of transcripts (transcriptional profiles).
• But differential expression of what?
  – Genes
  – Transcripts (alternative transcripts)
  – Allele specific expression
  – Exon level expression

The primary goals of your experiment should guide your design.

• The exact details (# biological samples, sample depth, read length, strand specificity) of how you perform your experiment need to be guided by your primary goal.
• Unless you have all the $$, no single design can capture all of the variability.

Your goals matter

• For instance: if your primary interest is in the discovery of new transcripts, sampling deeply within a sample is probably best.
• For differential expression analyses, you will almost never have the ability to perform differential expression analysis on very rare transcripts, so it is rarely useful to generate more than 15-20 million read pairs per biological sample.

A simple truth: There is no technology nor statistical wizardry that can save a poorly planned experiment. The only truly failed experiment is a poorly planned one.

To consult the statistician after an experiment is finished is often merely to ask him (her) to conduct a post mortem examination. He (she) can perhaps say what the experiment died of.

Ronald Fisher

Counting

• One of the most difficult issues has been how to count.
• We first need to ask what features we want to count.

What features could we count?

• Counting at the level of genes (reads mapped to gene regardless of transcript).
• Counting at the level of transcript.
• Counting at the level of exons.
• Counting at the level of k-mers within one of the above.
• Counting at the level of nucleotides within exon/transcript/gene.

Counting

• We are interested in transcript abundance.
• But we need to take into account a number of things:
  – How many reads in the sample.
  – Length of transcripts.
  – GC content and sequencing bias (influencing counts of transcripts within a sample).

Seemingly sensible counting (but ultimately not so useful).

• RPKM (reads aligned per kilobase of exon per million reads mapped) – Mortazavi et al. 2008
• FPKM (fragments per kilobase of exon per million fragments mapped). Same idea for paired end sequencing.
• TPM, TMM… etc…

Take home message (from me): Actual counts should be used as input for differential expression analysis, not (pre)scaled measures.

BUT: Not everyone agrees with this approach, nor with my arguments about counting.

Lior Pachter’s blog is a good place to watch the debate. Also check out some comments in the vignette and paper on limma/voom.

RPKM

Problems with RPKM

• RPKM is not a consistent measure of expression abundance (or relative molar concentration).
• See
  – http://blog.nextgenetics.net/?e=51
  – Wagner et al. 2012. Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory Biosci.

How about transcripts per million (TPM)?

While TPM is in general more (statistically) consistent, it is still generally not appropriate.
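To see what "inconsistent among samples" means in practice, here is a minimal sketch using made-up counts and transcript lengths (all numbers and function names are hypothetical). It computes RPKM and TPM from their usual definitions and compares the per-sample totals: TPM sums to one million in every sample by construction, while the RPKM total depends on the sample.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts)
    return [c / (length / 1e3) / (total_reads / 1e6)
            for c, length in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]

lengths = [1000, 2000, 500]   # transcript lengths in bp (hypothetical)
sample_a = [100, 200, 50]     # read counts per transcript (hypothetical)
sample_b = [100, 20, 500]

print("RPKM totals:", sum(rpkm(sample_a, lengths)), sum(rpkm(sample_b, lengths)))
print("TPM totals: ", sum(tpm(sample_a, lengths)), sum(tpm(sample_b, lengths)))
```

Because the RPKM totals differ, the same RPKM value does not represent the same share of the transcript pool in two samples. Neither measure retains the raw counts that count-based models rely on.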

Normalization (for DE) can be much more complicated in practice

• Why might the total number of reads (sequencing depth) be a misleading quantity to scale by?
• Scaling by total mapped reads (sequencing depth) can be substantially influenced by the small proportion of highly expressed genes. (What might happen?)
• A number of alternatives have been proposed and used (i.e. using quantile normalization, etc.)

Bullard, J. H., Purdom, E., Hansen, K. D., & Dudoit, S. (2010). Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics, 11, 94. doi:10.1186/1471-2105-11-94
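One way to answer "what might happen?" is a toy example. The counts below are invented: three genes have identical raw counts in both conditions, and only one highly expressed gene changes. Scaling each sample by its total count then makes the unchanged genes look strongly down-regulated.

```python
def per_million(counts):
    """Scale raw counts by the sample total (counts per million)."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

# Hypothetical counts for four genes; only gene 4 truly changes.
control = [100, 100, 100, 700]
treated = [100, 100, 100, 7000]

cpm_control = per_million(control)
cpm_treated = per_million(treated)

for gene, (c, t) in enumerate(zip(cpm_control, cpm_treated), start=1):
    print(f"gene {gene}: control {c:.0f}, treated {t:.0f}, ratio {t / c:.2f}")
```

Genes 1 to 3 have the same raw counts in both samples, yet their scaled values drop because gene 4 takes up most of the treated library.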

Counting (and normalizing) in practice

• In practice, we do not want to “pre-scale” our data as is done in F/R-PKM or TPM.
• Instead we are far better off using a model based approach for normalizing for read-length or library size in the data modeling per se.
• This is far more flexible.
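A minimal sketch of what "normalizing inside the model" can look like, under strong simplifying assumptions: a Poisson model for one gene, with library size entering as an offset, so that count_i ~ Poisson(library_size_i × rate_group). Under that model the maximum likelihood estimate of each group's rate is the summed counts divided by the summed library sizes. The data and function name are hypothetical, and real tools use richer models than this.

```python
import math

def group_rate(counts, library_sizes):
    """Poisson MLE of the per-read rate when library size is an offset."""
    return sum(counts) / sum(library_sizes)

# Hypothetical raw counts for one gene, with each sample's library size.
counts_a, libs_a = [50, 60, 55], [1.0e6, 1.2e6, 1.1e6]
counts_b, libs_b = [210, 190, 200], [2.0e6, 1.9e6, 2.1e6]

rate_a = group_rate(counts_a, libs_a)
rate_b = group_rate(counts_b, libs_b)
log2_fc = math.log2(rate_b / rate_a)

print(f"rate A = {rate_a:.2e}, rate B = {rate_b:.2e}, log2 fold change = {log2_fc:.2f}")
```

The raw counts are never rescaled: library size is part of the model, so each sample keeps the uncertainty that goes with its actual count.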

Take home message: Actual counts should be used as input for differential expression analysis, not (pre)scaled measures.

The issue is that getting unambiguous counts is hard (Rob).

Differential expression analysis. A primer.

• I am assuming that we have already decided on an appropriate method to count and convert mapped reads to discrete values…
• There is a bit we need to know to help us understand what to do next.

A bit of background on probability.

• Fundamentally our observed measure of expression is the count of reads.
• Depending upon the data modeling framework we wish to use, we need to account for this, as counts are not necessarily approximated well by the normal (Gaussian) distributions that are used for “standard” linear models like t-tests, ANOVA, regression.
• This is not a problem at all, as it is easy to model data coming from other distributions, and tools for doing so are widely available in stats packages and programming languages alike.

Probability density vs. mass function

Probability mass function for a discrete variable.

Probability density function for a continuous variable.

Probability mass function (for discrete distributions, like read counts)

P(13 | Poisson(λ = 10)) = 0.073

Height represents the probability at that point (integer).

“Area” of the box has no particular meaning.

P(integer) ≥ 0
P(non-integers) = 0.
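The value on this slide can be reproduced directly from the Poisson probability mass function, P(x | λ) = e^(−λ) λ^x / x!. The helper name below is an assumption made for the example; the check that the probabilities sum to 1 uses a truncated range, which is fine here because the tail beyond 60 is negligible for λ = 10.

```python
import math

def poisson_pmf(x, lam):
    """Probability of observing exactly x events under Poisson(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

print(round(poisson_pmf(13, 10), 3))  # 0.073

# The heights over all integers add up to 1.
total = sum(poisson_pmf(x, 10) for x in range(61))
print(round(total, 6))
```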

Probability density function

Height at x = 13 is 0.0799. This is not the probability at x = 13, but the density, i.e. f(13) = 0.0799, where f(x) is the normal density.

P(x = 13 | N(mean = 10, sd = 3.3)) = 0. WHY?

Probability density function

We can define the probability in the interval 10 ≤ x ≤ 15

P(10 ≤ x ≤ 15 | N(10, 3.3)) = 0.435
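Both normal-distribution numbers can be checked with the standard library. The sketch below (function names are assumptions for the example) evaluates the density at x = 13 and obtains the interval probability as a difference of cumulative probabilities, using the error function. A single point has zero width, so its probability is zero even though the density there is not.

```python
import math

def normal_pdf(x, mean, sd):
    """Height of the normal density at x (not a probability)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

density_at_13 = normal_pdf(13, 10, 3.3)
p_interval = normal_cdf(15, 10, 3.3) - normal_cdf(10, 10, 3.3)
p_point = normal_cdf(13, 10, 3.3) - normal_cdf(13, 10, 3.3)

print(f"density at 13: {density_at_13:.3f}")
print(f"P(10 <= x <= 15): {p_interval:.3f}")
print(f"P(x = 13): {p_point}")
```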

Clarifications on continuous distributions.

AREA UNDER CURVE OF PDF = 1

(The integral of the normal)

Bolker 2007, Ch. 4, page 137

The multitude of probability distributions allows us to choose those that match our data or theoretical expectations in terms of shape, location, scale.

Fitting a distribution is an art and science of utmost importance in probability modeling. The idea is you want a distribution to fit your data model “just right” without a fit that is “overfit” (or underfit). Overfitting models is sometimes a problem in modern data mining methods because the models fit can be too specific to a particular data set to be of broader use.

Seefeld 2007

So why do we use them? It’s all about shape and scale!

• Because they provide a usable framework for framing our questions, and allow for parametric methods, i.e. likelihood and Bayesian.
• Even if we do not know its actual distribution, it is clear frequency data is generally going to be better fit by a binomial than a normal distribution. Why?

Why will it be a better fit?

• The binomial is bounded by zero and 1.
• Other distributions (gamma, Poisson, etc.) have a lower boundary at zero.
• This provides a convenient framework for the relationship between means and variance as one approaches the boundary condition.

Some discrete distributions (leading up to why we may want to use the negative binomial)

Binomial
Poisson
Negative-binomial

Random variables

• This is what we want to know the probability distribution of.
• I.e. P(x | some distribution)

I will use “x” to be the random variable in each case.

Binomial

Let’s say you set up a series of enclosures. Within each enclosure you place 25 flies, and a pre-determined set of predators. You want to know what the distribution (across enclosures) of flies getting eaten is, based on a pre-determined probability of success for a given predator species.

You can set this up as a binomial problem.

N (R calls this size) = 25 (the total # of individuals or “trials” for predation) in the enclosure
p = probability of a successful predation “trial” (the coin toss)
x = # trials of successful predation. This is what we usually want for the probability distribution.

Binomial

You can think of the binomial coefficient (the “N choose x” term in the binomial probability) in two ways.
A) A normalizing constant so that probabilities sum to 1.
B) # of different combinations to allow for x “successful” predation events out of N total.

You will often see x = k and hear “N choose k”.

Example

• If predator species 1 had a per “trial” probability of successfully eating a prey item of 0.2, what would be the probability of exactly 10 flies (out of the 25) being eaten in a single enclosure?

P(x = 10 | bi(N = 25, p = 0.2)) = 0.0118

Not so high. We can look at the expected probability distribution for different values of x.
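The enclosure example can be checked with the binomial probability mass function, P(x | N, p) = C(N, x) p^x (1 − p)^(N − x). This is a small sketch; the function name is an assumption for the example, and `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

print(round(binomial_pmf(10, 25, 0.2), 4))  # 0.0118

# Expected distribution of the number of flies eaten per enclosure.
distribution = [binomial_pmf(x, 25, 0.2) for x in range(26)]
most_likely = max(range(26), key=lambda x: distribution[x])
print("most likely number eaten:", most_likely)
```

The probabilities over x = 0 to 25 add up to 1, which is the "normalizing constant" view of the binomial coefficient.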

This would be the expected distribution if we set up many replicate enclosures with 25 flies and this predator.

Predator species 2 is much hungrier…

Let’s say we had 100 flies per enclosure, and predator species 3 was really ineffective, p = 0.01.

While there may be a theoretical limit to the number of flies that can be eaten, practically speaking it is unlimited since the predation probability is so low.

This is a lot like the situation we have with RNA-seq data.

Poisson

• When you have a discrete random variable where the probability of a “successful” trial is very small, but the theoretical (or practical) range is effectively infinite, you can use a Poisson distribution.
• Useful for counting # of “rare” events, like new migrants to a population/year.
• # of new mutations/offspring.
• # counts of sequencing reads (well, sort of)…

Poisson

• It is also seemingly useful for RNA-Seq data (although we will see it is not very useful in practice).

Poisson

x is our random variable (# events / unit sampling effort) – read counts for a gene in a sample.
λ is the “rate” parameter, i.e. the expected number of reads (for a transcript) per sample.
λ is the mean and the variance!!!!

For its relation to a binomial when N is large and p is small: λ = N*p
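Two claims on this slide can be checked numerically: that a binomial with large N and small p is close to a Poisson with λ = N*p, and that the Poisson mean and variance are both λ. The sketch below uses the ineffective-predator numbers (N = 100, p = 0.01) for the first check and λ = 4 for the second; the function and variable names are assumptions for the example.

```python
import math

def binomial_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

# Binomial(N = 100, p = 0.01) versus Poisson(lambda = N * p = 1)
n, p = 100, 0.01
lam = n * p
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11))
print(f"largest difference in probability: {max_gap:.4f}")

# Mean and variance of Poisson(4), summed over a range that captures
# essentially all of the probability.
support = range(61)
mean = sum(x * poisson_pmf(x, 4) for x in support)
variance = sum((x - mean) ** 2 * poisson_pmf(x, 4) for x in support)
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```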

Page 75: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In

Poisson

• Let’s say flies disperse to colonize a new patch at a very low rate (previous estimates suggest we will observe one fly for every two new patches we examine, λ = 0.5).

• What is the probability of observing 2 flies on a new patch of land?

P(x = 2 | Poisson(λ = 0.5)) = 0.076
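As a quick check of that number, here is a minimal sketch that evaluates the Poisson probability directly (the helper name is illustrative):

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 0.5  # one fly expected for every two new patches

p_two_flies = poisson_pmf(2, lam)
print(round(p_two_flies, 3))  # 0.076

# Probabilities for 0..5 flies on a patch, the values a bar plot would show.
probs = {x: round(poisson_pmf(x, lam), 4) for x in range(6)}
print(probs)
```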

Probability of observing x number of flies on a patch given lambda = 0.5

What happens as lambda increases?

[Figure: three Poisson distributions, with λ = 4, λ = 20 and λ = 100 (λ = expected # of reads for transcript x across samples). x-axis: # of reads for transcript x; y-axis: proportion of samples for transcript x.]

Poisson mean and variance

• When lambda is small for your random variable, you will often find that your data is “over-dispersed”.

• That is, there is more variation than expected under Poisson(lambda).

• Similarly, when lambda gets large, you will often find that there is less variation than expected under Poisson(lambda).

Anders and Huber 2010, Genome Biology

Why Poisson might not model sequence reads well

• Most RNA-seq data (and most count data in biology) is not modeled well by a Poisson because the relationships between means and variances tend to be far more complicated among (and within) biological replicates.

• It has been argued (Mortazavi et al. 2008) that technical variation in RNA-seq is captured by a Poisson. I have my doubts even on this.

Quasi-Poisson

• Since over-dispersion is such a common issue, a number of approaches have been developed to account for it with count data.

• One is to use a quasi-Poisson.
• Instead of Variance(x) = λ, it is Variance(x) = λθ
• Where θ is the (multiplicative) over-dispersion parameter.
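A minimal sketch of how θ might be estimated for a single gene in a single group, assuming the usual Pearson estimator (Pearson chi-square divided by residual degrees of freedom). With only a mean in the model this reduces to the sample variance divided by the sample mean. The counts are hypothetical.

```python
from statistics import mean, variance

def pearson_dispersion(counts):
    """Estimate the quasi-Poisson theta for one group: Pearson chi-square / (n - 1)."""
    m = mean(counts)
    pearson_chi2 = sum((x - m) ** 2 / m for x in counts)
    return pearson_chi2 / (len(counts) - 1)

# Hypothetical read counts for one gene across six biological replicates.
counts = [12, 30, 7, 22, 41, 15]

theta = pearson_dispersion(counts)
print(round(theta, 2))

# Under a pure Poisson, theta would be close to 1; values well above 1
# indicate over-dispersion, and standard errors scale by sqrt(theta).
```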

How about a normal distribution?

• Despite working with discrete count data, several authors use normal distributions. Several reasons:

1. When the mean number of counts is far enough away from zero, the normal distribution often does a good job of fitting the data (and capturing the mean and variance relationship). For low mean counts a variance stabilization can aid modeling (the approach used in limma/voom).

2. Our response variables (counts of features) are not measured without error, and therefore are not true measures. When estimating effects in our model we account for this uncertainty, and assuming a normal distribution enables additional flexibility.

Negative binomial

• In biology the negative binomial is mostly used like a Poisson, but when you need more dispersion of x (it needs to be spread out more).

• The negative binomial is a Poisson distribution where lambda itself varies according to a Gamma distribution.
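The Gamma-Poisson description can be simulated directly. This is a sketch under stated assumptions: the Gamma has shape k and scale μ/k (so its mean is μ), which gives the familiar negative binomial variance μ + μ²/k; the values μ = 10 and k = 2 are arbitrary, and the Poisson sampler is Knuth's simple method, which is only suitable for modest rates.

```python
import math
import random
from statistics import mean, variance

def rpoisson(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def rnegbin(mu, k, rng):
    """Draw one negative binomial value as a Gamma-Poisson mixture."""
    lam = rng.gammavariate(k, mu / k)  # shape k, scale mu / k, so E[lam] = mu
    return rpoisson(lam, rng)

rng = random.Random(1)
mu, k = 10.0, 2.0  # arbitrary illustration values
draws = [rnegbin(mu, k, rng) for _ in range(20000)]

print("sample mean:", mean(draws))
print("sample variance:", variance(draws))
print("mu + mu^2 / k:", mu + mu**2 / k)  # theoretical variance, far above the Poisson value mu
```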

Negative binomial

Expected number of counts = μ
Over-dispersion parameter = k

For our purposes all we care about is that the variance can exceed the mean:

$$\mathrm{Var}(x) = \mu + \frac{\mu^{2}}{k}$$

General(ized) linear models

• For response variables that are continuous, you are likely familiar with approaches that come from the general linear model.

$$Y = b_0 + b_1 x + e, \qquad e \sim N(0, \sigma^{2})$$

A standard linear regression (if x is continuous). If x is discrete this would be a t-test/ANOVA.

Generalized linear model

• MANY of the differential expression tools utilize a linear model framework.

• Thus it is important to get familiar with the framework.

• The class by Jonathan and Ben (B) is probably a great place to start.

Continuity of Statistical Approaches

[Diagram relating the t-test and ANOVA (discrete predictors, differing in number of levels), regression (continuous predictors), ANCOVA (both), the General Linear Model (normal response), the Mixed Effects Model (random, or both fixed and random, predictors), the Generalized Linear Model (non-normal response), and process models.]

Generalized linear models

• But what do you do when your response variable is not normally distributed?

• The framework of the linear model can be extended to account for different distributions fairly easily (one major class of these is the generalized linear models).

Generalized Linear Models (GLiM)

• In many cases a general linear model is not appropriate because values are bounded, e.g. counts ≥ 0, proportions between 0 and 1

• A generalization of linear models to include any distribution of errors from the exponential family of distributions

• Normal, Poisson, binomial, multinomial, exponential, gamma, NOT negative binomial

• The General Linear Model is just a special case of GLiM in which the errors are normally distributed

• Example: logistic regression
• We will use likelihood for parameter estimation and inference

Generalizations of GLM

• Instead of a simple linear model: $Y = b_0 + b_1 x_1 + b_2 x_2 + e$

– Assume that the e’s are independent, normally distributed with mean 0 and constant variance $\sigma^{2}$

– Can solve for the b’s by minimizing squared e’s

• GLiM considers some adjustment to the data to linearize Y: a link function

$Y = g(b_0 + b_1 x_1 + b_2 x_2 + e)$ or $f(Y) = b_0 + b_1 x_1 + b_2 x_2 + e$

– For example, for count data, which are always positive:

$f(Y) = \log(Y)$ (log link)
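A small sketch of what the log link buys you, using made-up coefficients (b0 and b1 below are hypothetical, not estimates from any data set): effects are additive on the log scale, the back-transformed expected count is always positive, and a coefficient becomes a fold change.

```python
import math

# Hypothetical coefficients on the log scale.
b0 = 2.0   # log expected count in the control group
b1 = 0.7   # additive treatment effect on the log scale

def expected_count(x):
    """Inverse of the log link: back-transform the linear predictor."""
    return math.exp(b0 + b1 * x)

control = expected_count(0)
treated = expected_count(1)

# An additive effect on the log scale is a multiplicative (fold) change
# on the count scale: treated / control = exp(b1).
fold_change = treated / control
print(round(control, 2), round(treated, 2), round(fold_change, 2))

# However negative the linear predictor, the expected count stays positive.
print(expected_count(-20) > 0)
```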

What is a link function?

• The link function is a way of transforming the observed response variable (LHS).

• Goals:
• 1) Linearize the observed response
• 2) Alter the boundary conditions of the data
• 3) Allow for an additive model in the covariates (RHS)

Poisson Family

• Data are counts of something (i.e. 0, 1, 2, 3, 4…)
• Number of occurrences of an event over a fixed period of time or space
• Examples…

• If the mean value is high then counts can be log-normal or normally distributed
• When the mean value is low then there start to be lots of zeros and the variance depends on the mean
• If the upper end is also bounded then a binomial would be better

• Default link is the log link, variance function = µ
– i.e., family = poisson(link = "log"); the variance function µ is implied by the family (in R it is set explicitly only in quasi(link = "log", variance = "mu"))
– Other option might be the sqrt link

Poisson and negative binomial Family

Essentially it means you can log transform the sequence counts and use a Poisson, quasi-Poisson or negative binomial to fit it (most links are more complicated, this is nice and simple).

i.e. counts are modeled as
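One common way to write such a model for a two-group comparison is log(μ_i) = b0 + b1·x_i, where x_i is 0 for control and 1 for treated. The sketch below fits that model for a single gene with a Poisson likelihood by Newton-Raphson (Fisher scoring). It is a bare-bones illustration with hypothetical counts and an illustrative function name, not what edgeR or DESeq run internally.

```python
import math

def fit_poisson_glm(x, y, n_iter=25):
    """Fit log(mu_i) = b0 + b1 * x_i by Newton-Raphson (Fisher scoring)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the overall mean, no effect
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: first derivatives of the Poisson log-likelihood.
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: information^-1 times score, written out for the 2 x 2 case.
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1

# Hypothetical counts for one gene: three control and three treated samples.
treatment = [0, 0, 0, 1, 1, 1]
counts = [18, 25, 21, 40, 52, 46]

b0, b1 = fit_poisson_glm(treatment, counts)
print("control mean:", round(math.exp(b0), 2))
print("treated mean:", round(math.exp(b0 + b1), 2))
print("fold change:", round(math.exp(b1), 2))
```

For this simple design the fitted means are just the two group means; the value of the GLM framing is that the same machinery extends to more predictors.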

Methods using NB GLM

• edgeR (but it is not the default, so beware!)
• DESeq/DESeq2 (maybe DEXSeq as well?)
• baySeq
• limma (voom, kind of sort of…)

• However these all model the variance quite differently (how they borrow information across genes to estimate mean-variance relationships).

See Yu, Huber & Vitek 2013 (Bioinformatics) for discussion of this issue.

Methods using Poisson and quasi-Poisson

• TSPM (two-stage Poisson model)
– Fits models with a Poisson first. If over-dispersed, it then uses a quasi-Poisson.

– Thus there are essentially two groups of genes.

Why this is useful

• Since we can fit these as a generalized linear model, we can fit arbitrarily complex designs (if we have sufficient sample sizes to estimate all the parameters).

• We can incorporate all aspects of read length, library size, lane, flow cell in addition to all of the important biological predictors (your treatments).

• NO t-tests for you!!!
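To make "arbitrarily complex designs" concrete, here is a sketch of how a sample sheet might be turned into a design matrix with an intercept, a treatment indicator and a lane indicator. The sample names, factor levels and helper function are all hypothetical.

```python
# Hypothetical sample sheet: treatment is balanced across sequencing lanes,
# so the two effects are not confounded.
samples = [
    {"id": "s1", "treatment": "control", "lane": "L1"},
    {"id": "s2", "treatment": "control", "lane": "L2"},
    {"id": "s3", "treatment": "treated", "lane": "L1"},
    {"id": "s4", "treatment": "treated", "lane": "L2"},
]

def design_row(sample):
    """Intercept, treatment indicator (treated = 1), lane indicator (L2 = 1)."""
    return [
        1,
        int(sample["treatment"] == "treated"),
        int(sample["lane"] == "L2"),
    ]

design = [design_row(s) for s in samples]
for s, row in zip(samples, design):
    print(s["id"], row)

# If the treatment and lane columns were identical, the two effects could
# not be separated: that is what confounding looks like in the design matrix.
treatment_col = [row[1] for row in design]
lane_col = [row[2] for row in design]
print("confounded:", treatment_col == lane_col)
```

Each column corresponds to one coefficient in the linear predictor, so adding a predictor (flow cell, batch) means adding columns, provided there are enough samples to estimate them.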

Estimating over-dispersion (variance) (or why programs seemingly doing the same thing give different results)

Variances require lots of data to estimate well (not just for count data)

• It turns out that to estimate variances, you need a lot more replication than you do for means.

• However most RNA-seq experiments still have small numbers of biological replicates.

• So how to go about estimating variances?

IF sample sizes are large (within and between treatments).

• Most methods do well (based on NB, quasi-Poisson or non-parametric approaches).

• They can model individual-level variances (and potentially can use resampling approaches to avoid having to make parametric assumptions).

But if sample sizes (in terms of biological replication) are small.

• Then we have a problem.
• This is where the software really tends to differ, as they all make (different) assumptions about the uncertainty in counts, mean-variance relationships, and how best to model such effects.

• In particular edgeR and DESeq use some methods to borrow information across genes (and have options to change this process).

• This can dramatically change the results.

Anders, S., & Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biology, 11(10), R106. doi:10.1186/gb-2010-11-10-r106

Anders et al. (2013). Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature Protocols, 8(9), 1765–1786
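The idea of borrowing information across genes can be illustrated with a deliberately crude toy. It is NOT the procedure used by edgeR or DESeq: those packages use likelihood-based and empirical Bayes estimators, whereas this sketch uses a method-of-moments dispersion (α in Var = μ + αμ², i.e. α = 1/k) and a fixed, arbitrary shrinkage weight. The counts and gene names are hypothetical.

```python
from statistics import mean, median, variance

def moment_dispersion(counts):
    """Method-of-moments estimate of alpha in Var = mu + alpha * mu**2, floored at 0."""
    m, v = mean(counts), variance(counts)
    return max((v - m) / m**2, 0.0)

def shrink(gene_wise, weight):
    """Pull each gene-wise estimate toward the across-gene median.

    weight = 0 keeps the raw estimates; weight = 1 uses one shared value.
    """
    target = median(gene_wise.values())
    return {g: (1 - weight) * a + weight * target for g, a in gene_wise.items()}

# Hypothetical counts: three biological replicates per gene.
counts_by_gene = {
    "geneA": [100, 120, 95],
    "geneB": [10, 45, 22],
    "geneC": [500, 480, 530],
    "geneD": [60, 150, 90],
}

raw = {g: moment_dispersion(c) for g, c in counts_by_gene.items()}
shrunk = shrink(raw, weight=0.5)  # arbitrary weight, for illustration only

for g in counts_by_gene:
    print(g, round(raw[g], 4), round(shrunk[g], 4))
```

With three replicates the raw estimates are extremely noisy; how strongly (and toward what) they are shrunk is exactly the kind of choice on which the packages differ.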

Anders and Huber 2010

Yu et al. (2013). Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size. Bioinformatics, 29(10), 1275–1282.

Anders and Huber 2010

Let’s think about this.

Love, Huber & Anders 2014, bioRxiv, doi:10.1101/002832

We can also “shrink” estimates based on over-dispersion…

Take home

• With small sample sizes, the methods use different approaches to get gene-wise over-dispersion (based on all data).

• edgeR is generally more powerful (more significant hits) than DESeq, but much more susceptible to false positives due to outliers.

• DESeq2 “should” be somewhere in the middle.

Biological replication gives far more statistical power than increased sequencing depth within a biological sample!!!!

• Sequencing (and library prep) costs are still sufficiently expensive that most experiments use small numbers of biological replicates.

• Given the additional cost of library preparation (~$225/sample at our facility), many folks go for increased depth instead of more samples.

• For a given level of sequencing depth (total) for a treatment, it is far better to go for more biological replicates, each at lower sequencing depth (rather than fewer replicates at higher sequencing depth).
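A back-of-the-envelope sketch of why this holds, assuming negative binomial counts with Var = μ + αμ², equal depth per sample and independent replicates. Under those assumptions the squared coefficient of variation of a group mean is 1/(total expected reads for the gene) + α/n: the first term depends only on total depth, while the biological term shrinks only with more replicates. All numbers below are hypothetical.

```python
def cv2_of_group_mean(n_reps, total_reads, gene_fraction, alpha):
    """Squared CV of the mean expression estimate for one treatment group."""
    reads_per_sample = total_reads / n_reps
    mu = gene_fraction * reads_per_sample   # expected count per sample
    per_sample_cv2 = 1 / mu + alpha         # Poisson noise + biological variation
    return per_sample_cv2 / n_reps          # averaging n independent replicates

total_reads = 60e6     # fixed sequencing budget for the treatment group
gene_fraction = 1e-5   # share of reads coming from this gene
alpha = 0.1            # biological dispersion (alpha = 1 / k)

results = {n: cv2_of_group_mean(n, total_reads, gene_fraction, alpha)
           for n in (2, 3, 6, 12)}
for n, cv2 in results.items():
    print(n, "replicates:", round(cv2, 4))
```

Splitting the same 60 million reads over 12 samples instead of 2 cuts the squared CV roughly five-fold in this example, because the α/n term dominates.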

Biological replication gives far more statistical power than increased sequencing depth within a biological sample!!!!

Robles et al. 2012

How do the methods compare in simulation?

Kvam et al. 2012

How do the methods compare for real data?

Kvam et al. 2012

How do the methods compare in a different set of simulations?

Soneson & Delorenzi 2013

Will explain ROC (receiver operating characteristic) curves and the area under the curve on the board.

References

• Robles, J. A., Qureshi, S. E., Stephen, S. J., Wilson, S. R., Burden, C. J., & Taylor, J. M. (2012). Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing. BMC Genomics, 13, 484. doi:10.1186/1471-2164-13-484

• Bullard, J. H., Purdom, E., Hansen, K. D., & Dudoit, S. (2010). Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics, 11, 94. doi:10.1186/1471-2105-11-94

• Kvam, V. M., Liu, P., & Si, Y. (2012). A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data. American Journal of Botany, 99(2), 248–256. doi:10.3732/ajb.1100340

• Soneson, C., & Delorenzi, M. (2013). A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics, 14, 91. doi:10.1186/1471-2105-14-91

• Wagner, G. P., Kin, K., & Lynch, V. J. (2012). Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory in Biosciences = Theorie in den Biowissenschaften, 131(4), 281–285. doi:10.1007/s12064-012-0162-3

• Vijay, N., Poelstra, J. W., Künstner, A., & Wolf, J. B. W. (2012). Challenges and strategies in transcriptome assembly and differential gene expression quantification. A comprehensive in silico assessment of RNA-seq experiments. Molecular Ecology. doi:10.1111/mec.12014

Why do we care about multiple comparisons?

How can we deal with multiple comparisons?
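The slide text does not say which correction was presented, so the following is only one widely used option: the Benjamini-Hochberg false discovery rate adjustment. The motivation is simple arithmetic: testing 20,000 genes at p < 0.05 is expected to produce about 1,000 false positives even if no gene is differentially expressed. The p-values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values, one per gene.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
adj = benjamini_hochberg(p)

# Genes called significant at a 5% false discovery rate.
significant = [pi for pi, qi in zip(p, adj) if qi <= 0.05]
print([round(q, 3) for q in adj])
print(len(significant), "of", len(p), "genes pass at FDR 0.05")
```

Five of these genes have raw p < 0.05, but only two survive the adjustment.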