2016 bd2k investments in training report
TRANSCRIPT
January2016
BD2KInvestmentsinTraining
ExecutiveSummaryTrainingisamajorlimitingfactortoextractingknowledgefromdata,andthereforeitisasignificantpartoftheBigDatatoKnowledge(BD2K)Initiative.TheprimarygoalsofBD2Kintrainingaretoincreasethenumberofbiomedicaldatascientistsandtoimprovethedatascienceskillsofallbiomedicalscientists.Becausetrainingneedsvarygreatlybasedonanindividual’spriorbackgroundandintendedusefordatascience,theBD2Kinvestmentsintrainingarealsovaried.Forthoseindividualswhoareprimarilybiomedicalscientistsanddonotintendtobecomespecialistsindatascience,BD2Ksupportscoursesandeducationalresourcesthataremeanttoenableparticipantstobecomeconversantindatascienceandattainskillstoutilizedatasciencemethods.Tosupportthoseindividualswhowishtobecomespecialistsinbiomedicaldatascience,BD2Kincludestrainingprogramsforpredoctoralstudents,researchrotationsforearlycareerscientists,andcareerdevelopmentawardsforpostdocsandmoreseniorresearchers.Tofosterthedevelopmentofnewteamsconsistingofbiomedicalscientistsanddatascientists,BD2KissupportingtheQuBBD(QuantitativeApproachestoBiomedicalBigData)ProgramalongwiththeNationalScienceFoundation.AmongthefirstBD2Kawardsissued(inSeptember2014)weretrainingawards,andtheyarestartingtobearfruit.Forexample,thefirstBD2KOpenEducationalResources,intheformofMassiveOpenOnlineCourses,havebeenreleasedandalreadyboastthousandsofgraduates;threesummercourseswereofferedandfilledbeyondcapacity;andindividualssupportedbycareerdevelopmentawardshavetransitionedtomoresecurepositions.AlthoughtheexistingawardsmadeinFY14-16arepromising,additionalinvestmentsareneededtokeepupwiththefast-changingareaofdatascience.InorderforBD2K-supportedresourcestohavemaximalimpact,theyneedtobefindable,accessible,interoperable,andreusable(FAIR).Tohelpbiomedicalscientistsfindandaccessthemostappropriatedatascienceeducationalresources,theBD2KTrainingCoordinationCenter(TCC)isdevelopinganEducationalResourceDiscoveryIndex,workingwithinternationalpartners.ThroughtheTCCandtheothertrainingawards,BD2Kaimstoimprovetheabilityoftheentirebiomedicalsciencecommunity,whetherspecialistsinbiomedicalscienceordatascience,toutilizethegrowingvolumeandcomplexityofdata.
January2016
OverallGoalsFocusingontrainingwasoneofthemainrecommendationsfromtheJune2012DataandInformaticsWorkingGroupReport.ItisalsooneofthemajorthrustareasoftheBD2KprogramandtheADDSoffice.Trainingcurrentlyaccountsfor15%oftheBD2Kbudgetandisexpectedtoramptoabout20%.Theterm“Training”ismeanttoencompasstraining,education,andworkforcedevelopmentthatprovideslearners,nomatterwhatcareerlevel,eitherfoundationalknowledgeorskillsforimmediateuse.Therearetwomaingoalsfortraining,toimprovebigdataskillsinallbiomedicalscientistsandtoincreasethenumberofpeoplewhospecializeinbiomedicaldatascience.Thesetwogoalsincludesub-goalsaswell:
1) Toimprovebigdataskillsofallbiomedicalscientistsa. Supporttrainingopportunities,bothin-personandonlineb. Ensuretrainingopportunitiesandresourcesaremorereadily
discoveredandaccessedc. Enhancediversityinthebiomedicalandbiomedicaldatascience
workforces2) Toincreasethenumberofbiomedicaldatascientists
a. Establishbiomedicaldatascienceasacareerpathb. Fostercollaborationsbetweenbiomedicalscientistsanddata
scientists
Toaccomplishthesegoals,thetrainingportfolioisdiverse.AlthoughtenFOAswereissuedinFY14andFY15,theyclusterintofivegroups:
• R25awardsforcoursesandresourcesaboutdatamanagementanddatascience(4FOAswereissuedtoallowfordifferentbudgetarycategoriesandstructures)
January2016
• R25awardstoenhancediversity(denoteddR25todistinguishitfromtheotherR25s)
• T32/T15trainingprograms(3FOAStosupportbothnewprogramsandsupplementstoexistingT32sandT15s)
• K01CareerDevelopmentaward• U24TrainingCoordinationCenter(TCC)
Atotalof67awardswereissuedbyBD2KinthesefiveFOAclusters.Inaddition,eachoftheBD2KCentersofExcellencehastrainingcomponents.BecausetheCenters’trainingisdiverseandmainlyfocusedontrainingaboutspecifictoolsdevelopedbytheCenters,theyarenotincludedinthisreport.Collectively,thefiveBD2KtrainingFOAclustersreachanaudienceofvaryingexperiencelevelsandintentions,fromundergraduatestoseniorfacultyandinstructorswhowillbeteachingdatascience.Someoftheawardsaretargetedatparticularcareerlevels.Forexample,thedR25awardsareforundergraduates,theT32/T15programsareforpredoctoraltrainees,andtheK01awardsareforpostdocsandbeyond.Otherawards,suchastheU24TCCandtheR25sareforabroaderrangeofexperiencelevels.
Eachclusteraddressesoneormoregoals.
January2016
BecauseBD2Kisatrans-NIHprogram,withfundscomingfromallICsandtheCommonFund,ICinputhasbeenactivelysolicited.TheFOAsweredevelopedbyatrans-NIHgroupofprogramdirectors,firstcalledthe“BD2KTrainingSubcommittee”andlaterreferredtoasthe“BD2KTrainingProgramManagementGroup”.Thisgrouphaswelcomedall-comers,fromallICsandtheNIH/OD,withefforttorecruitmembersthroughtwopresentationstotheTAC(TrainingAdvisoryCommittee).Allprogrammaticaspects,includingFOAdevelopment,reviewattendance,payplandevelopment,anddecisionsaboutawardmanagement,havebeendonethroughthetrans-NIHgroupthatincludes13ICs,theCommonFund,andtheOfficeofBehavioralandSocialScienceResearch.Day-to-daymanagementoftheawardsisdistributedacross6ICsinanefforttobalanceincludingasmanyICsaspossibleandensuringthatgranteesaretreateduniformly.
January2016
Goal1:ToImproveBigDataSkillsofBiomedicalScientistsToimprovethebigdataskillsofallbiomedicalscientists,trainingopportunitiesneedtobeavailable,andbiomedicalscientistsneedtofindandaccesstheonesthatbestfittheirneeds.Tothisend,BD2Ksupportsthedevelopmentoftrainingopportunitiesandtheirdisseminationtolargenumbersoflearners,aswellasinfrastructurefordiscoveringthem.Tohelpbiomedicalscientistsfind,access,andchoosetrainingopportunities,theTrainingCoordinationCenteriscreatinganEducationalResourceDiscoveryIndex(ERuDIte).ERuDIteisplannedtobeadiscoveryindexthatorganizespointerstoeducationalcontent,utilizingmetadatadescribingtheeducationalresource.UtilizingandextendingcommonmetadataisbeingpursuedthroughaninternationalcollaborationbetweentheTCCandELIXIR,aEuropean-basedfederationoforganizationsthatbuildinfrastructureforthelifesciences.ERuDIte,whencombinedwithaknowledgemapthatshowshowBigDataskillsrelatetooneanother,mayformthebasisofapersonalizedlearningsystemforbiomedicalscientiststoefficientlyacquirenewskillstotackleBigData.CreationofthecontentthatERuDIteorganizesissupportedbyBD2KprimarilythroughR25sforopeneducationalresources(OER)andshortcoursesusingFOAsHG14-008,HG14-009,LM15-001,andLM15-002.Educationalcontentmayalsocomefromothersources,includingnon-BD2Kones.SomeoftheBD2KCenters,whicheachhaveatrainingcomponent,areproducingeducationalcontentsuchasTED-liketalks.CurriculumfordatasciencecoursesmaycomefromBD2KT32/T15trainingprograms,whichweregiventheopportunitytoapplyfor$20Kinfundstodevelopandsharecurriculumofnewcourses.ThenumberofeducationalresourcessupportedbyBD2K,orevenNIH,isdwarfedbythenumbersupportedelsewhere,whetherbytheNationalScienceFoundation,theDepartmentofEducation,foundations,universities,orprivateindustry.EducationalresourcesfromallofthesesourceswillbeincludedascontentinERuDIte.TheBD2KOpenEducationalResources,Shortcourses,andDiversityprogramsallaimtointroducebiomedicalscientiststodatamanagementanddatascience.Theseprogramsweredesignedtobeflexible,toallowforinnovationandforspecializationtoaparticularaudience,domainscience,ordatascience.Althoughafewofthefundedprogramsareconfinedtoanarrowaudienceorscientificarea,mostareforgeneralaudiencesandmultipledatatypes.InFY14andFY15,atotalof29R25programswerefunded,andtheseprogramshaveabroadreachgeographically.Moredetailabouttheaudience,scientificbreadth,andreachfollow.
January2016
AudienceTheR25awardsaddressavarietyofeducationallevels(undergradstoseniorfaculty)andintendedusages(endusersorinstructors).
• Fiveprogramsfocusonhelpinginstructorsofadvancedundergrad/earlygradcourses.Examplesinclude:
o Atrain-the-trainerscourseinbiomedicaldatascienceforinstructors,whowillcollectivelydevelopacurriculumforundergraduatesatnon-research-intensivecolleges
o Curricularmaterials(slides,assessments,readingsuggestions)thatcanbeusedandadaptedbyotherinstructors
o AToolkittohelplibrariansteachdatamanagementtobiomedicalscientists
• Sevenprogramstargetundergradsdirectly,throughsummerprograms(includingdidacticandresearchexperiences)thataimtorecruitdatasciencestudentsintobiomedicalscienceortoexposebiomedicalstudentstodatamanagementanddatascience.FourofthesevenprogramshaveaprimarygoalofenhancingdiversitythroughpartneringwithBD2KCenters.
• Theremaining16programsfocusdirectlyonthegraduatestudentorthemoreadvancedlearner;althoughthedatamanagementanddatasciencematerialisintroductory,becauseitisnew,learnersarejustaslikelytobeseniorfacultyasgraduatestudents.
TheR25programsareamaincomponentofBD2K’sdiversityefforts.FourundergraduateprogramsaimtoenhancediversityinthebiomedicalworkforcethroughpartnershipsbetweentheBD2KCentersofExcellenceandlow-resourcedinstitutions.Thepartnershipssupportthedevelopmentofcurriculumandresearchexperiencesforundergraduatesandfacultyfromlow-resourced
institutions.Collectively,thesefourprogramsreach134studentsoverthecourseof5years.However,thenumberofstudentstouchedbytheimprovedcurriculumandthestrengthenedfacultyisfargreater.Inadditiontothefourprogramsfundedexplicitlyfordiversity,otherR25smakeseriouseffortstorecruitandtrainunderrepresentedminorities.Forexample,inthefirstoffering,theshortcoursefromOregonHealthSciencesUniversitytrained9URMsoutofthe17totalparticipants.AlthoughtheR25programsformthecoreofBD2K’sdiversityefforts,URMscanbesupportedbyBD2KthroughtheT32/T15trainingprograms,whichmusthave“diversityrecruitmentandretentionplans,”andthroughdiversitysupplements(PA-15-322).
January2016
ScientificBreadth• DomainFocus:Althoughthemajorityoftheprogramsaimforageneral
biomedicalaudience,someofthemfocusonaparticulardatatype.Aboutaquarteroftheprogramsfocusongenomics,andanotherquarterfocusononeofimaging,clinical,orpopulationdata.Themajorityusemultipledatatypesandareforageneralbiomedicalaudience.
Areaswithfewapplicationsandhencefewfundedapplicationsaretheclinicalandpopulationsciences.Withintheseareas,thisisnotabledearthofactivityinmHealth.Toencourageapplicationsintheclinicalandpopulationsciences,BD2KwillworkcloselywithrelevantInstitutes,CentersandOffices,suchasNCATSandOBSSR,toensurethatanynewfundingopportunityannouncementscontainlanguagetoencourage
applicationsinthisareaandarewidelyadvertisedtotheappropriatecommunities.
• DataManagementandDataScience:Collectively,theawardsspanabroadrangeoftopics,includingdatamanagement,dataexploration,datarepresentation,computing,datamodeling,anddatavisualization.Eachofthesetopicscanbefurtherbrokendowninthefollowingway.
o Datamanagement:reuseofdata,datastandards,locatingandaccessingdataandtools,organizingandcuratingdatathroughontologiesanduseofmetadata
o Computing:distributedorparallelcomputing,workflows,programming,algorithms,optimization,andnaturallanguageprocessing
o Datarepresentation:datastructuresanddatabaseso Dataexploration:datamungingandpreparation,exploratorydata
analysiso DataModeling:probability,stochasticmodeling,introductory
statistics,advancedstatistics(e.g.multipletesting,dimensionreduction),machinelearning,experimentaldesign,Bayesianmethods,reproducibleresearch,networkmodels
o DatavisualizationandcommunicationAlthoughcollectivelytheR25awardsspantherangeabove,theyareoftencoveredwithlittledepth,andsomeofthetopicsareonlycoveredbyoneOpenEducationalResource,andthesetendtobethemoretechnical(e.g.optimizationandstochasticmodeling)orspecialized(e.g.ELSIandteamscience)topics.
January2016
Reach
• Elevenprogramswithin-personcomponentsarespreadevenlybetweenEastcoast,Westcoast,andthemiddleofthecountry.Basedontheplannedenrollments,the11programswillserveover250participantsinthesummerof2016.
o ThethreeprogramsthatwerefundedinFY14heldcoursesinthesummerof2015.Theycollectivelyreachedundergraduates,PhDstudents,andfacultyfromover30differentuniversities.
o Demandforthe2015in-personcourseswashigh,withreportedacceptanceratesfortheprogramswithlimitedslotsasbeing35%and14%.Anotherprogramgavesupporttoalimitednumberofstudentsbutopenedalargeauditoriumforthecourse.
• FourteenoftheawardsareonlineOpenEducationalResources.Thesereachalargenumberofstudentsandinstructors,providingagreatvalueperstudent.
o Forexample,about5,000studentscompletedthefirst8coursesinRafaIrizarry’sseriesofbiomedicalDataScienceMOOCsinthefirstoffering,amountingtoabout$40perstudentbasedonanNIHinvestmentof$200K(year1directcosts).
AlthoughBD2Kissupportingthedevelopmentanddiscoveryoftrainingresources,continuedsupportintheareaisneededforanumberofreasons:gapsincontentcoverageexist(e.g.methodsformHealthdata,algorithmsandoptimizationmethods,advancedstatistics,networkmodels);demandforin-personcoursescontinuestoexceedsupply;differentwaysofexplainingmaterialresonatewithdifferentlearners;andmaterialsneedtobeupdatedtotakeintoaccountnewscienceandnewdevelopmentsintheunderstandingoflearning.
Goal2:ToIncreasetheNumberofBiomedicalDataScientistsToincreasethenumberofbiomedicaldatascientists,traineesneedtogaintheappropriateskills,wanttoworkinbiomedicalscience,andhaveanappropriateplacetowork.Becauseallofthesetraineeshaveself-selectedtowardbiomedicalscience,BD2K’sfocusisonhelpingtraineesgettheappropriateskillsinitiallyandusethoseskillsinthelongrun.Traineesmaybe1)predoctoralstudents,whogainfoundationsinbiomedicaldatasciencethroughT32/T15trainingprograms,2)postdocs/faculty,whoaretrainedineitherbiomedicalscienceordatascienceandrecognizetheneedtocomplementtheirexistingknowledgeandskillsthroughK01careerdevelopmentawards,or3)studentsorfacultywhoneedspecializedtrainingthatisunavailablelocallybutattainablethroughaResearchRotation.Retainingtraineesisbothimportantandachallenge,duetothedemandfordatascienceskillsacrosssectors.Althoughretentionisbeingaddressedprimarily
January2016
throughprovidingsupport,BD2Kisalsointerestedinfacilitatingconversationswithuniversityleaderstodiscusscareerpathsforopenscienceanddatascience.SupportforthefurtherdevelopmentofdatascientistsintobiomedicaldatascientistsisgiventhroughK01awards.Beforeblendingbiomedicalanddatascienceknowledge,datascientistsmaybeabletoimmediatelycontributetoaddressingbiomedicalproblemsthroughworkinginteamswithbiomedicalscientists.SuchteamsarebeingfosteredthroughjointNSF/BD2KsupportintheQuBBDprogram,forbuildingcollaborationsbetweendatascientistsandbiomedicalscientists.
T32/T15TrainingProgramsPredoctoraltraineesaresupportedthroughnewT32programsandsupplementstoexistingT32orNLMT15programs:6werefundedinFY14,andanother10willbefundedinFY15,foratotalof84trainees.Sincemanyofthedepartmentsthattheprogramsresideinarenew,notallprogramsrequestedthemaximumof6slots.Becausethefieldisnascent,thereisstillnotfullagreementastowhatthecorecompetenciesare.However,someofthecommonareasofmostBD2Ktrainingprogramsare:modernstatistics(e.g.handlingmultiplecomparisons,highdimensionaldataanalysis,spatial-temporalcorrelation),computationaltechniques(e.g.usingcloudandparallelcomputing,optimization,andalgorithms),andmachinelearning.Mostprogramsarecreatingnewcoursesinordertointegratethesetopicstogetherandfocusclasstimeontheonesthatneedtobelearneddidactically.Belowisawordcloudofthecorecoursesoftheprograms.
January2016
K01CareerDevelopmentBD2Kissupporting21postdocsandfacultywithmentoredcareerdevelopmentawards(K01).ThePIscomefromdiversebackgrounds:
• 9arephysicians,withspecialtiesinhematology/oncology,neurology,neuroradiology,surgery,urologicsurgery,pulmonaryandcriticalcaremedicine,andinternalmedicine
• 7haveprimarilyquantitativeorcomputationalbackgrounds,withdegreesinElectricalEngineeringandComputerScience,Physics,NuclearPhysics,andBiomedicalEngineering
• 3havebackgroundsinfieldsthatblendthebiomedicalandcomputationalsciences(moleculargenetics,bioinformaticsandcomputationalbiochemistry)
• 2arebehavioralorsocialscientists(SocialEpidemiology,QuantitativePsychology)
ThegroupofK01awardeesisdiversenotjustbyscientificbackgroundbutalsodemographicallyandgeographically:
• Nineoutof21arefemale.• Theyworkat18uniqueinstitutions.
Throughasustainedperiodofresearchcareerdevelopmentandtraining,theK01awardeeswillgaintheknowledgeandskillsnecessarytolaunchindependentresearchcareers.Thegoalisthattheybecomecompetitivefornewresearchprojectgrant(e.g.,R01)fundingintheareaofBigDataScience.
ResearchRotationsTheBD2KTrainingCoordinationCenterhasthreemaingoals:1)tocoordinatebyincreasingcommunicationacrosstheBD2K-fundedtrainingawards;2)todevelopadiscoveryindexforeducationalresources;and3)tocoordinateresearchrotationstofacilitateaccesstoappropriateexpertise.The“researchrotations”aimtomatchtraineeswhoneedspecializedtrainingwiththoseexpertswillingandabletoprovideit.Thetraineesareexpectedtomainlybegraduatestudents,postdocs,orjuniorfaculty.Implementationdetailsoftheresearchrotationsarestillinthedevelopmentstage,andevaluationmeasuresarebeingdesignedalongwiththeprogram.
January2016
NSF/NIHQuantitativeBiomedicalBigData(QuBBD)ProgramSomeproblemswillrequireanewmodelofleadership.Particularlywhenverydiverseskillsneedtobebroughttobearontheproblem,teamsofindividualswithcomplementaryexpertisewillbeneeded.SuchteamsarebeingfosteredthroughapartnershipbetweenNSFandNIHcalledtheQuBBDprogram.ThisprogramconsistsofaseriesofInnovationLabsandthefundingofplanninggrants.InnovationLabsareweek-longmentoredworkshopsthatcatalyzeinterdisciplinaryteamsandspeeduptheprocessofdevelopingtheteam’sresearchprogram.BD2KranapilotInnovationLabinJuly2015.Bytheendoftheweek,12newinterdisciplinaryteamswerepreparedtosubmitgrantapplicationstogether.Theteamssubmittedapplicationsforsmallplanninggrants,alongwithnewly-formedteamsthatdidnotgothroughtheInnovationlab.Becauseanestablishedteamcandomuchworkvirtually,theteamsfosteredbytheQuBBDprogramdrawinawiderangeoftalent,manyofwhomarefromschoolsnototherwiserepresentedwithinBD2K.Theseteamsincludesomedatascientistswhoareinphysicallyisolatedlocationsalongwithbiomedicalscientistswhoareunabletofinddatasciencecollaboratorsduetothehighdemand.TheSummer2015InnovationLabwassuccessful,andthekeytosuccesswastheparticipationofmentorswhoarefromavarietyofbackgroundsandareleadersintheirrespectivefields.Thementorsguideteamsthroughtheiterativeideationprocess,offeringfeedbackonresearchideasthroughouttheweek.ContinuationoftheQuBBDprogramisunderdiscussion.
SummaryTheBD2KprogramsintrainingdescribedinthisdocumentareearlyeffortstoaddressaneedidentifiedbytheAdvisoryCommitteetotheDirector’sDataandInformaticsWorkingGroup.Theycoverbothbiomedicaldatasciencespecialists,aswellasspecialistsinotherbiomedicalareas.Theyalsocovertheeducationalpipelinefromundergraduatestofaculty.Theseprogramsare“earlyefforts”becausethereisstillmuchworktobedonetomeetbothgoals.Toquicklyincreasethebasedatascienceskillsofawidevarietyofbiomedicalscientists,earlyresourcesfocusedonbroad,generallyapplicabletopics.Laterresourcesmightbeonmorespecializedtopics.Likewise,toquicklyjumpstarttheincreaseinthenumberofbiomedicaldatascientists,someofthetrainingprogramsaremodificationsofexistingprograms,buildingonexistinginfrastructureandcourses.Asthecommunityconvergesonthecorecompetenciesofthefield,biomedicaldatasciencetrainingprogramswilllikelyevolveandmayendupbearinglittleresemblancetotheprogramsinitiallyfunded.
January2016
TheBD2KInitiativeisinitsinfancy,andthetrainingprograms,alongwithotherBD2Kprograms,arecontributingtothedevelopmentofthefieldofbiomedicaldatascienceintheUSandacrosstheworld.AlthoughtheawardedgrantsareconfinedtoUSinstitutions,thereachextendsfarbeyondthroughthedevelopmentofopeneducationalresourcesandcontributionstotheglobalconversationsaboutthefieldofbiomedicaldatascience.Inaddition,internationalcollaborationssurroundingthediscoveryofopeneducationalresourcesforbiomedicaldatasciencehavebegun.TheBD2KprogramaimstoimprovetheabilityofthebiomedicalworkforcetouseBigDatabothtodayandtomorrow,bothintheUSandacrosstheworld.
GeographicDistributionofBD2KTrainingAwards