Vivien Bonazzi Ph.D. Program Director: Computational Biology (NHGRI) Co Chair Software Methods & Systems (BD2K) Biomedical Big Data Initiative (BD2K)

Post on 23-Dec-2015

TRANSCRIPT

  • Slide 1
  • Vivien Bonazzi, Ph.D. Program Director: Computational Biology (NHGRI). Co-Chair: Software Methods & Systems (BD2K). Biomedical Big Data Initiative (BD2K)
  • Slide 2
  • Myriad Data Types: Other Omic, Imaging, Phenotypic, Clinical, Genomic, Exposure
  • Slide 3
  • Data and Informatics Working Group: acd.od.nih.gov/diwg.htm
  • Slide 4
  • What Are the Big Problems to Solve? 1. Locating the data; 2. Getting access to the data; 3. Extending policies and practices for data sharing; 4. Organizing, managing, and processing biomedical Big Data; 5. Developing new methods for analyzing biomedical Big Data; 6. Training researchers who can use biomedical Big Data effectively
  • Slide 5
  • Overarching Strategy and Goals: Two initiatives being proposed to overcome roadblocks. Big Data to Knowledge (BD2K): enable the biomedical research enterprise to maximize the value of biomedical data. InfrastructurePlus: create an adaptive environment at NIH to sustain world-class biomedical research
  • Slide 6
  • Big Data to Knowledge (BD2K): Overview. Major trans-NIH initiative addressing an NIH imperative and key roadblock. Aims to be catalytic and synergistic. Overarching goal: By the end of this decade, enable a quantum leap in the ability of the biomedical research enterprise to maximize the value of the growing volume and complexity of biomedical data
  • Slide 7
  • BD2K: Four Programmatic Areas. I. Facilitating Broad Use of Biomedical Big Data; II. Developing and Disseminating Analysis Methods and Software for Biomedical Big Data; III. Enhancing Training for Biomedical Big Data; IV. Establishing Centers of Excellence for Biomedical Big Data
  • Slide 8
  • Area 1: Data Sharing & Access. Facilitating usage and sharing of biomedical big data. A. Policies to Facilitate Data Sharing. B. Data Catalog: Data Discovery, Citation, Links to Literature. C. Frameworks for Community-Based Solutions to Developing Data Standards. D. Enabling Research Use of Clinical Data. New Policies to Encourage Data & Software Sharing; Index of Research Datasets to Facilitate Data Location & Citation; Community-based Development of Data & Metadata Standards
  • Slide 9
  • Area 2: Software and Systems Development. Development of analysis methods and software. A. Grants for software development. B. Software Registry: Making biomedical software findable and citable. C. Cloud computing: Facilitating Data Analysis. D. Dynamic Social Engagement via social media. Software to Meet Needs of the Biomedical Research Community; Facilitating Data Analysis: Access to Large-scale Computing; Dynamic Community Engagement of Users and Developers
  • Slide 10
  • Area 2: Software and Systems Development. Software Grants: Current and emerging needs for using, managing, and analyzing the larger and more complex data sets inherent to biomedical Big Data: Compression/Reduction; Visualization; Provenance; Data Wrangling
  • Slide 11
  • Area 2: Software and Systems Development. Big Data needs Big Computing. Cloud Computing: Leveraging the cloud; Storing and analyzing huge data sets; Collaborative environment; Developing appropriate policies for use of controlled access data in the cloud (dbGaP); Developing working relationships with major cloud providers: AWS, Google, Microsoft (Azure). HPC: More exploration with supercomputing facilities
  • Slide 12
  • Area 3: Training. Enhancing computational training. Increase Number of Computationally Skilled Trainees; Strengthen the Quantitative Skills of All Researchers; Enhance NIH Review and Program Oversight
  • Slide 13
  • Area 4: Centers. Establishing centers of excellence. A. Investigator-initiated Centers. B. NIH-specified Centers. Collaborative environments & technologies; Data integration; Analysis & modeling methods; Computer science & statistical approaches
  • Slide 14
  • Big Data to Knowledge (BD2K): bd2k.nih.gov
  • Slide 15
  • Biomedical Research as Part of the Digital Enterprise. Philip E. Bourne, Ph.D., Associate Director for Data Science, National Institutes of Health
  • Slide 16
  • Myriad Data Types: Other Omic, Imaging, Phenotypic, Clinical, Genomic, Exposure
  • Slide 17
  • Myriad Data Types: Other Omic, Imaging, Phenotypic, Clinical, Genomic, Exposure
  • Slide 18
  • Components of The Academic Digital Enterprise: Consists of digital assets (e.g. datasets, papers, software, lab notes). Each asset is uniquely identified and has provenance, including access control (e.g. publishing simply involves changing the access control). Digital assets are interoperable across the enterprise
  • Slide 19
  • Let's Break Down the Silos: New policies, regulations (e.g. data sharing); Economic drivers; The promise of shared data
  • Slide 20
  • The NIH is Starting to Think About the Digital Enterprise. Big Data to Knowledge (BD2K): bd2k.nih.gov
  • Slide 21
  • This is great, but BD2K is just a start. What will the end product look like?
  • Slide 22
  • To get to that end point, we have to consider the complete research life cycle
  • Slide 23
  • The Research Life Cycle will Persist: IDEAS - HYPOTHESES - EXPERIMENTS - DATA - ANALYSIS - COMPREHENSION - DISSEMINATION
  • Slide 24
  • Tools and Resources Will Continue To Be Developed: IDEAS - HYPOTHESES - EXPERIMENTS - DATA - ANALYSIS - COMPREHENSION - DISSEMINATION. Authoring Tools; Lab Notebooks; Data Capture; Software; Analysis Tools; Visualization; Scholarly Communication
  • Slide 25
  • Those Elements of the Research Life Cycle will Become More Interconnected Around a Common Framework: IDEAS - HYPOTHESES - EXPERIMENTS - DATA - ANALYSIS - COMPREHENSION - DISSEMINATION. Authoring Tools; Lab Notebooks; Data Capture; Software; Analysis Tools; Visualization; Scholarly Communication
  • Slide 26
  • New/Extended Support Structures Will Emerge: IDEAS - HYPOTHESES - EXPERIMENTS - DATA - ANALYSIS - COMPREHENSION - DISSEMINATION. Authoring Tools; Lab Notebooks; Data Capture; Software; Analysis Tools; Visualization; Scholarly Communication. Commercial & Public Tools; Git-like Resources By Discipline; Data Journals; Discipline-Based Metadata Standards; Community Portals; Institutional Repositories; New Reward Systems; Commercial Repositories; Training
  • Slide 27
  • Thank You Questions? [email protected]
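
Slide 18 describes digital assets that are uniquely identified, carry provenance, and have access control, with publishing reduced to a change in that access control. The idea can be illustrated with a minimal Python sketch. Everything in it (the `DigitalAsset` class, the `Access` levels, the field names, and the example title) is hypothetical and chosen for illustration; the slides do not specify a data model.

```python
from dataclasses import dataclass, field
from enum import Enum
from uuid import uuid4


class Access(Enum):
    """Hypothetical access levels; the slides do not name any."""
    PRIVATE = "private"
    PUBLIC = "public"


@dataclass
class DigitalAsset:
    """Illustrative asset: a dataset, paper, software, or lab notes."""
    kind: str
    title: str
    access: Access = Access.PRIVATE
    # Unique identifier, generated once per asset (assumed to be a UUID here).
    asset_id: str = field(default_factory=lambda: uuid4().hex)
    # Provenance kept as an ordered log of events.
    provenance: list = field(default_factory=list)

    def record(self, event: str) -> None:
        """Append an event to the provenance log."""
        self.provenance.append(event)

    def publish(self) -> None:
        """Publishing only changes the access control and logs the change."""
        self.access = Access.PUBLIC
        self.record("access changed to public")


dataset = DigitalAsset(kind="dataset", title="Example expression data")
dataset.record("created")
dataset.publish()
```

In this sketch the asset itself never moves or gets copied on publication; only its `access` field and provenance log change, which is one way to read the slide's claim.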