Andrew Treloar - The life‐sciences as a pathfinder in data‐intensive research practice

The life-sciences as a pathfinder in data-intensive research practice. Dr Andrew Treloar, Director of Technology. July 10, 2014, CC-BY-SA, @atreloar

Upload: australian-bioinformatics-network

Post on 10-May-2015

DESCRIPTION

The advent of the Internet is bringing about fundamental changes in the ways that research is performed and communicated. These have been particularly driven by the growing importance of data, as well as the tools available to work with this data. This presentation will examine this shift, drawing on examples from the life‐sciences, and try to make some predictions about the next five years. First presented at the 2014 Winter School in Mathematical and Computational Biology http://bioinformatics.org.au/ws14/program/

TRANSCRIPT

Page 1: Andrew Treloar - The life‐sciences as a pathfinder in data‐intensive research practice Abstract: The

The life-sciences as a pathfinder in data-intensive research practice
Dr Andrew Treloar, Director of Technology

July 10, 2014, CC-BY-SA, @atreloar

Page 2:

Structure of presentation
§ Research Lifecycles
§ Functions of Scholarly Communication
§ Pointers to the future
§ Characterising the future
§ Pathfinder problems
§ Conclusions

Page 3:

So many lifecycles…

July 10, 2014, CC-BY-SA, @hvdsomp and @atreloar

Page 4:

Minimal Research Lifecycle

Think, Do, Share

Page 5:

Sharing: Scholarly Communication System and its Functions
§ Registration
§ Certification
§ Awareness
§ Archiving

(Roosendaal and Geurts, 1997)

Page 6:

System of Journals
§ Registration
  § submission of manuscript
§ Certification
  § peer-review (pre-publication)
  § commentary (post-publication)
§ Awareness
  § discovery services
§ Archiving
  § libraries (print)
  § publishers (electronic)
  § special purpose organisations (e.g. Portico)

Page 7:

Pointers to the future

“the future is already here – it’s just not very evenly distributed”

William Gibson, NPR interview

Page 8:

Registration: BioRxiv

Page 9:

Registration: GitHub

Page 10:

Registration: WikiPathways

Page 11:

Registration: NeuroLex

Page 12:

Registration: Nanopublications

Page 13:

Registration: some observations
§ Decoupling registration from certification
§ Timestamping, versioning
§ Registration of various types of objects
§ Machines as creators and contributors

Page 14:

Certification: PubMed Commons

Page 15:

Certification: PubPeer

Page 16:

Certification: Publons

Page 17:

Certification: some observations
§ Peer-review decoupled from publication process
§ Certification of various types of objects
§ Machines validating form
§ Social endorsement

Page 18:

Awareness: myExperiment

Page 19:

Awareness: eLabNotebook RSS

Page 20:

Awareness: Twitter

Page 21:

Awareness: some observations
§ Awareness for various types of objects
§ Real-time awareness
§ Awareness support targeted at machines
§ Awareness through social media

Page 22:

Archiving: PDB

Page 23:

Archiving: GenBank

Page 24:

Characterising the future
§ Research process: from Hidden to Visible
§ Nature of object: from Fixed to Varying
§ Process of making public: from Discrete to Continuous
§ Speed of communication: from Delayed to Instant
§ Atomicity of object: from Atomic to Compound
§ Communicated object: from Publication + data proxies to Publication + linked data + linked models
§ Nature of process: from Formal to Informal

Page 25:

Fundamental changes
§ The research process (objects, social dimension) is becoming more exposed
§ Articles, books are no longer the only relevant objects for research communication
§ Objects are no longer static
§ Machines are joining humans as (co-)creators and consumers of research objects

Page 26:

Pathfinder problems
§ Integrity of the scholarly record
§ The three obsolescences
  § hardware
  § file format
  § software

Page 27:

System of Journals: Archiving

Page 28:

Web of Objects: Archiving?

Page 29:

Not just citation relationships

Page 30:

The problem of obsolescence
§ Lifescience research environment can be viewed as undergoing a process of accelerated evolution
§ Other disciplines will hit these problems in time

Page 31:

Cambrian explosion

Page 32:

Hardware obsolescence: Roche 454

Page 33:

Software obsolescence: too much choice, not enough support

Page 34:

Abandonware
§ “Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date having been written more than a decade ago. Although still used by molecular biologists, its slow computing ability meant a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly.”
  § Sam Jaffe, Scientists Abandon their Software, The Scientist, Feb 16, 2004

Page 35:

File format obsolescence: Illumina
§ Probability of error in base-calling encoded using ASCII code to reduce file size
§ Meaning of the ASCII code changed along the life cycle, and for data generated at different time points the quality might be encoded differently
§ “If you get an error like "Invalid quality score value", your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores. You'll need to add the option "-Q33" to your FASTX Toolkit arguments”. Obviously…
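
The offset problem on this slide can be made concrete with a short sketch. It assumes Python, and the helper names (decode_qualities, error_probability, guess_phred_offset) are hypothetical illustrations, not part of the FASTX Toolkit or any other tool. The guessing heuristic rests on assumed thresholds taken from the commonly documented score ranges: characters below ';' (code 59) only make sense with offset 33, and characters above 'J' (code 74) suggest offset 64. Anything in between stays ambiguous.

```python
from typing import List, Optional


def decode_qualities(quality_line: str, offset: int) -> List[int]:
    """Convert one FASTQ quality string into scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(phred: int) -> float:
    """Probability that a base call is wrong, from Q = -10 * log10(p)."""
    return 10 ** (-phred / 10)


def guess_phred_offset(quality_line: str) -> Optional[int]:
    """Heuristically guess the ASCII offset (33 or 64) of a quality string.

    Returns None when the characters fit both encodings (or the line is
    empty). The thresholds 59 and 74 are assumptions, not a specification.
    """
    if not quality_line:
        return None
    codes = [ord(ch) for ch in quality_line]
    if min(codes) < 59:   # too low to be a valid offset-64 character
        return 33
    if max(codes) > 74:   # would imply an implausibly high offset-33 score
        return 64
    return None           # ambiguous: inspect more reads before deciding


if __name__ == "__main__":
    line = "IIII"
    print(decode_qualities(line, 33))  # scores if the file is offset 33
    print(decode_qualities(line, 64))  # scores if the file is offset 64
    print(guess_phred_offset("IIII!"), guess_phred_offset("hhhh"))
```

The same characters "IIII" decode to a score of 40 under offset 33 but only 9 under offset 64, so reading a file with the wrong offset silently changes the apparent quality of every base.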

Page 36:

Everett Rogers, Diffusion of Innovations, 1962

Page 37:

Conclusions
§ Need to move to a smaller number of standard file formats
§ Need to move to a more sustainable model of software development and maintenance
§ Need to encourage platform manufacturers to innovate around the hardware, not the software
§ NOTE: other disciplines are looking to lifesciences to work out how to solve some of these problems

Page 38:

On best practices in the development of bioinformatics software, Front. Genet., 02 Jul 14
§ Source code available to reviewers
§ Software indexed, citable, available
§ Source code documented
§ Source code managed
§ Test libraries, sample data and dataset repositories available

Page 39:

Questions?
§ [email protected]
§ @atreloar
§ https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice