The life-sciences as a pathfinder in data-intensive research practice

The life-sciences as a pathfinder in data-intensive research practice Dr Andrew Treloar, Director of Technology 20/06/22 CC-BY-SA, @atreloar 1

Upload: andrew-treloar

Post on 10-May-2015


Category:

Science



DESCRIPTION

Presentation given at UQ Winterschool 2014. The advent of the Internet is bringing about fundamental changes in the ways that research is performed and communicated. These have been particularly driven by the growing importance of data, as well as the tools available to work with this data. This presentation will examine this shift, drawing on examples from the life‐sciences, and try to make some predictions about the next five years.

TRANSCRIPT

Page 1: The life-sciences as a pathfinder in data-intensive research practice

The life-sciences as a pathfinder in data-intensive research practice

Dr Andrew Treloar, Director of Technology

11 April 2023 CC-BY-SA atreloar 1

Structure presentation

  • Research Lifecycles
  • Functions of Scholarly Communication
  • Pointers to the future
  • Characterising the future
  • Pathfinder problems
  • Conclusions

So many lifecycles…

11 April 2023 CC-BY-SA hvdsomp and atreloar 3

Minimal Research Lifecycle

Think

Do

Share

Sharing: Scholarly Communication System and its Functions

  • Registration
  • Certification
  • Awareness
  • Archiving

(Roosendaal and Geurts 1997)

System of Journals

  • Registration: submission of manuscript
  • Certification: peer-review (pre-publication), commentary (post-publication)
  • Awareness: discovery services
  • Archiving: libraries (print), publishers (electronic), special purpose organisations (e.g. Portico)

Pointers to the future

“the future is already here – it’s just not very evenly distributed”

William Gibson, NPR interview

Registration: bioRxiv

Registration: GitHub

Registration: WikiPathways

Registration: NeuroLex

Registration: Nanopublications

Registration: some observations

  • Decoupling registration from certification
  • Timestamping, versioning
  • Registration of various types of objects
  • Machines as creators and contributors

Certification: PubMed Commons

Certification: PubPeer

Certification: Publons

Certification: some observations

  • Peer-review decoupled from publication process
  • Certification of various types of objects
  • Machines validating form
  • Social endorsement

Awareness: myExperiment

Awareness: eLabNotebook RSS

Awareness: Twitter

Awareness: some observations

  • Awareness for various types of objects
  • Real time awareness
  • Awareness support targeted at machines
  • Awareness through social media

Archiving: PDB

Archiving: GenBank

Characterising the future

Fundamental changes

  • The research process (objects, social dimension) is becoming more exposed
  • Articles and books are no longer the only relevant objects for research communication
  • Objects are no longer static
  • Machines are joining humans as (co-)creators and consumers of research objects

Pathfinder problems

  • Integrity of the scholarly record
  • The three obsolescences: hardware, file format, software

System of Journals: Archiving

Web of Objects: Archiving

Not just citation relationships

The problem of obsolescence

  • The life-science research environment can be viewed as undergoing a process of accelerated evolution
  • Other disciplines will hit these problems in time

Cambrian explosion

Hardware obsolescence: Roche 454

Software obsolescence: too much choice, not enough support

Abandonware

“Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date, having been written more than a decade ago. Although still used by molecular biologists, its slow computing ability meant a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly.”

Sam Jaffe, “Scientists Abandon their Software”, The Scientist, Feb 16, 2004

File format obsolescence: Illumina

  • Probability of error in basecalling encoded using ASCII code to reduce file size
  • Meaning of the ASCII code changed along the life cycle, and for data generated at different time points the quality might be encoded differently

“If you get an error like ‘Invalid quality score value’, your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores. You’ll need to add the option -Q33 to your FASTX Toolkit arguments.”

Obviously…
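To make the offset problem on this slide concrete, here is a minimal Python sketch (not part of the original talk). It assumes the usual Phred convention, Q = -10 log10(P), with each score stored as the character chr(Q + offset). The function names are hypothetical, and the offset guess is only a heuristic: a Sanger-encoded file that happens to contain only high scores would be misread as offset 64.

```python
def guess_phred_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from the lowest quality character.

    Heuristic only: returns None when the evidence is ambiguous.
    """
    lowest = min(ord(ch) for line in quality_lines for ch in line)
    if lowest < 59:
        # Characters below ';' (59) cannot occur in the offset-64 encodings.
        return 33
    if lowest >= 64:
        # Consistent with offset 64, but high-quality offset-33 data looks the same.
        return 64
    return None


def decode_quality(quality, offset):
    """Turn a quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def to_offset_33(quality, offset):
    """Re-encode a quality string so that it uses offset 33."""
    return "".join(chr(ord(ch) - offset + 33) for ch in quality)


if __name__ == "__main__":
    old_style = "hhhhB"  # hypothetical quality string written with offset 64
    offset = guess_phred_offset([old_style])
    print(offset, decode_quality(old_style, offset), to_offset_33(old_style, offset))
```

The same five characters decode to very different scores depending on the assumed offset, which is why a tool has to be told (for example with -Q33) which convention a given file uses.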


Everett Rogers, Diffusion of Innovations, 1962

Conclusions

  • Need to move to a smaller number of standard file formats
  • Need to move to a more sustainable model of software development and maintenance
  • Need to encourage platform manufacturers to innovate around the hardware, not the software
  • NOTE: other disciplines are looking to the life-sciences to work out how to solve some of these problems

On best practices in the development of bioinformatics software (Front. Genet., 02 Jul 14)

  • Source code available to reviewers
  • Software indexed, citable, available
  • Source code documented
  • Source code managed
  • Test libraries, sample data and dataset repositories available

Questions?

andrew.treloar@ands.org.au

@atreloar

https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice

Page 5: The life-sciences as a pathfinder in data-intensive research practice

Sharing Scholarly Communication System and its Functions

Registration Certification Awareness Archiving

(Rosendaal and Geurts 1997)

11 April 2023 CC-BY-SA hvdsomp and atreloar 5

System of Journals Registration

submission of manuscript

Certification peer-review (pre-publication) commentary (post-publication)

Awareness discovery services

Archiving libraries (print) publishers (electronic) special purpose organisations (eg Portico)

11 April 2023 CC-BY-SA hvdsomp and atreloar 6

Pointers to the future

ldquothe future is already here ndash itrsquos just not very evenly distributedrdquo

William Gibson NPR interview

11 April 2023 CC-BY-SA hvdsomp and atreloar 7

Registration BioRxiv

11 April 2023 CC-BY-SA hvdsomp and atreloar 8

Registration Github

11 April 2023 CC-BY-SA hvdsomp and atreloar 9

Registration WikiPathways

11 April 2023 CC-BY-SA hvdsomp and atreloar 10

Registration NeuroLex

11 April 2023 CC-BY-SA hvdsomp and atreloar 11

Registration Nanopublications

11 April 2023 CC-BY-SA hvdsomp and atreloar 12

Registration some observations Decoupling registration from certification Timestamping versioning Registration of various types of objects Machines as creators and contributors

11 April 2023 CC-BY-SA hvdsomp and atreloar 13

Certification PubMed Commons

11 April 2023 CC-BY-SA hvdsomp and atreloar 14

Certification PubPeer

11 April 2023 CC-BY-SA hvdsomp and atreloar 15

Certification Publons

11 April 2023 CC-BY-SA hvdsomp and atreloar 16

Certification some observations Peer-review decoupled from publication process Certification of various types of objects Machines validating form Social endorsement

11 April 2023 CC-BY-SA hvdsomp and atreloar 17

Awareness myExperiment

11 April 2023 CC-BY-SA hvdsomp and atreloar 18

Awareness eLabNotebook RSS

11 April 2023 CC-BY-SA hvdsomp and atreloar 19

Awareness Twitter

11 April 2023 CC-BY-SA hvdsomp and atreloar 20

Awareness some observations Awareness for various types of objects Real time awareness Awareness support targeted at machines Awareness through social media

11 April 2023 CC-BY-SA hvdsomp and atreloar 21

Archiving PDB

11 April 2023 CC-BY-SA hvdsomp and atreloar 22

Archiving GenBank

11 April 2023 CC-BY-SA hvdsomp and atreloar 23

Characterising the future

11 April 2023 CC-BY-SA hvdsomp and atreloar 24

Fundamental changes

The research process (objects, social dimension) is becoming more exposed
Articles and books are no longer the only relevant objects for research communication
Objects are no longer static
Machines are joining humans as (co-)creators and consumers of research objects

11 April 2023 CC-BY-SA hvdsomp and atreloar 25

Pathfinder problems

Integrity of the scholarly record
The three obsolescences: hardware, file format, software

11 April 2023 CC-BY-SA atreloar 26

System of Journals: Archiving

11 April 2023 CC-BY-SA hvdsomp and atreloar 27

Web of Objects: Archiving

11 April 2023 CC-BY-SA hvdsomp and atreloar 28

Not just citation relationships

11 April 2023 CC-BY-SA hvdsomp and atreloar 29

The problem of obsolescence

The life-science research environment can be viewed as undergoing a process of accelerated evolution
Other disciplines will hit these problems in time

11 April 2023 CC-BY-SA atreloar 30

Cambrian explosion

11 April 2023 31

Hardware obsolescence: Roche 454

11 April 2023 CC-BY-SA atreloar 32

Software obsolescence: too much choice, not enough support

11 April 2023 CC-BY-SA atreloar 33

Abandonware

“Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date, having been written more than a decade ago. Although still used by molecular biologists, its slow computing ability meant a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly.”

Sam Jaffe, “Scientists Abandon their Software”, The Scientist, Feb 16, 2004

11 April 2023 CC-BY-SA atreloar 34

File format obsolescence: Illumina

Probability of error in basecalling encoded using ASCII code to reduce file size
Meaning of the ASCII code changed along the life cycle, and for data generated at different time points the quality might be encoded differently

“If you get an error like ‘Invalid quality score value’, your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores. You’ll need to add the option -Q33 to your FASTX Toolkit arguments.” Obviously…
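
The offset problem on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming only what the slide states (Sanger quality strings use ASCII offset 33, older Illumina output uses offset 64) plus the standard Phred relation Q = -10 log10(p). The function names and the range-based guess are hypothetical and are not part of the FASTX Toolkit; the guess is a heuristic and returns None when the characters fit both encodings.

```python
def guess_phred_offset(quality_lines):
    """Heuristically guess the ASCII offset (33 or 64) of FASTQ quality strings.

    Returns None when the observed characters are consistent with both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    # Offset-64 encodings never use characters below ';' (ASCII 59).
    if min(codes) < 59:
        return 33
    # Offset-33 data rarely goes above 'J' (ASCII 74, i.e. Q41); assumption.
    if max(codes) > 74:
        return 64
    return None


def error_probabilities(quality, offset):
    """Convert one quality string into per-base error probabilities."""
    # Phred score Q = ord(char) - offset, and p = 10 ** (-Q / 10).
    return [10 ** (-(ord(ch) - offset) / 10) for ch in quality]


if __name__ == "__main__":
    print(guess_phred_offset(["IIII#"]))    # '#' can only be offset 33
    print(guess_phred_offset(["hhhh"]))     # 'h' suggests offset 64
    print(error_probabilities("I", 33))     # read as Q40
    print(error_probabilities("I", 64))     # same character read as Q9
```

The same character, "I", reads as an error probability of about 0.0001 at offset 33 but about 0.13 at offset 64, which is why a file decoded with the wrong offset silently distorts any quality filtering downstream.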

11 April 2023 CC-BY-SA atreloar 35

Everett Rogers, Diffusion of Innovations, 1962

11 April 2023 CC-BY-SA atreloar 36

Conclusions

Need to move to a smaller number of standard file formats
Need to move to a more sustainable model of software development and maintenance
Need to encourage platform manufacturers to innovate around the hardware, not the software
NOTE: other disciplines are looking to the life sciences to work out how to solve some of these problems

11 April 2023 CC-BY-SA atreloar 37

On best practices in the development of bioinformatics software (Front. Genet., 02 Jul 14)

Source code available to reviewers
Software indexed, citable, available
Source code documented
Source code managed
Test libraries, sample data and dataset repositories available

11 April 2023 CC-BY-SA atreloar 38

Questions?

andrew.treloar@ands.org.au

@atreloar

https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice

11 April 2023 CC-BY-SA atreloar 39

Page 10: The life-sciences as a pathfinder in data-intensive research practice

Registration WikiPathways

11 April 2023 CC-BY-SA hvdsomp and atreloar 10

Registration NeuroLex

11 April 2023 CC-BY-SA hvdsomp and atreloar 11

Registration Nanopublications

11 April 2023 CC-BY-SA hvdsomp and atreloar 12

Registration some observations Decoupling registration from certification Timestamping versioning Registration of various types of objects Machines as creators and contributors

11 April 2023 CC-BY-SA hvdsomp and atreloar 13

Certification PubMed Commons

11 April 2023 CC-BY-SA hvdsomp and atreloar 14

Certification PubPeer

11 April 2023 CC-BY-SA hvdsomp and atreloar 15

Certification Publons

11 April 2023 CC-BY-SA hvdsomp and atreloar 16

Certification some observations Peer-review decoupled from publication process Certification of various types of objects Machines validating form Social endorsement

11 April 2023 CC-BY-SA hvdsomp and atreloar 17

Awareness myExperiment

11 April 2023 CC-BY-SA hvdsomp and atreloar 18

Awareness eLabNotebook RSS

11 April 2023 CC-BY-SA hvdsomp and atreloar 19

Awareness Twitter

11 April 2023 CC-BY-SA hvdsomp and atreloar 20

Awareness some observations Awareness for various types of objects Real time awareness Awareness support targeted at machines Awareness through social media

11 April 2023 CC-BY-SA hvdsomp and atreloar 21

Archiving: PDB

Archiving: GenBank

Characterising the future

Fundamental changes
  • The research process (objects, social dimension) is becoming more exposed
  • Articles and books are no longer the only relevant objects for research communication
  • Objects are no longer static
  • Machines are joining humans as (co-)creators and consumers of research objects

Pathfinder problems
  • Integrity of the scholarly record
  • The three obsolescences: hardware, file format, software

System of Journals: Archiving

Web of Objects: Archiving

Not just citation relationships

The problem of obsolescence
  • The life-sciences research environment can be viewed as undergoing a process of accelerated evolution
  • Other disciplines will hit these problems in time

Cambrian explosion

Hardware obsolescence: Roche 454

Software obsolescence: too much choice, not enough support

Abandonware

"Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date, having been written more than a decade ago. Although still used by molecular biologists, its slow computing ability meant a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly."

Sam Jaffe, "Scientists Abandon their Software", The Scientist, Feb 16, 2004

File format obsolescence: Illumina
  • The probability of error in base calling is encoded using ASCII codes to reduce file size
  • The meaning of the ASCII code changed over the life cycle, so data generated at different points in time might have its quality encoded differently

"If you get an error like 'Invalid quality score value', your FASTQ file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores. You'll need to add the option -Q33 to your FASTX Toolkit arguments."

Obviously…
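
To make the offset problem on this slide concrete, here is a minimal Python sketch (not from the original talk) of how one ASCII quality string maps to different scores under the two offsets, 33 and 64, that the quoted error message mentions. The function names are hypothetical, and the detection step is only a heuristic that rests on an assumption: characters with codes below 59 cannot occur in a +64 encoding, while a string made up only of higher characters is ambiguous and is simply guessed to be offset 64.

```python
SANGER_OFFSET = 33    # "Sanger (offset 33)" in the quoted message
ILLUMINA_OFFSET = 64  # "Illumina (ASCII offset 64)" in the quoted message


def guess_offset(quality_lines):
    """Heuristically guess the ASCII offset of FASTQ quality strings.

    A character below ';' (code 59) cannot occur in a +64 encoding, so
    seeing one implies offset 33. Otherwise offset 64 is only a guess:
    uniformly high-quality offset-33 reads would look the same.
    """
    lowest = min(ord(ch) for line in quality_lines for ch in line)
    return SANGER_OFFSET if lowest < 59 else ILLUMINA_OFFSET


def decode_quality(quality, offset):
    """Turn a quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(phred):
    """Convert a Phred score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def to_sanger(quality, offset):
    """Re-encode a quality string using the Sanger offset of 33."""
    return "".join(chr(ord(ch) - offset + SANGER_OFFSET) for ch in quality)


if __name__ == "__main__":
    old_style = "hhhhB"  # made-up quality string in the offset-64 encoding
    offset = guess_offset([old_style])
    print(offset, decode_quality(old_style, offset), to_sanger(old_style, offset))
```

Read with the wrong offset, the same characters would yield scores that are off by 31, which is why a tool has to be told (or has to guess) which convention a given file uses.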


Everett Rogers, Diffusion of Innovations, 1962

Conclusions
  • Need to move to a smaller number of standard file formats
  • Need to move to a more sustainable model of software development and maintenance
  • Need to encourage platform manufacturers to innovate around the hardware, not the software
  • NOTE: other disciplines are looking to the life sciences to work out how to solve some of these problems

On best practices in the development of bioinformatics software (Front. Genet., 02 Jul 2014)
  • Source code available to reviewers
  • Software indexed, citable, available
  • Source code documented
  • Source code managed
  • Test libraries, sample data and dataset repositories available

Questions?

andrew.treloar@ands.org.au

@atreloar

https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice

  • The life-sciences as a pathfinder in data-intensive research pr
  • Structure presentation
  • So many lifecycleshellip
  • Minimal Research Lifecycle
  • Sharing Scholarly Communication System and its Functions
  • System of Journals
  • Pointers to the future
  • Registration BioRxiv
  • Registration Github
  • Registration WikiPathways
  • Registration NeuroLex
  • Registration Nanopublications
  • Registration some observations
  • Certification PubMed Commons
  • Certification PubPeer
  • Certification Publons
  • Certification some observations
  • Awareness myExperiment
  • Awareness eLabNotebook RSS
  • Awareness Twitter
  • Awareness some observations
  • Archiving PDB
  • Archiving GenBank
  • Characterising the future
  • Fundamental changes
  • Pathfinder problems
  • System of Journals Archiving
  • Web of Objects Archiving
  • Not just citation relationships
  • The problem of obsolescence
  • Cambrian explosion
  • Hardware obsolescence Roche 454
  • Software obsolescence too much choice not enough support
  • Abandonware
  • File format obsolescence Illumina
  • Everett Rogers Diffusion of Innovation 1962
  • Conclusions
  • On best practices in the development of bioinformatics software
  • Questions
Page 15: The life-sciences as a pathfinder in data-intensive research practice

Certification PubPeer

11 April 2023 CC-BY-SA hvdsomp and atreloar 15

Certification Publons

11 April 2023 CC-BY-SA hvdsomp and atreloar 16

Certification some observations Peer-review decoupled from publication process Certification of various types of objects Machines validating form Social endorsement

11 April 2023 CC-BY-SA hvdsomp and atreloar 17

Awareness myExperiment

11 April 2023 CC-BY-SA hvdsomp and atreloar 18

Awareness eLabNotebook RSS

11 April 2023 CC-BY-SA hvdsomp and atreloar 19

Awareness Twitter

11 April 2023 CC-BY-SA hvdsomp and atreloar 20

Awareness some observations Awareness for various types of objects Real time awareness Awareness support targeted at machines Awareness through social media

11 April 2023 CC-BY-SA hvdsomp and atreloar 21

Archiving PDB

11 April 2023 CC-BY-SA hvdsomp and atreloar 22

Archiving GenBank

11 April 2023 CC-BY-SA hvdsomp and atreloar 23

Characterising the future

11 April 2023 CC-BY-SA hvdsomp and atreloar 24

Fundamental changes The research process (objects social

dimension) is becoming more exposed Articles books are no longer the only

relevant objects for research communication Objects are no longer static Machines are joining humans as

(co-)creators and consumers of research objects

11 April 2023 CC-BY-SA hvdsomp and atreloar 25

Pathfinder problems Integrity of the scholarly record The three obsolescences

hardware file format software

11 April 2023 CC-BY-SA atreloar 26

System of Journals Archiving

11 April 2023 CC-BY-SA hvdsomp and atreloar 27

Web of Objects Archiving

11 April 2023 CC-BY-SA hvdsomp and atreloar 28

Not just citation relationships

11 April 2023 CC-BY-SA hvdsomp and atreloar 29

Your Text Here

The problem of obsolescence Lifescience research environment can be viewed as

undergoing a process of accelerated evolution Other disciplines will hit these problems in time

11 April 2023 CC-BY-SA atreloar 30

Cambrian explosion

11 April 2023 31

Hardware obsolescence Roche 454

11 April 2023 CC-BY-SA atreloar 32

Software obsolescence too much choice not enough support

11 April 2023 CC-BY-SA atreloar 33

Abandonware ldquoLast summer a member of the biology department of the

University of Udine in Italy approached Nicola Vitacolonna with an intriguing project The ANREP program which annotates structural motifs in gene or protein sequences was out of date having been written more than a decade ago Although still used by molecular biologists its slow computing ability meant a straightforward multiple search could take all night on a desktop PC The Udine biologist wanted Vitacolonna a postdoctoral fellow in computational biology to write a program that could do the job more quicklyrdquo Sam Jaffe Scientists Abandon their Software The Scientist Feb 16 2004

11 April 2023 CC-BY-SA atreloar 34

File format obsolescence Illumina Probability of error in basecalling encoded using ascii code

to reduce file size Meaning of the ascii code changed along the life cycle and

for data generated at different time points the quality might be encoded differently

ldquoIf you get an error like Invalid quality score value your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores Youll need to add the option -Q33 to your FASTX Toolkit argumentsrdquo Obviouslyhellip

11 April 2023 CC-BY-SA atreloar 35

Everett Rogers Diffusion of Innovation 1962

11 April 2023 CC-BY-SA atreloar 36

Conclusions

Need to move to a smaller number of standard file formats.

Need to move to a more sustainable model of software development and maintenance.

Need to encourage platform manufacturers to innovate around the hardware, not the software.

NOTE: other disciplines are looking to the life-sciences to work out how to solve some of these problems.

11 April 2023 CC-BY-SA atreloar 37

On best practices in the development of bioinformatics software (Front. Genet., 02 Jul 14)

Source code available to reviewers

Software indexed, citable, available

Source code documented

Source code managed

Test libraries, sample data and dataset repositories available

11 April 2023 CC-BY-SA atreloar 38

Questions? andrew.treloar@ands.org.au

@atreloar

https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice

11 April 2023 CC-BY-SA atreloar 39

Page 21: The life-sciences as a pathfinder in data-intensive research practice

Awareness some observations Awareness for various types of objects Real time awareness Awareness support targeted at machines Awareness through social media

11 April 2023 CC-BY-SA hvdsomp and atreloar 21

Archiving PDB

11 April 2023 CC-BY-SA hvdsomp and atreloar 22

Archiving GenBank

11 April 2023 CC-BY-SA hvdsomp and atreloar 23

Characterising the future

11 April 2023 CC-BY-SA hvdsomp and atreloar 24

Fundamental changes The research process (objects social

dimension) is becoming more exposed Articles books are no longer the only

relevant objects for research communication Objects are no longer static Machines are joining humans as

(co-)creators and consumers of research objects

11 April 2023 CC-BY-SA hvdsomp and atreloar 25

Pathfinder problems Integrity of the scholarly record The three obsolescences

hardware file format software

11 April 2023 CC-BY-SA atreloar 26

System of Journals Archiving

11 April 2023 CC-BY-SA hvdsomp and atreloar 27

Web of Objects Archiving

11 April 2023 CC-BY-SA hvdsomp and atreloar 28

Not just citation relationships

11 April 2023 CC-BY-SA hvdsomp and atreloar 29

Your Text Here

The problem of obsolescence Lifescience research environment can be viewed as

undergoing a process of accelerated evolution Other disciplines will hit these problems in time

11 April 2023 CC-BY-SA atreloar 30

Cambrian explosion

11 April 2023 31

Hardware obsolescence Roche 454

11 April 2023 CC-BY-SA atreloar 32

Software obsolescence too much choice not enough support

11 April 2023 CC-BY-SA atreloar 33

Abandonware ldquoLast summer a member of the biology department of the

University of Udine in Italy approached Nicola Vitacolonna with an intriguing project The ANREP program which annotates structural motifs in gene or protein sequences was out of date having been written more than a decade ago Although still used by molecular biologists its slow computing ability meant a straightforward multiple search could take all night on a desktop PC The Udine biologist wanted Vitacolonna a postdoctoral fellow in computational biology to write a program that could do the job more quicklyrdquo Sam Jaffe Scientists Abandon their Software The Scientist Feb 16 2004

11 April 2023 CC-BY-SA atreloar 34

File format obsolescence Illumina Probability of error in basecalling encoded using ascii code

to reduce file size Meaning of the ascii code changed along the life cycle and

for data generated at different time points the quality might be encoded differently

ldquoIf you get an error like Invalid quality score value your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores Youll need to add the option -Q33 to your FASTX Toolkit argumentsrdquo Obviouslyhellip

11 April 2023 CC-BY-SA atreloar 35

Everett Rogers Diffusion of Innovation 1962

11 April 2023 CC-BY-SA atreloar 36

Conclusions Need to move to a smaller number of standard file

formats Need to move to a more sustainable model of

software development and maintenance Need to encourage platform manufacturers to

innovate around the hardware not the software NOTE other disciplines are looking to lifesciences

to work out how to solve some of these problems11 April 2023 CC-BY-SA atreloar 37

On best practices in the development of bioinformatics software, Front. Genet., 02 Jul 2014

Source code available to reviewers

Software indexed, citable, available

Source code documented

Source code managed

Test libraries, sample data and dataset repositories available

11 April 2023 CC-BY-SA atreloar 38

Questions?

andrew.treloar@ands.org.au

@atreloar

https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice

11 April 2023 CC-BY-SA atreloar 39

  • The life-sciences as a pathfinder in data-intensive research pr
  • Structure presentation
  • So many lifecycles…
  • Minimal Research Lifecycle
  • Sharing Scholarly Communication System and its Functions
  • System of Journals
  • Pointers to the future
  • Registration BioRxiv
  • Registration Github
  • Registration WikiPathways
  • Registration NeuroLex
  • Registration Nanopublications
  • Registration some observations
  • Certification PubMed Commons
  • Certification PubPeer
  • Certification Publons
  • Certification some observations
  • Awareness myExperiment
  • Awareness eLabNotebook RSS
  • Awareness Twitter
  • Awareness some observations
  • Archiving PDB
  • Archiving GenBank
  • Characterising the future
  • Fundamental changes
  • Pathfinder problems
  • System of Journals Archiving
  • Web of Objects Archiving
  • Not just citation relationships
  • The problem of obsolescence
  • Cambrian explosion
  • Hardware obsolescence Roche 454
  • Software obsolescence too much choice not enough support
  • Abandonware
  • File format obsolescence Illumina
  • Everett Rogers Diffusion of Innovation 1962
  • Conclusions
  • On best practices in the development of bioinformatics software
  • Questions
Page 22: The life-sciences as a pathfinder in data-intensive research practice

Archiving PDB

11 April 2023 CC-BY-SA hvdsomp and atreloar 22

Archiving GenBank

11 April 2023 CC-BY-SA hvdsomp and atreloar 23

Characterising the future

11 April 2023 CC-BY-SA hvdsomp and atreloar 24

Fundamental changes

The research process (objects, social dimension) is becoming more exposed

Articles, books are no longer the only relevant objects for research communication

Objects are no longer static

Machines are joining humans as (co-)creators and consumers of research objects

11 April 2023 CC-BY-SA hvdsomp and atreloar 25

Pathfinder problems

Integrity of the scholarly record

The three obsolescences: hardware, file format, software

11 April 2023 CC-BY-SA atreloar 26

System of Journals Archiving

11 April 2023 CC-BY-SA hvdsomp and atreloar 27

Web of Objects Archiving

11 April 2023 CC-BY-SA hvdsomp and atreloar 28

Not just citation relationships

11 April 2023 CC-BY-SA hvdsomp and atreloar 29

The problem of obsolescence

The life-science research environment can be viewed as undergoing a process of accelerated evolution

Other disciplines will hit these problems in time

11 April 2023 CC-BY-SA atreloar 30

Cambrian explosion

11 April 2023 31

Hardware obsolescence: Roche 454

11 April 2023 CC-BY-SA atreloar 32

Page 29: The life-sciences as a pathfinder in data-intensive research practice

Not just citation relationships

11 April 2023 CC-BY-SA hvdsomp and atreloar 29

Page 30: The life-sciences as a pathfinder in data-intensive research practice

The problem of obsolescence

  • The life-science research environment can be viewed as undergoing a process of accelerated evolution
  • Other disciplines will hit these problems in time

11 April 2023 CC-BY-SA atreloar 30

Page 31: The life-sciences as a pathfinder in data-intensive research practice

Cambrian explosion

11 April 2023 31

Page 32: The life-sciences as a pathfinder in data-intensive research practice

Hardware obsolescence: Roche 454

11 April 2023 CC-BY-SA atreloar 32

Page 33: The life-sciences as a pathfinder in data-intensive research practice

Software obsolescence: too much choice, not enough support

11 April 2023 CC-BY-SA atreloar 33

Page 34: The life-sciences as a pathfinder in data-intensive research practice

Abandonware

“Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date, having been written more than a decade ago. Although still used by molecular biologists, its slow computing ability meant a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly.”

Sam Jaffe, “Scientists Abandon their Software”, The Scientist, Feb 16, 2004

11 April 2023 CC-BY-SA atreloar 34

Page 35: The life-sciences as a pathfinder in data-intensive research practice

File format obsolescence: Illumina

  • Probability of error in base calling is encoded using ASCII codes to reduce file size
  • The meaning of the ASCII code changed along the life cycle, and for data generated at different time points the quality might be encoded differently

“If you get an error like ‘Invalid quality score value’, your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores. You'll need to add the option -Q33 to your FASTX Toolkit arguments.”

Obviously…
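The offset problem described on this slide can be made concrete with a small sketch. It is not taken from the presentation or from the FASTX Toolkit: the function names are hypothetical, and the offset guess is only a heuristic based on the lowest character seen (a file of uniformly high-quality Sanger reads could be misread as offset 64). It assumes the usual Phred relationship, where a quality score Q corresponds to an error probability of 10^(-Q/10).

```python
def guess_phred_offset(quality_lines):
    """Heuristically guess the ASCII offset (33 or 64) of FASTQ quality strings.

    Characters below ';' (code 59) cannot occur in the offset-64 encodings,
    so seeing one strongly suggests Sanger-style offset 33. Otherwise we
    assume offset 64, which is a guess rather than a certainty.
    """
    lowest = min(ord(ch) for line in quality_lines for ch in line)
    return 33 if lowest < 59 else 64


def decode_quality(qual, offset):
    """Turn a quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def to_sanger(qual, offset=64):
    """Re-encode a quality string from the given offset to offset 33."""
    return "".join(chr(ord(ch) - offset + 33) for ch in qual)


if __name__ == "__main__":
    old_style = "hhhhB"   # hypothetical offset-64 quality string
    offset = guess_phred_offset([old_style])
    print(offset, decode_quality(old_style, offset), to_sanger(old_style, offset))
```

The same character therefore means different things depending on when the file was produced: `h` is a score of 40 under offset 64 but 71 under offset 33, which is why tools need to be told (or must guess) which encoding applies.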

11 April 2023 CC-BY-SA atreloar 35

Page 36: The life-sciences as a pathfinder in data-intensive research practice

Everett Rogers, Diffusion of Innovations (1962)

11 April 2023 CC-BY-SA atreloar 36

Page 37: The life-sciences as a pathfinder in data-intensive research practice

Conclusions

  • Need to move to a smaller number of standard file formats
  • Need to move to a more sustainable model of software development and maintenance
  • Need to encourage platform manufacturers to innovate around the hardware, not the software
  • NOTE: other disciplines are looking to the life-sciences to work out how to solve some of these problems

11 April 2023 CC-BY-SA atreloar 37

Page 38: The life-sciences as a pathfinder in data-intensive research practice

“On best practices in the development of bioinformatics software”, Front. Genet., 02 Jul 2014

  • Source code available to reviewers
  • Software indexed, citable, available
  • Source code documented
  • Source code managed
  • Test libraries, sample data and dataset repositories available

11 April 2023 CC-BY-SA atreloar 38

Page 39: The life-sciences as a pathfinder in data-intensive research practice

Questions?

andrew.treloar@ands.org.au

@atreloar

https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice

11 April 2023 CC-BY-SA atreloar 39
