

Improving Research Software Citation, Attribution and Impact Metrics with the SBGrid Consortium

SBGrid Consortium
Comprehensive solution for structural biology software and advanced computing resources.

SBGrid Software Collection
• Actively curated collection of over 270 scientific software titles commonly used in structural biology research.
• More than 240 member laboratories at 75 research institutions in 17 countries worldwide.
• Community-led discovery and identification of new and useful software and applications.
• Automatic compiling, configuration, deployment and maintenance of the entire collection on member computers.
• Support for a variety of popular hardware and operating systems.
• All programs operate from a standard user interface.
• Continual updates, upgrades and bug fixes for all programs in the collection.
• Entire process is completely automated and transparent to end users.
• Consolidated licensing process: one EULA to sign for the entire software collection.

SBGrid Science Portal
• Web-based portal provides easy access to grid and large-scale computing resources of the national cyber-infrastructure.
• Collaboration with the Open Science Grid (OSG).
• Easy-to-use, intuitive web interface allows seamless job submission to supercomputing centers nationwide.
• Portal takes care of packaging, distributing and launching the computations and returning results.
• The SBGrid portal offers Wide Search Molecular Replacement (WSMR) and Deformable Elastic Network (DEN) refinement.
• SBGrid also operates its own computing clusters for specialized X-ray crystallography applications.

http://sbgrid.org

Developers Network
• Extensive network of software development tools to help in writing, compiling, tracking and testing code.
• Free use of many high-performance compilers, version control systems and shell text editors.
• Array of physical and virtualized system/hardware architectures for compiling and testing code.
• Software beta testing by active structural biologists.

Outreach and Education
• Frequent webinars and seminars by contributing SBGrid software developers.
• Scientific talks by prominent structural biologists.
• Regular computing schools and workshops led by the program creators.
• Facilitating two-way scientific communication between the creators and users of computational tools and techniques.

Policy and Advocacy
• Engages in policy and advocacy activities on behalf of its members and the broader research computing community.
• Policy research, standard setting and community organizing.
• Publishes in leading journals on major issues of research computing policy.
• Represents structural biology research computing interests in standard-setting bodies.
• Organizes and participates in research computing conferences.

[Figure labels: Planning, Deployment, Support, Maintenance, Transitioning]

The SBGrid Consortium is a computing collaboration of over 285 structural biology labs at more than 95 academic research institutions and 3 pharmaceutical companies in 20 countries that provides advanced support for structural biology research computing. Consortium members benefit from automatic dissemination of a comprehensive collection of over 295 scientific software applications used in structural biology research; web-based computing portal access to the national cyber-infrastructure for grid and large-scale computations; tools and resources for scientific software development; outreach and education; and policy and advocacy efforts on behalf of the research computing community. By sharing the costs of research computing support among its members, the SBGrid Consortium expands access, facilitates communication, reduces costs and lowers barriers to biomedical computational research. As facilitator and middleman between scientific software developers and end users, SBGrid offers an unparalleled opportunity for developing and prototyping methods for improving measures of scientist-created software citation, attribution and impact. SBGrid operates as a member-supported, NIH-compliant non-profit based at Harvard Medical School.

Select SBGrid Publications
• Andrew Morin, Ben Eisenbraun, Jason Key, Paul C Sanschagrin, Michael A Timony, Michelle Ottaviano and Piotr Sliz. Collaboration gets the most out of software. eLife 2: e01456 (2013).
• Andrew Morin and Piotr Sliz. Optimizing Peer Review of Software Code. Science 341: 236-237 (2013).
• Andrew Morin and Piotr Sliz. Structural Biology Computing: Lessons for the Biomedical Research Sciences. Biopolymers: In press (2013).
• Amber L McConahy, Ben Eisenbraun, James Howison, James Herbsleb and Piotr Sliz. Techniques for Monitoring Runtime Architectures of Socio-technical Ecosystems. ACM Conference on Data-Intensive Collaboration in Science and Engineering (2012).
• Daniel J O'Donovan, Ian Stokes-Rees, Yunsun Nam, Stephen Blacklow, Gunnar F Schroder, Axel T Brunger and Piotr Sliz. A grid-enabled web service for low-resolution crystal structure refinement. Acta Crystallographica D68: 261-267 (2012).
• Ian Stokes-Rees, Ian Levesque, Frank V. Murphy IV, Wei Yang, Ashley Deacon and Piotr Sliz. Adapting federated cyberinfrastructure for shared data collection facilities in structural biology. Journal of Synchrotron Radiation 19: TBD (2012).
• Andrew Morin, Jennifer Urban and Piotr Sliz. A quick guide to software licensing for the scientist-programmer. PLoS Computational Biology 8 (7): e1002598 (2012).
• Andrew Morin, Jennifer Urban, Paul D Adams, Ian Foster, Andrej Sali, David Baker and Piotr Sliz. Shining Light into Black Boxes. Science 336 (6078): 159-160 (2012).
• Ian Stokes-Rees and Piotr Sliz. Protein Structure Determination by exhaustive search of Protein Data Bank derived databases. PNAS (2010).

Andrew Morin, Stephanie Socias, Carol Herre, Jason Key, Mick Timony, Piotr Sliz
SBGrid.org & Dept. of Biological Chemistry & Molecular Pharmacology, Harvard Medical School

Software Citation, Attribution and Impact
Assessing established and alternative metrics for scientist-created research software.

User Self-Reporting with AppCiter
• Create a user self-reported software usage webapp, “AppCiter”, and an associated reference database.
• Use the SBGrid database of up-to-date citation and reference information for each of the 400+ supported structural biology applications.
• Encourage accurate and complete citation by program users.
• Present a categorically and hierarchically organized list of all hosted programs.
• Users choose among multiple export formats, including plaintext, XML and BibTeX, for importing into bibliographic management programs.
• Provide user self-reported data regarding program usage frequency.
• Supply useful feedback to program developers and a cross-reference against mined citation data.
• AppCiter also supports multiple version histories of each program to accommodate author attributions that change over time and version.
• Program developers will be able to log in and update their bibliographic information at will to help ensure accurate, complete and up-to-date citation information.
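
The export step described for AppCiter can be pictured with a minimal sketch. The poster does not describe AppCiter's internals, so everything here is an assumption made for illustration: the `Citation` record and its fields, the function names, and the placeholder entry, which is not a real reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One citable reference for one program version (hypothetical schema)."""
    key: str
    authors: tuple[str, ...]
    title: str
    journal: str
    year: int
    program: str
    version: str


def to_bibtex(c: Citation) -> str:
    """Render a citation as a BibTeX @article entry."""
    fields = {
        "author": " and ".join(c.authors),
        "title": c.title,
        "journal": c.journal,
        "year": str(c.year),
        "note": f"Software: {c.program}, version {c.version}",
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{c.key},\n{body}\n}}"


def to_plaintext(c: Citation) -> str:
    """Render the same citation as a single plain-text line."""
    authors = ", ".join(c.authors)
    return f"{authors}. {c.title}. {c.journal} ({c.year}). [{c.program} {c.version}]"


def citations_for(db: list[Citation], program: str, version: str) -> list[Citation]:
    """Select the references that apply to one version of one program."""
    return [c for c in db if c.program == program and c.version == version]


# Placeholder record for illustration only; not a real publication.
EXAMPLE = Citation(
    key="exampleprog",
    authors=("A. Author", "B. Author"),
    title="An example program",
    journal="Journal of Examples",
    year=2000,
    program="ExampleProg",
    version="1.0",
)
```

Storing one record per program version is one way to handle author attributions that change between releases: a lookup by program and version returns only the references that apply to the release the user actually ran.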

Collection of Direct Software Usage Statistics
• Analyze software usage statistics recorded by tracking wrapper scripts for each of the 400+ programs in the SBGrid software library.
• Wrapper scripts capture and record information regarding program launch, execution and environmental runtime parameters.
• Data collection will be opt-in, encrypted and anonymized, and will transmit no scientific or personally identifying data to SBGrid servers.
• The direct statistics collection system will allow measurement of program usage in fine detail.
• Collected data will be useful to SBGrid for administration and research, and to scientist software developers for funding, planning and development and for accurately assessing the impact of their work.
• Data will allow independent measures of program use (hours per year, per user, per file, etc.) and estimations of utility, and will provide direct technical feedback to program developers that is unavailable by other means.
• Analyze and disseminate direct tracking and self-reported research software usage data.
• Design and deploy an automated reporting tool for directly collected and self-reported usage data, serving program developers and users.
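
A tracking wrapper of the kind described above might look roughly like the following sketch. It is not SBGrid's implementation: the record fields, the salted-hash anonymization, the function names and the `site-local-salt` default are all assumptions, and encryption and transmission are left out entirely.

```python
import hashlib
import os
import platform
import subprocess
import time


def anonymize(user: str, salt: str) -> str:
    """One-way salted hash, so the record carries no readable username."""
    return hashlib.sha256((salt + user).encode("utf-8")).hexdigest()[:16]


def run_tracked(program: str, argv: list[str], opted_in: bool,
                salt: str = "site-local-salt"):
    """Run argv; return (exit_code, record). record is None without opt-in."""
    start = time.monotonic()
    exit_code = subprocess.call(argv)
    elapsed = time.monotonic() - start
    if not opted_in:
        return exit_code, None
    record = {
        "program": program,
        "user": anonymize(os.environ.get("USER", "unknown"), salt),
        "os": platform.system(),
        "arch": platform.machine(),
        "wall_seconds": round(elapsed, 3),
        "exit_code": exit_code,
        # Deliberately omitted: arguments, file names, working directory.
    }
    return exit_code, record


def hours_per_user(records: list[dict]) -> dict[str, float]:
    """Aggregate wall-clock time into hours of use per anonymized user."""
    totals: dict[str, float] = {}
    for r in records:
        totals[r["user"]] = totals.get(r["user"], 0.0) + r["wall_seconds"] / 3600.0
    return totals
```

Because the record is built only after the opt-in check and never includes command-line arguments or paths, the sketch keeps scientific data out of the collected statistics by construction rather than by later filtering.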

Supported by a grant from the National Science Foundation, Award # 1448069

Data Mining Historical Resources
• Quantitative analysis of historical use vs. citation data.
• Mine the PMC-OA and PDB databases for entries pertaining to 10 commonly used structural biology programs.
• Extract the XML and plaintext of all structure files in the PDB, and the text and supplemental materials of all PMC-OA articles, using 77 unique data descriptors.
• Algorithms discriminate PMC-OA extracts as originating from the main body, references or supplemental materials of each article.
• Quantify program use vs. citation and the fidelity of derived citation metrics.
• Occurrences of each program in the PDB will be cross-referenced with text from the PMC-OA database.
• Cross-referencing permits tracking of individual program use vs. the rate, fidelity and placement (body, references, supplement) of subsequent citations.
• Correlation of use and citation will be further compared to data from leading citation indexers (e.g. Web-of-Knowledge, Google Scholar, Scopus) to assess the frequency of supplemental citations missing from citation indexer databases, and thus from derivative metrics (h-index, g-index, etc.).
• Calculate an average “correction factor” for missing, incorrect and non-indexed citations.
• Evaluation of alternative metrics for impact of scientist-created research software.
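
The placement analysis and the “correction factor” can be illustrated with a small sketch. The poster does not define the correction factor or the matching algorithm, so the ratio used here (observed uses divided by indexed citations), the whole-word matching, the section names and the program name `ExampleProg` are all assumptions made for illustration.

```python
import re
from collections import Counter

SECTIONS = ("body", "references", "supplement")


def count_mentions(article: dict[str, str], program: str) -> Counter:
    """Count whole-word, case-insensitive mentions of a program per section."""
    pattern = re.compile(rf"\b{re.escape(program)}\b", re.IGNORECASE)
    return Counter({s: len(pattern.findall(article.get(s, ""))) for s in SECTIONS})


def cited_only_in_supplement(counts: Counter) -> bool:
    """True when the program appears in the supplement but not the reference list."""
    return counts["references"] == 0 and counts["supplement"] > 0


def correction_factor(observed_uses: int, indexed_citations: int) -> float:
    """Observed uses per indexed citation (assumed definition, not the poster's)."""
    if indexed_citations <= 0:
        raise ValueError("indexed_citations must be positive")
    return observed_uses / indexed_citations


# Made-up article text for illustration only.
ARTICLE = {
    "body": "Phases were obtained with EXAMPLEPROG. Refinement used ExampleProg again.",
    "references": "",
    "supplement": "exampleprog v1.2 settings",
}
```

Under this assumed definition, a factor above 1 would indicate that a program is used more often than citation indexers record, for example because it is named only in supplemental materials.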