Can Traditional R&D Evaluation Methods Be Used for Evaluating High-Risk, High-Reward Research Programs?

Mary Beth Hughes, Ph.D.
IDA Science and Technology Policy Institute

American Evaluation Association Annual Conference
Anaheim, CA
November 4, 2011
Overview
• Introduction to the Problem
• A Typology of HRHR Research Programs
• Applicability of Traditional R&D Evaluation Methods to HRHR Research Programs
  – Example: NIH Director’s Pioneer Award
• Some Concluding Thoughts
A Call for High-Risk, High-Reward Research Programs…
…And a Desire to Understand Program Effectiveness
What are High-Risk, High-Reward Research Programs?
• HRHR research programs are not well defined
• Generally, HRHR research programs aim to support unconventional, innovative, creative, transformative, “outside the box” research that will have a larger impact than status-quo research
• HRHR research programs may differ from traditional R&D programs in terms of: funding amount and duration, mechanism of funding, target recipients, target research fields, selection criteria, and selection/review processes
The Problem
• Given the differences between HRHR programs and traditional R&D programs (and the hypothesized difference between “normal” research and “HRHR” research), it is not clear that applying traditional R&D evaluation methods to HRHR programs is valid.

What do we know about when and how traditional methods can apply to evaluations of HRHR research programs?
Very Few Evaluations of, and Little Research on, U.S. HRHR Programs*
• NIH’s Director’s Pioneer Award – IDA Science & Technology Policy Institute
• NSF’s SGER – SRI
• HHMI vs. R01 – Pierre Azoulay, MIT
• NIH’s Director’s New Innovator Award – IDA Science & Technology Policy Institute
• NSF’s Emerging Frontiers in Research and Innovation – IDA Science & Technology Policy Institute

*Many more studies of ‘creative’ research at the individual-scientist or organizational level – but no commonly agreed-upon indicators of creative research
[Figure: the studies above, grouped into summative and formative studies]
A Typology of HRHR Programs (Heinze, 2008; Hughes and Lal, 2009)
• People Programs
  – Aimed at funding an individual scientist to undertake (almost) any research project; longer-duration funding and higher funding amounts
• Synergy Programs
  – Aimed at moving a field forward through projects based on teams or an inter- or multi-disciplinary approach
• Challenge Programs
  – Aimed at funding projects based on a technological challenge or critical national need; funding may be milestone-based
• Seed Programs
  – Aimed at jump-starting a project (often unconventional); shorter-duration funding and lower funding amounts
“People” Programs Present Greatest Challenge to Translation of Methods
• Synergy Programs
  – Emerging bibliometrics to measure interdisciplinarity
  – Social network analysis to measure collaborations (see the sketch after this list)
• Challenge Programs
  – Specific challenges and milestones identified a priori serve as measures of desired outcomes – did the research meet its milestones, or is it contributing to the outcome?
• Seed Programs
  – Intended to incorporate new projects into traditional funding streams – did projects go on to receive funding?
• People Programs
  – Research projects not well defined; funding used for multiple ideas; many different types of risks possible
  – Longer-term funding means a need for early indicators
  – Concept of failure less clear
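The social network analysis mentioned under Synergy Programs can be made concrete. Below is a minimal sketch, not drawn from any of the evaluations discussed here: it builds a co-authorship network from hypothetical publication records using networkx and computes simple collaboration indicators. All names and records are invented.

```python
# Sketch: co-authorship network analysis for a "Synergy" program.
# Hypothetical data; illustrates the kind of collaboration indicators
# (density, centrality) that social network analysis can provide.
from itertools import combinations
import networkx as nx

publications = [
    {"title": "Paper A", "authors": ["Lee", "Kim", "Patel"]},
    {"title": "Paper B", "authors": ["Lee", "Garcia"]},
    {"title": "Paper C", "authors": ["Kim", "Patel", "Garcia"]},
]

G = nx.Graph()
for pub in publications:
    # Every pair of co-authors gets an edge; weight counts joint papers
    for a, b in combinations(pub["authors"], 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

print("Density:", nx.density(G))  # how interconnected the group is
print("Degree centrality:", nx.degree_centrality(G))
print("Betweenness:", nx.betweenness_centrality(G))
```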
One Example: The NIH Director’s Pioneer Award
• Key Features
  – Managed out of NIH’s Office of the Director
  – 5-page application reviewed externally
  – Interview by external panel
  – Three high-level criteria (Years 2+):
    • Scientific problem to be addressed
    • Investigator
    • Suitability for NDPA mechanism
  – 5 years of funding, $2.5M
  – 51% effort commitment
  – Flexibility in how funds are used
• Evaluation Request
  – NDPA represented several “firsts” for NIH
  – Viewed as an “experiment” in how to fund biomedical and behavioral research
  – IDA/STPI was asked to evaluate short-term outcomes of the first 22 awardees
NDPA Evaluation Used Common R&D Evaluation Methods
1. Bibliometrics
2. Case Studies – descriptive
3. Expert Review
These methods are all ex post, but our understanding of ex ante evaluation of HRHR research (e.g., of proposals) is also weak.

How well does traditional peer review apply to HRHR research?
How successful are various alternatives (shortened applications, interviews, the “sandpit process”, etc.) at identifying HRHR research?

Many opportunities for further study…
1. Bibliometrics for Traditional R&D Evaluations
• Description of Method
  – Some examples of bibliometrics:
    • Traditional: Counts (and variations thereof), Citations (and variants thereof), Content Analysis
    • Emerging: Interdisciplinarity, Burstness, Centrality
• Uses in traditional R&D evaluations
  – Counts used as measures of productivity
  – Citations used as measures of utility and dissemination (a minimal sketch of these traditional indicators follows below)
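For concreteness, here is a minimal sketch of the traditional count- and citation-based indicators listed above, computed over an invented publication list for one awardee. The "highly cited" threshold of 100 is arbitrary; real studies use field-normalized cutoffs.

```python
# Sketch: traditional bibliometric indicators over invented data.
from statistics import mean

pubs = [
    {"year": 2005, "citations": 12},
    {"year": 2006, "citations": 140},
    {"year": 2007, "citations": 3},
]

pub_count = len(pubs)                            # productivity: counts
total_cites = sum(p["citations"] for p in pubs)  # utility/dissemination
mean_cites = mean(p["citations"] for p in pubs)
# A common variant: share of publications above a "highly cited" cutoff
highly_cited_share = sum(p["citations"] >= 100 for p in pubs) / pub_count

print(f"{pub_count} papers, {total_cites} citations "
      f"(mean {mean_cites:.1f}), {highly_cited_share:.0%} highly cited")
```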
Bibliometrics for HRHR Research Evaluation
• Prior Work
  – Many studies proposing relationships between bibliometric-based indicators and creative research outcomes: Productivity (Simonton, 2004); Interdisciplinarity (Heinze, 2007); Brokerage (Burt, 1992); Burstness and Centrality (Chen, 2009)
  – Azoulay (2010) found HHMI researchers had a higher level of productivity both post-award and compared to a control group (and higher levels of highly cited publications)
• NDPA Findings
  – No apparent correlation between measures in the literature and expert review of the research
  – Interpretation unclear – the short-term nature of the evaluation? Small-number statistics?
  – Counts cannot distinguish between a researcher trying something new and failing and an unproductive scientist
• Conclusion for Use of Method for Evaluation of “People” Programs
  – Currently insufficient as a sole method; further work needed to understand HRHR-specific bibliometric indicators (e.g., transformative outcomes may not be readily accepted by scientific tradition (Polanyi, 1966)); one candidate indicator is sketched after this list
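As one example of an HRHR-specific direction, interdisciplinarity indicators of the kind cited above (Heinze, 2007) are often operationalized in the bibliometrics literature as Rao-Stirling diversity over the fields a paper's references fall into. The sketch below uses invented field proportions and distances; it is not the measure used in the NDPA study.

```python
# Sketch: Rao-Stirling diversity as an interdisciplinarity indicator.
# D = sum over field pairs (i, j) of p_i * p_j * d_ij, where p_i is the
# share of a paper's references in field i and d_ij is a field distance.
import numpy as np

# p[i]: share of references in field i (hypothetical)
p = np.array([0.5, 0.3, 0.2])
# d[i][j]: dissimilarity between fields (e.g., 1 - cosine similarity of
# their citation profiles); symmetric with a zero diagonal (hypothetical)
d = np.array([
    [0.0, 0.8, 0.6],
    [0.8, 0.0, 0.4],
    [0.6, 0.4, 0.0],
])

diversity = float(p @ d @ p)  # off-diagonal terms only, since diag(d) = 0
print("Rao-Stirling diversity:", diversity)
```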
Bibliometrics: Initial Comparisons Across HRHR Evaluations
[Charts: NDPA data (Source: STPI) alongside HHMI data (Source: Azoulay, 2010)]

Bibliometrics show potential for cross-evaluation comparisons and synthesis of HRHR evaluations (one such comparison is sketched below)
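One way such a comparison could work, in the spirit of Azoulay (2010), is to contrast pre- and post-award publication rates for awardees against a control group. The numbers below are invented, and the simple difference-in-differences style contrast is illustrative only.

```python
# Sketch: pre/post-award comparison of awardees vs. controls.
from statistics import mean

# Annual publication counts before and after the award (invented)
awardees = [{"pre": 4.0, "post": 6.5}, {"pre": 3.0, "post": 5.0}]
controls = [{"pre": 4.2, "post": 4.8}, {"pre": 3.1, "post": 3.5}]

def mean_change(group):
    """Average within-researcher change in annual output."""
    return mean(r["post"] - r["pre"] for r in group)

# Awardee change minus control change: a difference-in-differences
# style estimate of the award's association with output
effect = mean_change(awardees) - mean_change(controls)
print("Estimated award-associated change in annual output:", effect)
```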
2. Case Studies for Traditional R&D Evaluations
• Description of Method
  – “In-depth investigations into a program, project, facility, or phenomenon, usually to examine what happened, to describe the context in which it happened, to explore how and why, and to consider what would have happened otherwise.” (Yin, 1994)
• Uses in traditional R&D evaluations
  – Typically used in exploratory phases of a program (Ruegg and Feller, 2003)
  – Helpful for understanding key relationships and variables in a complex phenomenon (Shadish, Cook, and Leviton, 1991)
Case Studies for HRHR Research Evaluations
• Prior Work
  – No HRHR research program evaluations relying on case studies; the literature on creative research includes case studies of individual scientists OR the overall work environment, but there is less understanding at the level of groups or projects
• NDPA Findings
  – Case studies allowed for tracking of the research trajectory and gave awardees the opportunity to state what made their research pioneering
  – Great diversity across awardees in terms of use of funding, other funding, research approach, research trajectory, group size, and group composition → no consistent markers of pioneering research
• Conclusion for Use of Method for Evaluation of “People” Programs
  – Most suitable method for understanding HRHR research trajectory, especially in the near term and for small samples
  – But a common framework is needed for what information to collect, to enable better understanding of HRHR research (sociology of science?); one possible structure is sketched after this list
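To make the "common framework" idea concrete, a case-study protocol could be encoded as a structured record so that every case captures the same dimensions noted above. The field names below are illustrative and are not taken from the STPI instrument.

```python
# Sketch: a structured record for case-study data collection.
# Hypothetical schema; ensures consistent dimensions across cases.
from dataclasses import dataclass, field

@dataclass
class CaseStudyRecord:
    awardee_id: str
    funding_uses: list = field(default_factory=list)       # how award was spent
    other_funding: list = field(default_factory=list)      # concurrent grants
    research_approach: str = ""                            # methods / strategy
    trajectory_notes: str = ""                             # shifts in direction
    group_size: int = 0
    group_composition: list = field(default_factory=list)  # roles / disciplines
    self_reported_pioneering: str = ""  # awardee's own account of novelty

record = CaseStudyRecord(
    awardee_id="NDPA-EX-01",
    funding_uses=["new postdoc line", "exploratory pilot experiments"],
    group_size=6,
)
print(record)
```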
3. Expert Review for Traditional R&D Evaluations
• Description of Method
  – Using informed judgments to make assessments
  – Many variants on implementation of review; effects of review-process variants on review outcomes are not well understood
• Uses in traditional R&D evaluations
  – Most widely used method for research evaluations
  – Recognized challenges, but often viewed as the most robust method (Garfield, 2006; NAS, 1999; Nature, 2009)
Expert Review for HRHR Research Evaluations
• Prior Work
  – Experts widely used for evaluating “high rewards” (e.g., Nobel Prize)
  – Little research done on how experts identify HRHR research, although Amabile (1982) suggests a technique for determining creative outcomes
• NDPA Findings
  – Experts in the field of each awardee independently evaluated 3 publications and the case study of the awardee and were asked to determine the level of “pioneeringness”
  – When asked how they made their determinations, most experts included some variant of “you know it when you see it”…but there was still disagreement between experts (consistent with non-HRHR research evaluation findings; one way to quantify such disagreement is sketched after this list)
• Conclusion for Use of Method for Evaluation of “People” Programs
  – Currently necessary for understanding scientific contributions
  – Need better understanding of what experts are looking for in their assessments
  – Need robust data collection to enable cross-evaluation comparisons
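Disagreement between expert reviewers can also be quantified. As a sketch, the code below computes Cohen's kappa over two hypothetical reviewers' "pioneeringness" ratings; the ratings and labels are invented, and this is not an analysis reported in the NDPA evaluation.

```python
# Sketch: Cohen's kappa for inter-rater agreement between two experts.
from collections import Counter

reviewer_a = ["high", "high", "low", "medium", "high", "low"]
reviewer_b = ["high", "medium", "low", "low", "high", "low"]

n = len(reviewer_a)
observed = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / n

# Chance agreement from each reviewer's marginal rating frequencies
counts_a, counts_b = Counter(reviewer_a), Counter(reviewer_b)
labels = set(reviewer_a) | set(reviewer_b)
expected = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)

kappa = (observed - expected) / (1 - expected)
print(f"Observed agreement {observed:.2f}, Cohen's kappa {kappa:.2f}")
```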
Some Concluding Thoughts
• To return to the original question…can these methods be applied to HRHR research programs?
  – Sort of, BUT we need a clearer understanding of the program theory of HRHR programs, need to tailor methods appropriately, and need to use multiple methods
• Need more evaluations of HRHR research programs to enable synthesis of evaluation results
  – Studies are in progress (EURECIA, CREA, STPI, …)
• As evaluators, we need to balance our roles as auditors and researchers and caveat findings appropriately
Thank You!
• Acknowledgments
  – Bhavya Lal
  – Stephanie Shipp
  – Elizabeth Lee
  – Amy Marshall
  – Brian Zuckerman
NDPA FY 2004–2005 Outcome Evaluation report available at: https://commonfund.nih.gov/pdf/Pioneer_Award_Outcome%20Evaluation_FY2004-2005.pdf