Information Processing and Management 43 (2007) 808–820

www.elsevier.com/locate/infoproman

Weak information work in scientific discovery

Carole L. Palmer *, Melissa H. Cragin 1, Timothy P. Hogan 2

Graduate School of Library and Information Science, University of Illinois at Urbana–Champaign,

501 E. Daniel Street, Champaign, IL 61820-6212, United States

Received 27 October 2005; received in revised form 11 June 2006; accepted 16 June 2006
Available online 8 September 2006

Abstract

Scientists continually work with information to move their research projects forward, but the activities involved in finding and using information and their impact on discovery are poorly understood. In the Information and Discovery in Neuroscience (IDN) project we investigated the information work involved as researchers make progress and confront problems in the practice of brain research. Through case studies of recent neuroscience projects, we found that the most difficult and time-consuming information activities had parallels with Simon’s explication of weak methods in scientific problem solving. But, while Simon’s weak/strong distinction is an effective device for interpreting information work, his general conception of how discovery takes place is artificially constrained. We present cross-case and case-based results from the IDN project to illustrate how the conditions of problem solving Simon associated with weak methods relate to information work and to identify additional weak aspects of the research process not considered by Simon. Our analysis both extends Simon’s framework of what constitutes the discovery process and further elaborates how weak approaches influence the conduct of research.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Scientific discovery; Information practices; Information seeking; Neuroscience; Research processes

0306-4573/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.ipm.2006.06.003

* Corresponding author. Tel.: +1 217 244 0653; fax: +1 217 244 3302.
1 Tel.: +1 217 244 8729; fax: +1 217 244 3302.
2 Tel.: +1 217 333 3280; fax: +1 217 244 3302.

1. Introduction

Information is an essential resource in the process of scientific discovery, and scientists are continually working to gather information from the literature, databases, web resources, and colleagues. In turn, they evaluate, collect, manage, consult, integrate, and apply that information to move research forward. This “information work” has never been assessed on a large scale in terms of time spent or impact on the advancement of science. But its importance is evident in the number of scientific researchers and information scientists striving to find better ways to mobilize and work with the ever-growing body of information resources. In the Information and Discovery in Neuroscience (IDN) project we investigated the information work involved in the practice of brain research (Palmer, Cragin, & Hogan, 2004). We found that the most difficult and time-consuming activities paralleled Simon’s explication of weak methods in scientific problem solving.

As discussed by Simon, Langley, and Bradshaw (1981), weak problem solving is associated with specific research conditions, including an ill-structured problem space, unclear or unsystematic steps, and a lack of prior domain knowledge. Conversely, strong problem solving is applied when a research problem is well defined, and it tends to proceed through systematic, routine activities and with a high level of domain knowledge. Strong, expert methods are practiced in what Kuhn (1962) referred to as normal science. Revolutionary, paradigm-altering science, on the other hand, advances with methods that have not been refined for the given application. While “crude and cumbersome,” these weak approaches are not a second-best choice for solving research problems but rather “may be the only ones at hand on the frontiers of knowledge, where few relevant special techniques are yet available” (Langley, Simon, Bradshaw, & Zytkow, 1987).

In our analysis of information work in the practice of brain research, Simon’s distinction between weak and strong approaches proved to be an effective device for interpreting the activities involved in finding and using information. However, we found Simon’s more general ideas about the process of discovery to be artificially constrained. Thus, our results extend Simon’s conception of what constitutes the discovery process while further elaborating how weak approaches influence the conduct of research. In this paper, we begin by introducing the literature on scientific discovery and problem solving that informed our analysis and by describing our case study methods. Based on our cross-case analysis, we discuss our conception of weak information work (WIW), the prominence of WIW in certain stages of research, and its role in specific modes of information seeking. Two short case studies are presented to provide a more detailed illustration of how the conditions Simon associated with weak methods extend to information work. We conclude by arguing that understanding the dynamics of weak and strong information work is important for determining how information systems and services can make the greatest contribution to the discovery process.

2. Background

2.1. Conceptions of discovery

The mechanisms of scientific discovery have been characterized from different scholarly perspectives. Practicing scientists have written about the process of discovery to raise the awareness of others involved in the scientific enterprise and the interested public (e.g., Root-Bernstein, 1989). Historical accounts are most numerous, with many authors concentrating on the complex and esoteric nature of science or particular high-profile events (e.g., Bernal, 1953; Harwit, 1981; Holton, 1973). Kuhn’s (1962) influential book distinguished revolutionary science from “normal” science, providing a more socially based interpretation than many earlier works. He situated significant research advances in the context of the larger landscape of activities that build and sustain scientific paradigms and disciplines.

Few information scientists have conducted empirical investigations of the discovery process for application to information service and system development, although Bawden’s (1986) discussion of the connection between creativity and information strategies is worthy of note. He recommended certain information technology features for enhancing creative problem solving and discovery, such as access to peripheral material and explicit representation of analogies, patterns, and exceptions. Similarly, Martyn (1974) argued that the information needed to formulate and solve problems often lies outside of the core material that supports professional competencies and may not appear immediately relevant.

Cognitive science has contributed much to our understanding of scientific thinking. Dunbar’s work (e.g., Dunbar, 1993) in particular has interesting implications for information systems. For example, his finding that the setting of goals impacts the discovery of new concepts suggests that particular kinds of information could assist scientists in reworking goals as new evidence or inconsistent findings emerge. Other scholars have recognized the importance of information in the discovery process. Newell (1969) identified information acquisition as an important but strictly cognitive component of discovery. Fujimura (1987) accounted for certain types of information gathering and exchange in articulation work: the collecting, coordinating, and integrating tasks that make research projects “doable.”

Within the body of research on computational scientific discovery (e.g., Darden, 1997; Valdes-Perez, 1999), the work of Simon et al. introduced in Section 1 has important implications for how information activities fuel the research process. Their conceptualization of strong and weak scientific approaches suggests that certain kinds of information activities can make high-impact contributions to discovery. Our study confirms this idea, showing not only that WIW is influential but also that it is difficult to carry out and not well supported by current information systems and services. At the same time, Simon’s framework for understanding discovery does not account for important segments of the research process.

2.2. Information and the discovery process

All of the scholars discussed in Section 2.1 have associated information with discovery, but none has offered a good representation of the variety and necessity of the work of finding and using information. de Jong and Rip’s (1997) discussion of the future of computer-supported discovery environments (CSDEs) in the practice of science is a notable exception. Using a fictional scenario, they illustrate how the process of discovery unfolds as scientists work through inter-related questions, each of which requires a number of information-based activities before the project can move forward. The central role of information in the discovery process is evident as they show, for example, how “in many situations the problem space is not readily available, but has to be constructed from the heterogeneous set of (electronic) resources which scientists have at their disposal” (p. 237).

While de Jong and Rip’s ideal scenario succeeds at showing the socio-technical dimensions of the design and use of CSDEs, it offers a truncated picture of the research process. For instance, in the real projects documented in our study, research almost always took a much more circuitous path and, of course, took longer than the three days depicted in their sequence of events. Their representation skips over the preliminary yet critical activities of defining an emerging research problem, becoming familiar with associated intellectual domains, and identifying and evaluating possible paths of investigation, as well as the ongoing work of interacting with collaborators, colleagues, and competitors.

Simon and his colleagues, like de Jong and Rip, also gave little attention to processes or activities outside the data collection and analysis stages of inquiry. They briefly note that conducting science is different from other kinds of problem solving because research is a social process that often involves many scientists and proceeds over an extended period of time. Nonetheless, they assert that in spite of such differences “the component processes, which when assembled make the mosaic of scientific discovery” have “no special properties” that distinguish them from other problem solving situations (Simon et al., 1981, p. 2). Instead, these tasks, many of which involve finding, managing, and using information, are covered by their concept of “meta-activities.” Meta-activities can play a seminal role in discovery, as Simon et al. (1981) illustrate with the example of how “Mendeleev discovered the periodic table while planning the arrangement of topics for an elementary chemistry textbook” (p. 4). However, the only other meta-activities they explicitly identify are writing and disseminating research results. The “mosaic” they invoke does not cover substantial parts of the research process, especially the activities that precede data collection and analysis.

Whether information work is thought of as a meta-activity (Simon et al., 1981) or as a fundamental part of discovery (de Jong & Rip, 1997), without it little progress would be made in scientific research. Our cases show that it is a consequential and distinct part of the mosaic of discovery that surrounds, connects, and fuels the data and analysis activities emphasized by Simon. Moreover, our results indicate that, like the heuristics of problem solving or the techniques applied in revolutionary vs. normal science, information work processes can also be weak or strong.

2.3. Weak and strong conditions

As discussed in Section 1, Simon et al. (Langley et al., 1987; Simon, 1986; Simon et al., 1981) associated weak scientific methods with certain problem solving conditions, such as an ill-structured problem space, unclear or unsystematic steps, and limited prior domain knowledge. They also explain that weak approaches operate with less information and that a particular weak method may be applicable to many different tasks or domains. Weak approaches tend to be used by novices and for solving problems in novel domains. The novice problem solver is driven by data, searching and testing to figure out what to do next. While weak methods are less powerful than strong methods, they are often the only ones available on the fronts of science where specialized techniques have not yet been developed. Cutting-edge, transformative science proceeds through weak methods by necessity.

In contrast, strong approaches tend to be driven by existing theory and “truths.” They proceed with clear and calculated steps and depend on expert domain knowledge. Since they are more specific in their application, they are not easily extended to new tasks or domains. But strong approaches are powerful and often allow solutions to be found with little or no search. Instead of seeking and searching, the expert problem solver tends to recognize and calculate (Langley et al., 1987; Simon, 1986; Simon et al., 1981).

Fig. 1 represents a summary of the conditions Simon and his colleagues associated with weak and strong approaches. Note that these conditions describe aspects of the scientific problem under investigation as well as the knowledge and practices of individual scientists. In our analysis we found that these conditions can be applied to different levels of information work. They influence specific information searching techniques such as browsing, as well as more long-term processes such as exploring how to design a new experimental procedure. Thus we use strong and weak to describe situations that involve isolated and routine activities, techniques, approaches, and processes. We use “information work” as a general term to refer to information practices at any of these levels of granularity.

Fig. 1. Continuum of conditions Simon et al. associated with strong and weak scientific approaches.

Weak elements ↔ Strong elements
Ill-structured problem space ↔ Structured problem space
Low domain knowledge ↔ High domain knowledge
Unsystematic steps ↔ Systematic steps
Wide application ↔ Task specific
Data driven ↔ Theory driven
Seek and search ↔ Recognize and calculate
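To make the idea of a continuum concrete, the sketch below encodes the six paired conditions of Fig. 1 and describes a piece of information work by which pole of each condition it sits at. This is a hypothetical illustration, not the authors’ method: the paper does not score information work numerically, and the `weak_share` summary, the `Situation` type, and the example flag assignments are our own assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# The six paired conditions of Fig. 1, written as (weak pole, strong pole).
CONTINUUM: list[tuple[str, str]] = [
    ("ill-structured problem space", "structured problem space"),
    ("low domain knowledge", "high domain knowledge"),
    ("unsystematic steps", "systematic steps"),
    ("wide application", "task specific"),
    ("data driven", "theory driven"),
    ("seek and search", "recognize and calculate"),
]


@dataclass(frozen=True)
class Situation:
    """One piece of information work, described against Fig. 1 (hypothetical).

    weak_flags[i] is True when the situation sits at the weak pole of
    CONTINUUM[i] and False when it sits at the strong pole. The flags are an
    analyst's judgment; nothing here derives them automatically.
    """
    name: str
    weak_flags: tuple[bool, ...]

    def __post_init__(self) -> None:
        if len(self.weak_flags) != len(CONTINUUM):
            raise ValueError("expected one flag per Fig. 1 condition")


def weak_share(situation: Situation) -> float:
    """Fraction of Fig. 1 conditions at the weak pole, from 0.0 to 1.0."""
    return sum(situation.weak_flags) / len(CONTINUUM)


def poles(situation: Situation) -> list[str]:
    """Name the pole of each Fig. 1 condition that applies to the situation."""
    return [weak if flag else strong
            for (weak, strong), flag in zip(CONTINUUM, situation.weak_flags)]


# Hypothetical flag assignments, loosely following Section 4.1: footnote
# chasing is treated as strong throughout; literature mining is marked weak
# for its ill-defined focus, its data-driven orientation, and its reliance
# on search, and strong (by assumption) on the remaining conditions.
footnote_chasing = Situation("footnote chasing", (False,) * 6)
literature_mining = Situation(
    "literature mining", (True, False, False, False, True, True))

for s in (footnote_chasing, literature_mining):
    print(f"{s.name}: {weak_share(s):.2f} weak; {', '.join(poles(s))}")
```

Reducing the continuum to a single fraction discards which conditions are weak, so `poles` is kept alongside it; the paper’s point that information work is not inherently weak or strong is better served by the per-condition view.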

3. Methods

To investigate the information work involved in brain research, the IDN project team developed case studies of neuroscience projects at four laboratories located at three research universities across the country. The focus of the data collection was initially on specific instances when researchers made progress or confronted problems in the course of research. These incidents served a dual role. They anchored the study to current, significant research activities, and they provided a point of entrance into the larger research projects that defined the parameters of a case. Additional project cases were identified as we worked with participants and became more firmly embedded at the research sites. The results presented here are based on data collected primarily during the first half of the project in 2003–2004. At present, data collection has concluded on all but two cases, and analysis is ongoing.

3.1. Sample and body of data

We enrolled a total of 25 participants in the project, 11 of whom were key informants selected because they were leading ongoing research projects. The remaining 14 participants were other senior and junior biological and computer scientists, postdoctoral researchers, graduate students, and laboratory technicians and managers who played important roles in the case projects. The participants represented four distinctly different laboratories. One laboratory is a small group that does behavioral and neuronal research on learning and memory. The second is a larger operation with a number of research scientists working on brain imaging related to psychiatric disorders such as schizophrenia. The third lab is a large interdisciplinary biology center involved in informatics development. The fourth is run by a single investigator but is involved in several interdisciplinary projects concerning bioinformatics and neurologic diseases.

Some of our participants (32%) were drawn from a group of field testers for the Arrowsmith Project, which is developing a data mining tool that searches MEDLINE for complementary but disconnected literatures (Smalheiser, 2005; Swanson & Smalheiser, 1999). The Arrowsmith team has been an important partner in gaining access to neuroscientists working in a variety of brain research specializations. The field testers’ literature searches were often used as points of entrance into the individual case projects and for learning about related research going on at their laboratories.

3.2. Case development

Thirty-eight projects were tracked, of which eight were developed into full case studies. Case data include face-to-face and telephone interviews, search diary records, observation field notes, and project documents. These largely qualitative data generated rich, local information on how scientific work is carried out within the framework of a specific project (Fujimura, 1987; Vaughan, 1992). Data from the remaining thirty projects provided additional context and details on related research being conducted in the laboratories. We documented the research process at a level that brings important aspects of information work into view, such as the things that make information important, useful, or difficult to find or use, and the social linkages by which information and knowledge move and coalesce.
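The sample figures in Sections 3.1 and 3.2 are stated partly as counts and partly as a percentage; the arithmetic below simply makes the implied head counts explicit (for instance, 32% of 25 participants is 8 field testers). The variable names are ours, introduced only for illustration.

```python
# Counts reported in Sections 3.1 and 3.2 of the paper.
total_participants = 25
key_informants = 11
arrowsmith_share = 0.32      # proportion of participants who were Arrowsmith field testers
projects_tracked = 38
full_case_studies = 8

other_participants = total_participants - key_informants       # the "remaining 14"
field_testers = round(arrowsmith_share * total_participants)   # 32% of 25 participants
context_projects = projects_tracked - full_case_studies        # the "remaining thirty"

print(other_participants, field_testers, context_projects)     # 14 8 30
```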

Interviews: We conducted a total of 71 semi-structured interviews. Key informants were interviewed multiple times, with each session increasing in focus and specificity over time. Typically, we conducted 60–90 min sessions, with the first one concentrating on the scientist’s projects, their specific interests and responsibilities, the larger laboratory context and related projects, and basic information seeking and use practices. We then used this background information to inform the next interview, where we probed for project details and specific information incidents, followed up on search diary records, and identified other project participants to be interviewed for a given case. Subsequent interviews followed the trajectory of the projects and related incidents.

Search diary: The field testers used an electronic log developed by our collaborators on the Arrowsmith Project to document literature searches and other types of information activities. The log contained two forms: an Arrowsmith Diary for searches using the literature mining tool and an Information Activity Diary to record other kinds of information seeking. Data from the diary entries were coded and analyzed to identify patterns in anticipated and emergent categories. The analysis was validated through an intercoder reliability process in which the three project team members developed consensus on each category and its application to the data. For those diary entries that did not fall clearly into a category, we consulted with the field tester to confirm or correct our classification. The importance rankings reported in Section 4.3 were specified by the field tester as part of their diary entry.

The diary also played an important role in identifying critical incidents for interviewing and adding detail to the more descriptive interview and observation data. Analysis of the diary entries required understanding the participants’ research areas and current projects; therefore, we often returned to our background interview transcripts when coding the diary entries and regularly verified coding decisions in later interviews with the researchers.

Observation: A total of approximately twenty hours of observation was conducted at the laboratory sites, primarily with key informants. These data gave us a broader view of the information activities, resources, and personnel associated with the case projects. In addition to recording field notes on day-to-day bench work, we also had the opportunity to photograph legacy computer systems and other experimental apparatus, review organizational charts, and look through microscopes.

Project documents: Materials collected for content analysis included lab notes, experiment documentation, and reports, proposals, and publications used or produced by the scientists in the projects being studied. From these sources we will be extracting information about the people and literature referenced by the scientists as well as additional evidence of research progress and information interactions.

Case files consist of transcribed verbatim and descriptive texts of interviews and observations, coded diary entries, and document data. Open coding of case file data was followed by more refined axial coding, and the overall process of coding proceeded through iterative, comparative analysis (Strauss & Corbin, 1998). Each transcript received two to three rounds of descriptive and thematic coding using NVivo, a software package designed for qualitative analysis and theory building. As is typical of this type of qualitative approach, the goal of analysis was to reveal new insights, not produce comprehensive or widely generalizable results (Becker, 1998; Glaser & Strauss, 1967). Individual cases were analyzed longitudinally to capture progress and changes in research work, and comparative analysis across cases was conducted to identify commonalities and differences in information practices among the different research teams.

4. Analysis

4.1. Weak and strong information work in practice

Our approach in analysis has been to understand information work in the discovery process by focusing on advances that take place in a research project, as well as instances where progress is deterred. As Gerson (2002) notes, to understand discovery we need to analyze the conditions under which effective intersections form and the circumstances that block or retard fruitful intersections. In building these intersections, the researchers we studied explored and gathered diverse information for many purposes: to assess their own work and that of others, to integrate background knowledge, to solve short-term instrumental problems, to consult and talk shop, and to sort out complex intellectual relationships among the vast body of neuroscience findings. The handling and processing of information is part of the task structure of every kind of work. All work “involves some kind of information production/construction/consumption/use” (Gerson as cited in Strauss, Fagerhaugh, Suczek, & Wiener, 1985, p. 253).

In general, the information work we documented was not inherently weak or strong. Instead, it tended to be influenced by the conditions listed in Fig. 1, which are related to the scientific problem under investigation and the knowledge and activities associated with individual scientists. And, as we will see in Section 4.2, weak and strong work were aligned with certain stages of the research process. A few types of activities, however, were found to be generally weak or strong across problems, scientists, and stages of research.

Footnote chasing is one example of a consistently strong practice. Most researchers followed references in the literature for various purposes within the course of research. While footnote chasing was common in the early stages of a project, we also documented numerous instances of the activity later in the research process, for example when a researcher needed to relate new findings to a larger or different body of knowledge or locate specifics on a person, lab, or technique used elsewhere. In terms of Simon’s conditions in Fig. 1, footnote chasing follows a clear, structured path of bibliographic references. The searcher recognizes items of interest and calculates their potential relevance and the next step in the chaining process. At times researchers may pursue leads into literature where they have limited domain knowledge, but this does not necessarily deter them from the task at hand of identifying potentially relevant literature. In contrast, browsing in a large set of texts or a bibliographic database is a weaker literature searching technique, since the path or next steps are not always obvious and a lack of domain knowledge or terminology is more likely to inhibit one’s ability to find relevant materials. Data and literature mining techniques are weaker yet, as they depend on data-driven search and are often conducted with an ill-defined problem focus.
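The structured path that makes footnote chasing strong can be sketched as a breadth-first traversal of reference lists, where each newly encountered item is judged for relevance and only relevant items are chased further. This is a minimal sketch under our own assumptions: the citation graph, the `is_relevant` judgment, and the `max_depth` cut-off are hypothetical stand-ins for what a researcher does by reading, not a description of any participant’s procedure.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


def chase_footnotes(references: dict[str, list[str]], seed: str,
                    is_relevant: Callable[[str], bool],
                    max_depth: int = 2) -> list[str]:
    """Follow reference lists outward from `seed`, breadth first.

    references maps an item to the items it cites. Each newly encountered
    item is judged with is_relevant (the "recognize and calculate" step);
    only relevant items are kept and chased further, for at most max_depth
    hops from the seed. Items are returned in the order they were accepted.
    """
    accepted: list[str] = []
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        item, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not chase references beyond the depth limit
        for cited in references.get(item, []):
            if cited in seen:
                continue  # already judged via another path
            seen.add(cited)
            if is_relevant(cited):
                accepted.append(cited)
                frontier.append((cited, depth + 1))
    return accepted


# Hypothetical citation graph and relevance judgments, for illustration only.
refs = {
    "new-findings": ["review-1", "methods-1", "lab-technique"],
    "review-1": ["cohort-study", "older-atlas"],
    "lab-technique": ["lab-protocol"],
}
relevant = {"review-1", "lab-technique", "cohort-study", "lab-protocol"}

print(chase_footnotes(refs, "new-findings", relevant.__contains__))
# ['review-1', 'lab-technique', 'cohort-study', 'lab-protocol']
```

Every step is determined by the reference lists themselves, which is what places the activity at the “systematic steps” and “recognize and calculate” end of Fig. 1; browsing, by contrast, has no such graph to walk.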

Strong approaches were commonplace in experimental work. Researchers frequently searched for protocols and instrumentation information from standard or locally established sources. The problems encountered tended to be tightly constrained, domain knowledge was usually high, and the steps to be taken were relatively routine. There were also many strong processes at work in what Simon would consider the meta-activity level of discovery. For example, strong information work was used to rebuild expertise when a scientist had not been active in a core research area for a period of time and needed to do remedial reading and “retooling” to catch up. That kind of work is strong as long as the scientist is highly aware of what they need to brush up on and how to go about it. Similarly, strong information work tends to be used for “core maintenance,” a strategy used to sustain a firm position in a disciplinary specialization (Palmer, 1999, 2001). Scientists or research teams will maintain productivity through systematic studies in their established core while selectively targeting new, higher-risk opportunities in new areas. For core maintenance, researchers use strong techniques and the well-developed expertise in their specialization. The new work at the high-risk research fronts involves much weaker activities of building strategic alliances and collaborations, exploring new domains, and testing new ideas and techniques.

4.2. Stages of research and WIW

Four stages of research were evident in the case projects we followed: preparation, data collection, analysis, and dissemination. The heaviest concentration of WIW was in the preparation stage. We also documented a moderate level of WIW in the analysis stage, with considerably less in dissemination, and only rare occurrences in the data collection stage, where most problem solving relates to techniques and tools.

Information work in the preparation stage generally preceded the other stages, laying the foundation for future discovery. However, some preparation activities were performed in tandem with other stages. Current awareness reading is one example. It tends to be an ongoing, although not an altogether routine or strong, activity. Strong information work was prevalent in preparing or setting a base for discovery: in the work of keeping a laboratory up and running, improving its technical and intellectual abilities, and retaining a research team and other collaborators. But in the preparation stage there were many weaker activities that included determining the feasibility and potential impact of a project, assessing initial hypotheses relative to the existing body of research, enrolling new collaborators, and developing speculative yet convincing funding proposals.

The WIW in the analysis stage tended to be more intermittent. Weak approaches came into play when data interpretation was not straightforward, there were unexpected findings, or it was determined that results might be applied in new ways or extended to a new domain. For instance, in one case, before an investigator could determine the meaning of the recent findings produced by his lab, he needed to first explore other possible explanations for the results, including that they may be observing an artifact of imprecise measurement. In another instance, an author had to re-assess data in light of more recently published research from another laboratory.

In the data collection and dissemination stages, information work tended to be stronger because the problems at hand were more structured, the steps to be taken were more routine, and domain knowledge was generally high. For instance, the work of building influence through publication and other forms of scholarly communication proceeded fairly systematically. But as researchers moved out of their core knowledge base and familiar intellectual and social structures, seemingly strong approaches became much weaker and additional information needs arose. For example, if the results of an experiment have broader implications than originally thought, the dissemination process becomes less systematic. The literature in outside domains may need to be consulted, which can require deciphering unfamiliar terminology and additional reading for background and context. In such cases, information gathered from far afield must be weighed, evaluated, and confirmed, and experts in other fields may need to be consulted. To make further progress, new partnerships may need to be initiated, assessed, and nurtured.

4.3. Weak searching

The Arrowsmith testing conducted by some of our participants offered a unique opportunity to examine weak information searching activities. A data mining tool by design, Arrowsmith was conceived as a system to support weak, data-driven approaches to literature searching. It allows searchers to find links among disconnected literatures or areas of research (Smalheiser & Swanson, 1994, 1996, 1998). The field testers were encouraged by the Arrowsmith team to perform searches to test hypotheses and new ideas in the literature, and they received training to improve their searching abilities with the system and with MEDLINE more generally. They cooperated with us by regularly recording these kinds of search activities, along with other types of database and Internet searches. Table 1 presents a typology of the reported searches based on the motivating situation, or impetus, for each search. Using the procedures described in Section 3.2, we identified eight primary categories and nine subcategories in a total of 139 diary entries. The typology in Table 1 was developed prior to our WIW analysis and was first presented in Palmer et al. (2004).
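The kind of literature linking Arrowsmith supports can be illustrated with a deliberately simplified sketch in the spirit of Swanson and Smalheiser's ABC approach: given titles from two literatures (A and C) that do not refer to each other, list the title words (candidate B-terms) the two sets share. This is not Arrowsmith's implementation. The stoplist, the single-word tokenization, and the toy titles (invented for illustration, loosely themed on the query terms in Case 1) are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stoplist; a real system would use a much larger curated list.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to",
             "with", "by", "across", "during"}


def title_terms(title):
    """Return the set of lower-cased single-word terms in a title, minus stopwords."""
    words = re.findall(r"[a-z][a-z0-9-]*", title.lower())
    return {w for w in words if w not in STOPWORDS}


def shared_b_terms(a_titles, c_titles):
    """List candidate B-terms that occur in both the A and the C titles.

    Each result is (term, number of A titles containing it, number of C titles
    containing it), ordered by total title count and then alphabetically.
    """
    a_counts = Counter(t for title in a_titles for t in title_terms(title))
    c_counts = Counter(t for title in c_titles for t in title_terms(title))
    shared = a_counts.keys() & c_counts.keys()
    return sorted(
        ((t, a_counts[t], c_counts[t]) for t in shared),
        key=lambda item: (-(item[1] + item[2]), item[0]),
    )


# Toy titles invented for illustration; they are not real MEDLINE records.
A_TITLES = [
    "Hippocampal theta rhythm recordings in behaving animals",
    "Theta rhythm across sleep stages",
]
C_TITLES = [
    "Neurogenesis in the adult hippocampus",
    "Sleep and neurogenesis: open questions",
]

if __name__ == "__main__":
    for term, n_a, n_c in shared_b_terms(A_TITLES, C_TITLES):
        print(f"{term}: {n_a} A title(s), {n_c} C title(s)")
```

Run on the toy titles, the sketch reports a single shared term, sleep. "Hippocampal" and "hippocampus" do not match because the sketch does no stemming. In any case, a person still has to judge whether a shared term is a meaningful link, which is the weak, iterative assessing work described in this section.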

As might be expected, weak conditions were associated with the searches assigned to categories A, C, and F, where researchers were testing new ideas or searching for information outside their primary research domain. These searches generally required more time and effort than those in other categories, since scientists often began without knowing the scope of what they were looking for and then needed to undertake multiple iterations of searching and assessing to make progress. There were also interesting instances of weak searching within category B. Most of these involved searching literature databases or the Web, often with limited subject knowledge, but some also included scanning a set of journals or a textbook for a lead.


Table 1
Information search typology

A. Assessing hypothesis
  - Own preliminary hypothesis
  - Established hypothesis
B. Assessing local finding relative to literature
C. Searching for specific information outside domain
D. Searching deeply in literature of own domain
E. Exploring literature in own domain
  - General literature reviews
  - Current awareness efforts
F. Exploring outside domain
G. Problem-solving
  - Local methods and instrumentation problems
  - Intellectual problems (specific questions, fact finding)
H. Known-item searching
  - Footnote chasing
  - Known-item search
  - Known person/author search

[Bar chart, "Categories with importance rankings". x-axis: Categories (C Specific search outside domain; G Specific questions and fact finding; B Assessing finding; H Known-item searching; D Deep search own domain; F Exploring outside domain; A Assessing hypothesis; E Exploring own domain). y-axis: Importance ranking (%). Series: Percent ranked Potentially or Definitely Important.]

Fig. 2. Importance of information resulting from searching activities.


The frequency of hypothesis assessment (category A) and specific out-of-domain searching (category C) was lower than expected, considering the ease of access to Arrowsmith and the encouragement to use it for this purpose. However, the specific out-of-domain searches that were performed were considered by the field testers to be very important to the research process. As illustrated in Fig. 2, only three instances of category C searching were documented, but each one was ranked as highly important. Interestingly, the next highest ranked category in terms of importance was a strong practice: G, instrumental problem solving and fact finding.
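The measure plotted in Fig. 2 is a per-category percentage. A minimal sketch of that tally follows. It assumes each diary entry is reduced to a (category, rating) pair and that ratings of "potentially" or "definitely" important count toward the percentage, as the figure's legend suggests. The "not" rating level and the entries themselves are hypothetical, not the study's data. Keeping the entry count next to the percentage matters: with only three category C searches, one different rating would move that category's percentage by about 33 points.

```python
from collections import defaultdict

# Rating levels counted as important, following the wording of Fig. 2's legend.
IMPORTANT = {"potentially", "definitely"}


def percent_important(entries):
    """Return {category: (n_entries, percent rated potentially/definitely important)}."""
    totals = defaultdict(int)
    important = defaultdict(int)
    for category, rating in entries:
        totals[category] += 1
        if rating in IMPORTANT:
            important[category] += 1
    return {c: (totals[c], 100.0 * important[c] / totals[c]) for c in totals}


def rank_categories(summary):
    """Order categories by percentage (descending), breaking ties by category letter."""
    return sorted(summary, key=lambda c: (-summary[c][1], c))


# Hypothetical diary entries; "not" is an assumed lowest rating level.
ENTRIES = [
    ("C", "definitely"), ("C", "definitely"), ("C", "potentially"),
    ("A", "not"), ("A", "potentially"),
    ("B", "definitely"), ("B", "not"), ("B", "not"), ("B", "potentially"),
]

if __name__ == "__main__":
    summary = percent_important(ENTRIES)
    for c in rank_categories(summary):
        n, pct = summary[c]
        print(f"{c}: {pct:.0f}% of {n} entries")
```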

4.4. WIW in context

In this section we present two excerpted cases to further elaborate the nature of WIW processes. Both focus on the preparation stage of research, but each represents a very different contribution to the discovery process. The first case is about testing a new idea, where a junior researcher is a novice in a prospective research area and is highly dependent on a more senior collaborating partner. The second case is about the development of a new technical procedure that needs to be implemented before an entire area of research can advance. The two cases demonstrate some fairly typical as well as some less intuitive aspects of WIW. They also help to expand Simon’s conception of the components of the discovery process by showing in some detail the range and necessity of the research work that precedes the actual implementation, or experimental part, of a research project.

4.4.1. Case 1: Testing an idea

This case involved a postdoctoral researcher who happened upon a unique opportunity to design and perform a study outside his domain of expertise. The idea for the study emerged from a particular WIW activity, and assessing the feasibility and value of embarking on the study required further WIW.

The researcher was part of a laboratory dedicated to studying the neural substrate of learning and memory. He was well versed in electrophysiology and behavioral methods, and his experiments often involved recording neuronal activity in small mammals. One of his long-time interests was theta rhythm, rhythmic activity in the brain thought to be important for the control of voluntary movement and sensory processes. Alongside investigations of this kind, he was field testing the Arrowsmith literature mining system to assess the tool’s usefulness in his daily research activities. In one of his Arrowsmith searches he used the query terms ‘‘theta rhythm’’ and ‘‘neurogenesis’’. He characterized this as an undirected exploratory activity and performed the search with various interesting keywords: ‘‘There’s a keyword list that’s published by the Society for Neuroscience. I looked down the list to see things that might be related to theta rhythm, that people haven’t really thought of before. [I] just typed them in.’’ [P1A3, June 9, 2003].

While reviewing the abstracts that resulted from this search, he began thinking about the particular types of experiments he could do to explore theta rhythm and neurogenesis. He explained that this thinking helped him stumble across a somewhat different idea: ‘‘One way to do this study is to anesthetize the animals and induce theta by giving them certain drugs. And the type of theta that is induced by doing that procedure is virtually the same brain profile that occurs during REM sleep, rapid eye movement sleep, which is dream state sleep. And that was interesting . . . that’s an interesting idea.’’ [P1A3, June 9, 2003].

The researcher found himself faced with both challenges and opportunities. Although he was familiar with the literature on theta rhythm, the new project he envisioned was sleep-related, a research domain with which he was much less familiar. Normally he did not read publications about sleep research, and he had little sense of the work currently going on in the field. Also, he only had the resources to do the study in a crude fashion with limited controls. However, a more senior neuroscientist based elsewhere, whom he knew through his association with the Arrowsmith project, had former collaborators with expertise in different, more advanced techniques. With their help it would be possible to generate cleaner, more reliable data. The senior colleague offered to assist by contacting these potential collaborators to help him set up the equipment necessary for the study, and to work with him to further brainstorm how to conduct it.

Over the next three months, the researcher engaged in a variety of information work activities in an effort to better grasp the feasibility and value of a study of REM sleep and neurogenesis. He spoke with additional scientists at a nearby laboratory, performed new Arrowsmith searches, and began to explore the sleep-related literature, reading about the pros and cons of different techniques he could use to block sleep in the study. Throughout these activities, he regularly emailed status reports, search results, and questions he was wrestling with to the senior associate, who responded with a steady stream of email messages.

For the postdoctoral researcher, assessing the feasibility of a REM sleep and neurogenesis study meant exploring a new research domain and reading about and assessing techniques of which he had little knowledge. He had an area of interest and a somewhat clear research question. However, his ideas about an experimental design were ill-structured and undeveloped, and he had no entry points or direct links to the sleep research community. Both of these situations required WIW, which progressed in a trial-and-error manner with the literature and depended largely on confirmation and leads from his associate. Fig. 3 is an estimation of the position of Simon’s conditions on a weak/strong continuum for this case.

[Figure: a continuum running from Weak to Strong, with four conditions placed along it: Unsystematic steps; Low domain knowledge; Seek and search; Ill-structured problem.]

Fig. 3. Simon’s conditions on a weak/strong continuum for Case 1.

4.4.2. Case 2: Developing a research procedure

Magnetic Resonance (MR) imaging researchers are facing a problem with anonymity and the HIPAA (Health Insurance Portability and Accountability Act). It is conceivable that someone could reconstruct brain images in such a way as to render the ‘‘imaged’’ person recognizable. Institutional Review Boards (IRBs) are working harder to protect human subjects’ anonymity at a time when imaging researchers want to increase the practice of sharing data to support larger and more complex studies. In addition, at least one neuroscience journal requires deposit of image data into a public repository as a condition of publication. Face recognition is therefore a current and critical problem that is being studied in several ways by MR researchers and other scientists. One potential solution centers on a process called de-facing, in which a computer algorithm trims away certain features (i.e., tissue) from brain scan images.
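To make de-facing, and the question of how to quantify that an image has been properly defaced, more concrete, here is a toy sketch. It assumes a volume stored as nested lists indexed [z][y][x], with y increasing toward the front of the head and z increasing upward, and it uses a fixed rectangular "face" box; practical de-facing tools typically locate facial tissue with anatomical templates or registration instead. The residual fraction it reports is a hypothetical check, not the statistical assessment the research team performed.

```python
def make_volume(nz, ny, nx, value=1):
    """Create a toy volume indexed [z][y][x], filled with a constant intensity."""
    return [[[value] * nx for _ in range(ny)] for _ in range(nz)]


def face_box(nz, ny, nx, y_front, z_low):
    """Hypothetical face region: voxels at or in front of y_front and below z_low."""
    return {(z, y, x)
            for z in range(min(z_low, nz))
            for y in range(y_front, ny)
            for x in range(nx)}


def deface(volume, region):
    """Return a copy of the volume with every voxel in the region set to zero."""
    out = [[row[:] for row in plane] for plane in volume]
    for z, y, x in region:
        out[z][y][x] = 0
    return out


def residual_fraction(volume, region):
    """Fraction of voxels in the region that are still nonzero (0.0 means fully removed)."""
    if not region:
        return 0.0
    remaining = sum(1 for z, y, x in region if volume[z][y][x] != 0)
    return remaining / len(region)


if __name__ == "__main__":
    vol = make_volume(4, 4, 4)
    face = face_box(4, 4, 4, y_front=2, z_low=2)
    defaced = deface(vol, face)
    print(len(face), residual_fraction(vol, face), residual_fraction(defaced, face))
```

A residual fraction of zero says only that the chosen box was emptied. Whether the remaining skull shape still lets someone say "I know who that is" is the behavioral question the case goes on to describe, and no voxel count answers it.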

One of our participants in the IDN project is a computational modeling and MR research scientist who has been working on testing one de-facing algorithm. Notably, she has had a leading role in this project even though the work was outside her domain of expertise. Her research team had to conduct a statistical assessment of the de-faced images to test the efficacy of the algorithm, and also had to test a group of study subjects to see whether they could recognize the de-faced images as people they knew. A range of issues arose in the course of this study, including a complicated IRB application process and difficulties managing data and writing up the results.

Two questions drove the work: ‘‘What features of a face and head are needed to make it recognizable?’’ and ‘‘How do we quantify that an image has been properly defaced?’’ The first, more conceptual problem necessitated the most varied information work. The researcher had to determine how best to think about the phenomenon of recognition, understand how it fits with previous research, and develop the behavioral paradigm for conducting the experiment. These problems led to literature searches, the use of various information tools and services, and the reconsideration of old coursework materials, all in a series of starts and stops: the kinds of unclear steps associated with weak approaches.

To design the behavioral experiment, the researcher had to build an understanding of how humans recognize people they know. In addition to ‘‘looking for ways to validate that the algorithm successfully works,’’ the team had to understand ‘‘what features are important for (face) recognition.’’ The problem-solving process was complex in that she had to use a variety of search strategies and had no ‘‘tried and true’’ way to find the information needed. She described her searching as encumbered by a lack of ‘‘fit’’ among the questions she was asking, the search terms she was using, and the literature returned.

Drawing on some local research concerning cranio-facial measurements and schizophrenia, she used Arrowsmith to search for possible connections between, for example, anthropometry and face recognition. However, she noted: ‘‘I was going from the viewpoint of cranio-facial, and so how do forensic scientists recreate a face based on a skull that they have; how do they know roughly what the features should be like? And that search was not fruitful . . . doing it that way.’’ [C1A1, February 22, 2005]. She then moved to ‘‘just kind of looking through the literature to try to figure out what were the appropriate search terms, and nothing really fit.’’ To get beyond the purely medical literature, she also ran some searches in PsycINFO. A suggestion from a colleague put her on a new path related to facial features. She then remembered the concept of ‘‘internal and external facial features’’ from a graduate school course on computational facial recognition and began ‘‘digging more into that literature.’’ She found a researcher who had done extensive work on how people use facial features to recognize others, and read some of her papers and parts of a book. Based on this work she determined: ‘‘. . . to really do this right, if we’re (going) to have someone recognize an image, (we) really have to look at people who are familiar versus unfamiliar, because it’s possible that the shape of the skull might be enough for me to say, I know who that is.’’ [C1A1, February 22, 2005].

This case presents an unusual example of WIW processes. In most other cases, the activities associated with measurement, instrumentation, and protocol problems tended to be strong. As in those more typical cases, the de-facing research problem was well defined, but other weak elements dominated the information work approach overall. The process of carrying out the information activities was rarely linear or continuous. There was considerable seeking and searching to find a fit between information sources and the problem, and there was low domain knowledge in both the behavioral approaches and the facial recognition area. Fig. 4 is an estimation of the position of Simon’s conditions on a weak/strong continuum for this case.

[Figure: a continuum running from Weak to Strong, with four conditions placed along it: Unsystematic steps; Low domain knowledge; Seek and search; Structured problem space.]

Fig. 4. Simon’s conditions on a weak/strong continuum for Case 2.

5. Discussion

As stated previously, information work surrounds, connects, and fuels the data collection and analysis activities emphasized in Simon’s conceptualization of the discovery process. Moreover, information work activities are subject to the same conditions Simon associated with weak and strong scientific problem solving. Thus, as conditions from the left side of Fig. 1 increase in number and degree, information work becomes weaker. In the particular cases presented in Section 4.4, and across our cases more generally, two of the conditions of weak work, ill-structured problem and low domain knowledge, were prevalent indicators of WIW.

Problem structure and domain knowledge have been examined previously in relation to information needs and practices more generally. For example, MacMullin and Taylor (1984) and Vakkari (1999) equated high problem structure with knowledge of central variables and their interrelations, suggesting that this type of research situation allows for highly determined information requirements, processes, and outcomes. The impact of low subject or domain knowledge on searching has also been widely studied (e.g., Hsieh-Yee, 1993; Sihvonen & Vakkari, 2004; Wildemuth, 2004). But these and related studies have given little attention to how the information practices associated with problem structure and domain knowledge actually affect the research and discovery process. Fry and Talja’s (2004) application of Whitley’s theory of the social organization of scholarly fields provides a perspective that is closer to the actual practice of science. They identify information features and activities associated with high levels of task uncertainty in a disciplinary specialization, and notably, a number of their examples would constitute WIW, especially the use of material scattered across diverse fields and high reliance on emerging personal networks to interpret information.

An additional indicator of WIW in our cases was newness, a condition not identified by Simon. Newness can take many forms in scientific work. In the cases excerpted in Section 4.4 and numerous other instances documented in our data, weak approaches were the means for doing something new, such as starting down a new path of inquiry or developing a new technique. In other cases the problem at hand was new and unfamiliar to the researcher but not necessarily ill-structured or out of domain. New working relationships can also be a weakening influence: developing new collaborations is a much weaker process than relying on an established team of experts.

Newness was not only consequential in the information work process; it also tended to be a key attribute of high-impact information. Even a small bit of new knowledge can lead to a debate in the field. In the projects we followed, ‘‘new’’ information could be developed or disclosed. Experimental data could lead to substantial new findings, but uncovering an existing but previously unknown study or researcher in a cognate area might also push a project forward or into a new phase. Moreover, there was high value in seeing information presented in new ways. Some of the most salient research advancements in our case studies hinged on new combinations or visualizations of existing data.

While it may not be surprising that newness complicates, or weakens, the research process, it is an important area for further investigation. Newness may be a more overarching indicator than the conditions identified by Simon of when high levels of WIW will be involved in a project. It is also clearly important as a characteristic of information itself. Finding new information and mobilizing old information in new ways are activities with high potential for making a significant contribution to the research process.
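The indicator logic of this section, Simon’s weak conditions plus newness, can be sketched as a simple tally. The sketch assumes each condition is recorded as present or absent, which captures only the "number" half of "number and degree." The condition names follow the labels in Figs. 3 and 4, and newness is the additional indicator proposed above. The three profiles are our reading of the case narratives plus one hypothetical routine case, not scores reported by the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conditions:
    """Weak-side conditions observed in a piece of information work (True = present)."""
    ill_structured_problem: bool
    low_domain_knowledge: bool
    unsystematic_steps: bool
    seek_and_search: bool
    newness: bool = False  # extra indicator proposed in the Discussion; not one of Simon's

    def simon_weak_count(self) -> int:
        """Number of Simon's weak conditions present (0 to 4)."""
        return sum([self.ill_structured_problem, self.low_domain_knowledge,
                    self.unsystematic_steps, self.seek_and_search])

    def weak_count(self, include_newness: bool = True) -> int:
        """Weak conditions present, optionally counting newness as a fifth indicator."""
        return self.simon_weak_count() + (int(self.newness) if include_newness else 0)


# Case 1 (our reading): ill-structured design, low sleep-research knowledge, trial and error.
CASE_1 = Conditions(True, True, True, True, newness=True)
# Case 2 (our reading): well-defined problem, other conditions weak, a new procedure.
CASE_2 = Conditions(False, True, True, True, newness=True)
# Hypothetical routine instrumentation problem in the researcher's own domain.
ROUTINE = Conditions(False, False, False, False)

if __name__ == "__main__":
    for name, case in [("Case 1", CASE_1), ("Case 2", CASE_2), ("Routine", ROUTINE)]:
        print(name, case.simon_weak_count(), case.weak_count())
```

A finer-grained version would replace each flag with a position on the weak/strong continuum, which is closer to how Figs. 3 and 4 present the conditions.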

6. Conclusions

In this paper our aim has been to clarify and elaborate Simon’s conception of weak and strong scientific methods in terms of information work. Our empirically based analysis shows how research processes and practices can be assessed to identify, and possibly predict, the kinds of activities and the stages of research where weak and strong processes will be concentrated. In relation to the development of information systems and services, the most productive points of intervention are likely to be at the ends of the weak/strong continuum presented in Fig. 1. As shown in Fig. 2, the weakest and strongest approaches, at least in terms of information searching, were judged by our participants to be the most important for advancing research.

This finding raises interesting questions for further study as to why the most influential information work falls at either end of the continuum. Nonetheless, the implications are clear. Information support for very strong, routine activities would generally reduce the information work burden for scientists. Strong information searching does not require high levels of specialized scientific expertise and can therefore be more easily delegated to information professionals. Support is also needed for what should be strong processes involved in managing data and information, especially in the increasingly demanding areas of standardization, archiving, and the development of digital repositories for the dissemination of results. On the other hand, weaker, messier practices have higher potential for promoting innovation and new discoveries. As Simon et al. (1981) note, the ‘‘fundamentality of a piece of scientific work is almost inversely proportional to the clarity of vision with which it can be planned’’ (p. 5). WIW is arduous and often speculative, and in many cases not the best use of the expert researcher’s time. It is true that scientific domain knowledge can always offer an interpretive advantage, but preliminary scanning and ‘‘connecting’’ of literatures, such as that done when applying the Arrowsmith searching technique, is one layer of WIW that could be performed by information specialists, perhaps best in consultation with domain scientists.

Simon’s weak/strong conceptual framework is more powerful than his narrow conceptualization of the research process reflects. It can be extended beyond data collection and analysis to the various levels of information work at all stages of research production. This has broad implications for service to science. Because of its importance in fueling innovative research and its difficulty in practice, WIW is a locus of activity that could benefit greatly from increased support, whether from information professionals on research teams or through the development of specific systems and information services. The potential for scientific discovery can be improved by making it easier to conduct WIW at the fronts of science where weak processes are commonplace, especially in the preparation stages of research. In addition, delegating stronger processes to information specialists would allow scientists to spend more time and effort on the intellectual work of discovery and less on finding ways to locate and manage the information they need to carry out that work.

Acknowledgement

We thank Les Gasser for drawing our attention to the relationship between our work and Herbert Simon’s ideas on scientific discovery. We also wish to acknowledge the highly constructive comments offered by the GSLIS Research Writing Group.

This research was supported by the National Science Foundation, Grant no. 0222848. Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the NSF.

References

Bawden, D. (1986). Information systems and the stimulation of creativity. Journal of Information Science, 12, 203–216.

Becker, H. (1998). Tricks of the trade: How to think about your research while you’re doing it. Chicago: University of Chicago Press.

Bernal, J. D. (1953). Science and industry in the nineteenth century. London: Routledge & Kegan Paul.

Darden, L. (1997). Recent work in computational scientific discovery. In M. Shafto & P. Langley (Eds.), Proceedings of the nineteenth annual conference of the Cognitive Science Society (pp. 161–166). Mahwah, NJ: Lawrence Erlbaum.

de Jong, H., & Rip, A. (1997). The computer revolution in science: Steps towards the realization of computer-supported discovery environments. Artificial Intelligence, 91, 225–256.

Dunbar, K. (1993). Concept discovery in a scientific domain. Cognitive Science, 17, 397–434.

Fry, J., & Talja, S. (2004). The cultural shaping of scholarly communication: Explaining e-journal use within and across academic fields. In L. Schamber & C. L. Barry (Eds.), Proceedings of the American Society for Information Science and Technology annual meeting (Vol. 41, pp. 20–30). Medford, NJ: Information Today.

Fujimura, J. H. (1987). Constructing ‘do-able’ problems in cancer research: Articulating alignment. Social Studies of Science, 17(2), 257–293.

Gerson, E. (2002). Premature discovery is failure of intersection among social worlds. In E. B. Hook (Ed.), Prematurity and scientific discovery (pp. 280–291). Berkeley: University of California Press.

Glaser, B. G., & Strauss, A. (1967). The discovery of grounded theory: Strategies for qualitative research. New York: Aldine De Gruyter.

Harwit, M. (1981). Cosmic discovery: The search, scope and heritage of astronomy. Brighton: Harvester Press.

Holton, G. (1973). Thematic origins of scientific thought: Kepler to Einstein. Cambridge: Harvard University Press.

Hsieh-Yee, I. (1993). Effects of search experience and subject knowledge on the search tactics of novice and experienced searchers. Journal of the American Society for Information Science, 44(3), 161–174.

Kuhn, T. (1962). The structure of scientific revolutions. Chicago: University of Chicago Press.

Langley, P., Simon, H. A., Bradshaw, G. L., & Zytkow, J. M. (1987). Scientific discovery: Computational explorations of the creative process. Cambridge: MIT Press.

MacMullin, S. E., & Taylor, R. S. (1984). Problem dimensions and information traits. The Information Society, 3(1), 91–111.

Martyn, J. (1974). Information needs and uses. Annual Review of Information Science and Technology, 9, 3–23.

Newell, A. (1969). Heuristic programming: Ill-structured problems. In J. S. Aronofsky (Ed.), Progress in operations research: Relationship between operations research and the computer (pp. 361–414). New York: Wiley.

Palmer, C. L. (1999). Structures and strategies of interdisciplinary science. Journal of the American Society for Information Science, 50(3), 242–253.

Palmer, C. L. (2001). Work at the boundaries of science: Information and the interdisciplinary research process. Dordrecht: Kluwer.

Palmer, C. L., Cragin, M. H., & Hogan, T. P. (2004). Information at the intersections of discovery: Case studies in neuroscience. In L. Schamber & C. L. Barry (Eds.), Proceedings of the American Society for Information Science and Technology annual meeting (Vol. 41, pp. 448–455). Medford, NJ: Information Today.

Root-Bernstein, R. S. (1989). Discovering: Inventing and solving problems at the frontiers of scientific knowledge. Cambridge: Harvard University Press.

Sihvonen, A., & Vakkari, P. (2004). Subject knowledge improves interactive query expansion assisted by a thesaurus. Journal of Documentation, 60(6), 673–690.

Simon, H. A. (1986). Understanding the processes of science: The psychology of scientific discovery. In T. Ganelius (Ed.), Progress in science and its social conditions: Proceedings of a Nobel symposium (pp. 159–170). Oxford: Pergamon.

Simon, H. A., Langley, P. W., & Bradshaw, G. L. (1981). Scientific discovery as problem-solving. Synthese, 47, 1–27.

Smalheiser, N. R. (2005). The Arrowsmith Project: 2005 status report. In A. Hoffman, H. Motoda, & T. Scheffer (Eds.), Lecture notes in artificial intelligence (Vol. 3735, pp. 26–43). Berlin: Springer.

Smalheiser, N. R., & Swanson, D. R. (1994). Assessing a gap in the biomedical literature: Magnesium deficiency and neurologic disease. Neuroscience Research Communication, 15, 1–9.

Smalheiser, N. R., & Swanson, D. R. (1996). Linking estrogen to Alzheimer’s disease: An informatics approach. Neurology, 47, 809–810.

Smalheiser, N. R., & Swanson, D. R. (1998). Calcium-independent phospholipase A2 and schizophrenia. Archives of General Psychiatry, 55, 752–753.

Strauss, A. L., & Corbin, J. (1998). The basics of qualitative research: Techniques and procedures for developing grounded theory (second ed.). Thousand Oaks, CA: Sage.

Strauss, A., Fagerhaugh, S., Suczek, B., & Wiener, C. (1985). Social organization of medical work. Chicago: University of Chicago Press.

Swanson, D. R., & Smalheiser, N. R. (1999). Implicit text linkages between Medline records: Using Arrowsmith as an aid to scientific discovery. Library Trends, 48(1), 48–59.

Vakkari, P. (1999). Task complexity, problem structure and information actions: Integrating studies on information seeking and retrieval. Information Processing & Management, 35, 819–837.

Valdes-Perez, R. E. (1999). Principles of human-computer collaboration for knowledge discovery in science. Artificial Intelligence, 107(2), 335–346.

Vaughan, D. (1992). Theory elaboration: The heuristics of case analysis. In C. C. Ragin & H. S. Becker (Eds.), What is a case?: Exploring the foundations of social inquiry (pp. 173–202). Cambridge: Cambridge University Press.

Wildemuth, B. M. (2004). The effects of domain knowledge on search tactic formulation. Journal of the American Society for Information Science and Technology, 55(3), 246–258.

Carole L. Palmer is an associate professor at the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. Her research explores how information systems and services can best support interdisciplinary inquiry, discovery, and collaboration in the sciences and the humanities.

Melissa Cragin is a doctoral student in the Graduate School of Library and Information Science at UIUC. Her research interests include biomedical information work, data curation, scholarly communication and the roles of libraries in supporting scientific research. She is currently investigating the use of shared digital data collections in neuroscience.

Timothy P. Hogan is a doctoral candidate in the Graduate School of Library and Information Science, University of Illinois at Urbana–Champaign. His research focuses on how people living with chronic and/or acute illnesses interact with and use information, and on the development of effective consumer health information services and systems.