Data at Work: Supporting Sharing in Science and Engineering (Birnholtz & Bietz, 2003)
Adam Worrall
LIS 6269 Seminar in Information Science
3/30/2010
Data and data sharing
• Information science needs “a better understanding of the use of data in practice” (p. 339)
• Data fundamentally “different from documents” (p. 339)
• Data sharing important (pp. 339-340)
  – “Openness” of scientific process
    • Confirm findings, replicate results
    • Build on previous work
  – Large data sets require distributed collaboration
    • Collaboratories, e-science
Data sharing problems
• Collaborating and sharing of data should be encouraged
  – But it “is not easy” to do so (p. 340)
• Why?
  – Lack of willingness to share, trust others
    • Competition for “revenue” (p. 345)
    • Restrictions imposed by commercial interests
    • Trust of sources
    • Trust of others; will they use data well? (see also Van House, 2003)
Data sharing problems
• Reasons (continued)
  – Problems with finding shared data
    • Negotiate access
  – Difficulties interpreting and using shared data
    • How collected?
    • How analyzed?
    • What format?
    • Metadata
      – Format, encoding, controlled vocabularies, etc.
    • Data quality (see also Stvilia et al., 2008; Wand & Wang, 1996)
    • “Tacit” knowledge of data (p. 340)
Methodology
• Three disciplines
  – Earthquake engineering
  – HIV / AIDS research
  – Space physics
• Observation and interviews of all three, surveys of earthquake engineers
• Inductive, grounded approach
  – Claimed they made “no assumptions about the purpose of data” (p. 340)
Data dimensions
• Two dimensions identified (p. 341)
  – “news” vs. “confirmation”
    • Confirm existing or expected results
    • Something unexpected needing further exploration
    • Something not fitting expected / prevailing model
  – “streams” vs. “events”
    • Longitudinal vs. cross-sectional
    • Context for data may change
    • Rate of data different
• Different disciplines, different data use
Data’s role in scientific communities
• Defines boundaries between communities
  – Experimental, deductive
    • More possessive of data
  – Theoretical, inductive
    • More interested in sharing data
    • More interested in using shared data
  – Increasing blurring of boundaries in some fields
• Provides gateway into communities
  – Access to data, knowledge about data is “valuable resource” (p. 343)
  – Those who control data and knowledge, and access to it, act as “gatekeepers of the field” (p. 343)
Data’s role in scientific communities
• Indicates status in community
  – Using one’s own data “seen as ‘better’” than using public data (p. 344)
    • “Analyzing somebody else’s data … arguably ‘counts’ for less” (p. 344)
  – Higher quality data means better reputation
    • For researchers, research groups, and institutions
• Enables indoctrination into community
  – Students often work with collecting, managing data
  – Degree of sharing of responsibilities differs between fields, sometimes by seniority in field
Categories of data uses (p. 345)
• Identified with an eye to “revenue” from use
  – Benefits: reputation, publications, funding, etc.
1. “A scientist’s data set is her [or his] castle”
  – Researcher wants to and is able to use data to solve a particular problem or question
  – Will increase revenue
2. “With a little help from my friends”
  – Researcher wants to use data, but needs to collaborate with others in order to do so successfully
  – Data can be shared privately
    • Limited risk (but still some risk)
  – Will increase revenue
Categories of data uses (p. 345)
3. “One scientist’s junk is another one’s treasure”
  – Researcher has no interest in using the data for a particular problem, but others do have interest
  – Sharing data will slightly increase revenue
  – May not be worth risk of losing other revenues
4. “D’oh!”
  – Researcher has not thought of a use, but it would be relevant to them and help them with a problem or question
  – Sharing data could be embarrassing, decrease revenue
Categories of data use
• Researchers will be less willing to share data unless incentives high, risks low
• Data sharing follows social networks
• Provide facilities for communication around abstractions of data sets
  – Encourage sharing and collaboration (category 2)
    • Extend researcher’s social network
  – Reduce risks of embarrassment (category 4)
    • Preliminary abstractions allow questions / comments before they are embarrassing
  – Increase incentives and benefits (categories 2 & 3)
    • Beyond boundaries of researcher’s community
Recommendations and conclusions
• Efforts to support “social interaction around data abstractions and the data themselves” should be made (p. 346)
• Metadata should be augmented through “the sharing of supplementary materials” (i.e. abstractions) (p. 346)
• Consideration of the “social and scientific roles of data” and how to support them necessary in future research (p. 346)
• Better understanding of data abstractions needed (p. 347)
Issues with study and article
• Bias towards natural sciences
  – Social scientists may use, share data differently
• Only 3 disciplines studied, others may differ further
• Generally coherent, but some parts hard to follow
  – Indoctrination examples appeared similar, despite what authors termed “critical” distinction (p. 344)
  – Promised “three aspects of the way data are used” but only discussed two dimensions (p. 341)
• Limitations only discussed briefly
Questions, comments?