How to persuade with data

Uploaded by anita-de-waard on 23 August 2014

DESCRIPTION

Talk for the Big Data Meetup, Washington DC, May 20 2014

TRANSCRIPT

Page 1: How to persuade with data

How to find networked knowledge: About Stories that Persuade With Data

Anita de Waard
VP Research Data Collaborations

Federal Big Data Meetup, May 20 2014

Page 2

Discourse Comprehension 101

• Letter < syllable < word < clause < sentence < discourse:

This is how linguistics is structured. But it is not how we understand text!

Page 8

• Letter < syllable < word < clause < sentence < discourse:

This is how linguistics is structured. But it is not how we understand text!

• Kintsch and Van Dijk, ’93: we read a text at three levels:
– surface code: literal text, exact words/syntax
– text base: preserves meaning, but not exact wording
– situation model: ‘microworld’ that the text is about, constructed inferentially through interaction between the text and background knowledge

• We use knowledge about text genre to activate a schema: this allows creation of the text base and situation model

Discourse Comprehension 101

Page 9

In summary, how scientists read:
• Surface code provides noun phrases and triples that offer pointers regarding topical relevance
• Text base and situation model are created through specific metadiscourse conventions (e.g. refs at the end) that create a biological reasoning model:
• This can be expressed as a set of claims, linked to evidence, that can help represent key points in the paper
• Journal name and author’s affiliation help define schema and provide ‘willingness to be convinced’ socially/interpersonally.

We next asked whether … [Hypothesis] To do so, we transiently inhibited … [Goal/Method] Suppression of X enhanced invasion … [Result] but F was unaffected … (Figure 3A). [Results] … Collectively, these data indicated that … . [Implication]

Page 10

Examples of schemas:

Page 11

human breast cancer

noninvasive MCF7-Ras

antisense oligonucleotides

high-grade malignancy

cell viability

retroviral vector

miR-31

cloned

transiently expressed miRNA sponges

Is it pertinent? -> Possibly…
Is it true? -> ?
Is it new, but in agreement with what I know? -> ?

What is this paper about? A. NOUN PHRASES

Page 12

Noun Phrases: some issues
• Problem 1: disambiguating terms (© GoPubMed):
– Hnrpa1 = Tis = Fli-2 = nuclear ribonucleoprotein A1 = helix destabilizing protein = single-strand binding protein = hnRNP core protein A1 = HDP-1 = topoisomerase-inhibitor suppressed
– Cellulose 1,4-beta-cellobiosidase = exoglucanase
– COLD ≠ C.O.L.D. ≠ cold (runny nose) ≠ cold (low T)
• Problem 2: disambiguating entities (© M. Martone):
– 95 antibodies were (manually!) identified in 8 articles
– 52 did not contain enough information to determine the antibody used
– Some provided details in other papers
– Failed to give species, clonality, vendor, or catalog number
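Problem 1 is, at heart, a synonym-normalization task. The sketch below assumes a small hand-built synonym table (the canonical labels `HNRNPA1` and `EXOGLUCANASE` and the `normalize` helper are illustrative, not how GoPubMed works) and maps every surface variant listed above to one canonical label.

```python
from typing import Dict, List, Optional

# Hypothetical synonym table: canonical label -> surface variants seen in text.
# The canonical labels are illustrative, not identifiers from a real registry.
SYNONYMS: Dict[str, List[str]] = {
    "HNRNPA1": [
        "Hnrpa1", "Tis", "Fli-2", "nuclear ribonucleoprotein A1",
        "helix destabilizing protein", "single-strand binding protein",
        "hnRNP core protein A1", "HDP-1", "topoisomerase-inhibitor suppressed",
    ],
    "EXOGLUCANASE": ["Cellulose 1,4-beta-cellobiosidase", "exoglucanase"],
}

# Reverse index built once: case-folded variant -> canonical label.
LOOKUP: Dict[str, str] = {
    variant.casefold(): canonical
    for canonical, variants in SYNONYMS.items()
    for variant in variants
}


def normalize(term: str) -> Optional[str]:
    """Return the canonical label for a term, or None if it is not in the table."""
    return LOOKUP.get(term.strip().casefold())


for term in ["HDP-1", "hnRNP core protein A1", "exoglucanase", "COLD"]:
    print(term, "->", normalize(term))
```

Case-folding merges harmless variants such as Hnrpa1/HNRPA1. It would also merge COLD and cold, so genuinely ambiguous terms need context (surrounding words, document topic) rather than a lookup table.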

Page 13

Noun Phrases: some progress
• Despite these difficulties, noun phrase recall/precision is quite high, e.g. i2b2 2011 [1], [2], others: 90%-98%
• Many tools, see [3] for a list; e.g. GoPubMed:
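Recall/precision figures like these come from comparing a tagger's predicted entity spans against a gold-standard annotation. The following is a minimal sketch that assumes exact-match scoring over (start, end, type) spans. The spans in the example are invented for illustration and are not from i2b2 data.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity type)


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 for entity spans."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: two of three predicted spans match the gold standard.
gold = {(0, 6, "miRNA"), (20, 41, "cell_line"), (50, 63, "disease")}
predicted = {(0, 6, "miRNA"), (20, 41, "cell_line"), (70, 75, "disease")}
print(precision_recall_f1(gold, predicted))
```

Exact matching is the strictest choice. Shared tasks often also report lenient (overlap) scores, which would replace the set intersection with an overlap test.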

Page 14

miR-31 PREVENT acquisition of aggressive traits

miR-31 INHIBIT noninvasive MCF7-Ras cells

miR-31 ENHANCE invasion

cell viability AFFECT inhibitor

miR-31 expression DEPRIVE metastatic cells

Is it pertinent? -> Possibly…
Is it true? -> ?
Is it new, but in agreement with what I know? -> ?

What is this paper about? B. TRIPLES

Page 15

Triples: some issues:
• Contingent on good NP & VP detection
• Hard to parse text! E.g. a commercial tool gave:
– insulin maintaining glucose homeostasis, extracted from: When insulin secretion cannot be increased adequately (type I diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ensues.
– insulin may be involved glucose homeostasis, extracted from: Because PANDER is expressed by pancreatic beta-cells and in response to glucose in a similar way to those of insulin, PANDER may be involved in glucose homeostasis.

Page 16

Triples: some progress: Biological Expression Language [4]:

We provide evidence that these miRNAs are potential novel oncogenes participating in the development of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in the presence of wild-type p53.

• Increased abundance of miR-372 decreases activity of TP53
r(MIR:miR-372) -| tscript(p(HUGO:Trp53))
• Context: cancer
SET Disease = "Cancer"
• Activity of TP53 decreases cell growth
tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")
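BEL statements have a subject, relationship, object shape. BEL's relationship symbols include `->` (increases), `-|` (decreases), `=>` (directlyIncreases) and `=|` (directlyDecreases). The sketch below is a toy parser, not the official BEL tooling. It handles a single top-level relationship and splits the statements above into triples. Parenthesis depth and quoted strings are tracked so that symbols inside terms or nested statements are not mistaken for the main relationship.

```python
from typing import NamedTuple

# Subset of BEL relationship symbols and their long names.
RELATIONS = {
    "->": "increases",
    "-|": "decreases",
    "=>": "directlyIncreases",
    "=|": "directlyDecreases",
}


class BelTriple(NamedTuple):
    subject: str
    relation: str
    obj: str


def parse_bel(statement: str) -> BelTriple:
    """Split a simple BEL statement at its first top-level relationship symbol."""
    depth = 0
    in_quotes = False
    for i, ch in enumerate(statement):
        if ch == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue  # ignore everything inside "quoted names"
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and statement[i:i + 2] in RELATIONS:
            symbol = statement[i:i + 2]
            return BelTriple(statement[:i].strip(), RELATIONS[symbol], statement[i + 2:].strip())
    raise ValueError(f"no top-level relationship found in {statement!r}")


for s in [
    'r(MIR:miR-372) -| tscript(p(HUGO:Trp53))',
    'tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")',
]:
    print(parse_bel(s))
```

For a nested statement such as `r(MIR:miR-372) -|(tscript(...) -| kin(...))`, the depth tracking keeps the inner `-|` inside the object. That object could then be parsed recursively after its outer parentheses are stripped.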

Page 17

The preceding observations demonstrated that X expression deprives Y cells of attributes associated with Z. We next asked whether X also prevents the acquisition of A traits by B cells.To do so, we transiently inhibited X in C cells with either D or E. Both approaches inhibited X function by > 4.5-fold (Figure S7A).Suppression of X enhanced invasion by 20-fold and motility by 5-fold, but F was unaffected by either inhibitor (Figure 3A; Figure S7B). The E sponge reduced X function by 2.5-fold, but did not affect the activity of other known Js (Figures S8A and S8B). Collectively, these data indicated that sustained X activity is necessary to prevent the acquisition of Z traits by both K and untransformed B cells.

Is it pertinent? -> Need content
Is it true? -> Sounds likely! I know this stuff!
Is it new, but in agreement with what I know? -> Need content

What is this paper about? C. METADISCOURSE

Page 18

Metadiscourse: why it matters

• Voorhoeve et al., 2006: “These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.”

• Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).”

• Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).”

• Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).”

“[Y]ou can transform … fiction into fact just by adding or subtracting references”, Bruno Latour [5]

Page 19

Metadiscourse: some progress
• Hedging cues, speculative language, modality/negation:
– Light et al. [6]: finding speculative language
– Wilbur et al. (Hagit) [7]: focus, polarity, certainty, evidence, and directionality
– Thompson et al. (Sophia) [8]: level of speculation, type/source of the evidence and level of certainty
• Sentiment detection (e.g. Kim and Hovy [9], among others):
– Holder of the opinion, strength, polarity as ‘mathematical function’ acting on main propositional content
• Can make this part of the semantic web (e.g., Ontology for Reasoning, Certainty and Attribution, ORCA [10]):
– Value (Presumed True, Probable, Possible, Unknown)
– Source (Author, Named Other, Unknown)
– Basis (Data, Reasoning, Unknown)
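The three ORCA dimensions map naturally onto enumerations. The simplest hedge detectors are little more than cue-word lists. The sketch below combines the two. The cue lexicons, the citation regex and the `evaluate` rules are illustrative assumptions, not the method of any of the systems cited above, which use far richer cues and context.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Value(Enum):
    PRESUMED_TRUE = "Presumed True"
    PROBABLE = "Probable"
    POSSIBLE = "Possible"
    UNKNOWN = "Unknown"


class Source(Enum):
    AUTHOR = "Author"
    NAMED_OTHER = "Named Other"
    UNKNOWN = "Unknown"


class Basis(Enum):
    DATA = "Data"
    REASONING = "Reasoning"
    UNKNOWN = "Unknown"


@dataclass(frozen=True)
class Evaluation:
    value: Value
    source: Source
    basis: Basis


# Illustrative cue lexicons (assumption); cues match at the start of a word.
POSSIBLE_CUES = ("possibly", "may", "might", "could")
PROBABLE_CUES = ("likely", "probably", "suggest")
DATA_CUES = ("figure", "fig.", "p <", "we observed")
REASONING_CUES = ("therefore", "thus", "suggest", "indicat")
CITATION = re.compile(r"\([A-Z][A-Za-z-]+ et al\.?,? \d{4}\)")
AUTHOR = re.compile(r"\b(we|our)\b")


def _has_cue(text: str, cues) -> bool:
    """True if any cue occurs at the start of a word (prefix match)."""
    return any(re.search(r"\b" + re.escape(cue), text) for cue in cues)


def evaluate(sentence: str) -> Evaluation:
    lowered = sentence.lower()
    if _has_cue(lowered, POSSIBLE_CUES):
        value = Value.POSSIBLE
    elif _has_cue(lowered, PROBABLE_CUES):
        value = Value.PROBABLE
    else:
        value = Value.PRESUMED_TRUE  # no hedge cue found
    if CITATION.search(sentence):
        source = Source.NAMED_OTHER
    elif AUTHOR.search(lowered):
        source = Source.AUTHOR
    else:
        source = Source.UNKNOWN
    if _has_cue(lowered, DATA_CUES):
        basis = Basis.DATA
    elif _has_cue(lowered, REASONING_CUES):
        basis = Basis.REASONING
    else:
        basis = Basis.UNKNOWN
    return Evaluation(value, source, basis)
```

Consider the Voorhoeve et al. (2006) and Okada et al. (2011) sentences quoted on the previous slide. The sketch returns Possible/Unknown/Unknown for the first and Presumed True/Named Other/Unknown for the second. This is the hedge lost through citation that the Latour quote describes.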

Page 20

Claim:
• sustained miR-31 activity is necessary to prevent the acquisition of aggressive traits by both tumor cells and untransformed breast epithelial cells
Evidence: Method:
• We transiently inhibited miR-31 in noninvasive MCF7-Ras cells with either antisense oligonucleotides or miRNA sponges.
Evidence: Result:
• Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A).
• Suppression of miR-31 enhanced invasion by 20-fold and motility by 5-fold, but cell viability was unaffected by either inhibitor (Figure 3A; Figure S7B).
• The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not affect the activity of other known antimetastatic miRNAs (Figures S8A and S8B).

What is this paper about? D. CLAIMS AND EVIDENCE

Is it pertinent? -> Probably
Is it true? -> Sounds likely!
Is it new, but in agreement with what I know? -> Check/know
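In the claims-and-evidence view, figure references are the concrete links from a result statement to its supporting data. The sketch below assumes figure mentions follow the `Figure 3A` / `Figure S7B` / `Figures S8A and S8B` patterns seen on this slide. The regexes and the `evidence_links` helper are illustrative and will miss other styles such as `Fig. 6`.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Illustrative pattern for "Figure 3A", "Figure S7B", "Figures S8A and S8B".
FIGURE_GROUP = re.compile(r"Figures?\s+(S?\d+[A-Z]?(?:\s*(?:,|and)\s*S?\d+[A-Z]?)*)")
PANEL = re.compile(r"S?\d+[A-Z]?")


def evidence_links(result_sentences: List[str]) -> Dict[str, List[str]]:
    """Map each cited figure panel to the result sentences that cite it."""
    links: Dict[str, List[str]] = defaultdict(list)
    for sentence in result_sentences:
        for group in FIGURE_GROUP.findall(sentence):
            for panel in PANEL.findall(group):
                links[panel].append(sentence)
    return dict(links)


results = [
    "Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A).",
    "Suppression of miR-31 enhanced invasion by 20-fold and motility by 5-fold, "
    "but cell viability was unaffected by either inhibitor (Figure 3A; Figure S7B).",
    "The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not affect the "
    "activity of other known antimetastatic miRNAs (Figures S8A and S8B).",
]
for panel, sentences in sorted(evidence_links(results).items()):
    print(panel, "<-", sentences[0][:50])
```

Inverting the map (panel to sentences) makes it easy to ask which claims a given figure panel is supposed to support, which is one way to check claim/evidence consistency.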

Page 21

Claims and Evidence: some issues
• Data2Semantics [11]: linking clinical guidelines to evidence. Inconsistency within guideline and guidelines v. evidence:
– Studies have demonstrated inconsistent results regarding the use of such markers of inflammation as C-reactive protein (CRP), interleukins-6 (IL-6) and -8, and procalcitonin (PCT) in neutropenic patients with cancer [55–57].
– [55]: PCT and IL-6 are more reliable markers than CRP for predicting bacteremia in patients with febrile neutropenia
– [56]: In conclusion, daily measurement of PCT or IL-6 could help identify neutropenic patients with a stable course when the fever lasts >3 d. …, it would reduce adverse events and treatment costs.
– [57]: Our study supports the value of PCT as a reliable tool to predict clinical outcome in febrile neutropenia.
• Drug Interaction Knowledgebase [12]: how to identify evidence?
– R-citalopram_is_not_substrate_of_cyp2c19:
– At 10uM R- or S-CT, ketoconazole reduced reaction velocity to 55-60% of control, quinidine to 80%, and omeprazole to 80-85% of control (Fig. 6).

Page 22

Claims and Evidence: some progress
• Defining ‘salient knowledge components’ in text:
– Argumentative zones, CoreSC can both be found
– Blake, Claim networks (2012)
– Claimed Knowledge Updates (Sandor/de Waard, 2012):

Page 23

Finding claims in XIP, e.g. through scientific discourse analysis:

In contrast with previous hypotheses compact plaques form before significant deposition of diffuse A beta, suggesting that different mechanisms are involved in the deposition of diffuse amyloid and the aggregation into plaques.

[Diagram: core information (proposition), i.e. entities, relationships, temporality and connections (thematic roles), is the target of information extraction; status (rhetorical metadiscourse) and discourse structure are the targets of discourse analysis.]

Sándor, Ágnes and de Waard, Anita (2012).
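XIP itself is a full rule-based parser. What follows is only a toy approximation of the idea: metadiscourse cues that contrast with prior knowledge ("In contrast with previous …") combined with cues that introduce a new claim ("suggesting that …") flag a candidate claimed knowledge update. The patterns and label names are illustrative assumptions.

```python
import re

# Illustrative metadiscourse cue patterns (assumption, not XIP's rule set).
CONTRAST = re.compile(
    r"\b(in contrast (with|to)|contrary to|unlike) (previous|earlier|prior)\b", re.I
)
NOVELTY = re.compile(r"\b(suggesting|indicating|demonstrating|showing) that\b", re.I)


def classify(sentence: str) -> set:
    """Return the discourse labels whose cue patterns fire in the sentence."""
    labels = set()
    if CONTRAST.search(sentence):
        labels.add("contrast_with_prior_knowledge")
    if NOVELTY.search(sentence):
        labels.add("new_claim")
    return labels


def is_knowledge_update(sentence: str) -> bool:
    """Candidate knowledge update: a new claim that contrasts with prior work."""
    return {"contrast_with_prior_knowledge", "new_claim"} <= classify(sentence)


example = (
    "In contrast with previous hypotheses compact plaques form before significant "
    "deposition of diffuse A beta, suggesting that different mechanisms are involved "
    "in the deposition of diffuse amyloid and the aggregation into plaques."
)
print(classify(example), is_knowledge_update(example))
```

Regexes only see surface cues. A parser-based approach can also check that the cue actually scopes over the proposition (entities and relationships) being claimed.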

Page 24

Formalizing claims with hedging:

Biological statement with BEL/epistemic markup

These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.

BEL representation:
r(MIR:miR-372) -| (tscript(p(HUGO:Trp53)) -| kin(p(PFH:"CDK Family")))
Increased abundance of miR-372 decreases abundance of LATS2
r(MIR:miR-372) -| r(HUGO:LATS2)

Epistemic evaluation: Value = Possible; Source = Unknown; Basis = Unknown

Biological statement with MedScan/epistemic markup

Furthermore, we present evidence that the secretion of nesfatin-1 into the culture media was dramatically increased during the differentiation of 3T3-L1 preadipocytes into adipocytes (P < 0.001) and after treatments with TNF-alpha, IL-6, insulin, and dexamethasone (P < 0.01).

MedScan representation: IL-6, NUCB2 (nesfatin-1); Relation: MolTransport; Effect: Positive; CellType: Adipocytes; Cell Line: 3T3-L1

Epistemic evaluation: Value = Probable; Source = Author; Basis = Data

Page 25

Schemas: scientific articles are stories...

Story: The Story of Goldilocks and the Three Bears
Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins

Story | Story grammar | Paper | Paper text
Once upon a time | Setting: Time | Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
a little girl named Goldilocks | Characters | Objects of study | the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
She went for a walk in the forest. Pretty soon, she came upon a house. | Location | Experimental setup | studied and compared in vivo effects and interactions to those of the human protein
She knocked and, when no one answered, | Theme: Goal | Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
she walked right in. | Attempt | Hypothesis | Atx-1 may play a role in the regulation of gene expression
At the table in the kitchen, there were three bowls of porridge. | Episode 1: Name | Name | dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
Goldilocks was hungry. | Subgoal | Subgoal | test the function of the AXH domain
She tasted the porridge from the first bowl. | Attempt | Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
This porridge is too hot! she exclaimed. | Outcome | Results | Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
So, she tasted the porridge from the second bowl. | Activity | Data | (data not shown),
This porridge is too cold, she said | Outcome | Results | both genotypes show many large holes and loss of cell integrity at 28 days
So, she tasted the last bowl of porridge. | Activity | Data | (Figures 1B-1D).
Ahhh, this porridge is just right, she said happily and | Outcome | Results | Overexpression of dAtx-1 using the GMR-GAL4 driver also induces eye abnormalities. The external structures of the eyes that overexpress dAtx-1 show disorganized ommatidia and loss of interommatidial bristles
she ate it all up. | | Data | (Figure 1F),

Page 26

...that persuade (editors/authors/readers!)…

Aristotle | Quintilian | Description | Scientific paper
prooimion | Introduction/exordium | The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. | Introduction: positioning
prothesis | Statement of Facts/narratio | The speaker here provides a narrative account of what has happened and generally explains the nature of the case. | Introduction: research question
– | Summary/propositio | The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Summary of contents
pistis | Proof/confirmatio | The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. | Results
– | Refutation/refutatio | As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Related Work
epilogos | peroratio | Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. | Discussion: summary, implications

Page 27

... with data.

Page 28

What about the data?

Using antibodies and squishy bits, Grad Students experiment and enter details into their lab notebook. The PI then tries to make sense of their slides, and writes a paper. End of story.

Page 29

Maslow’s Hierarchy of Needs for Research Data

1. Preserved (existing in some form)
2. Archived (long-term & format-independent)
3. Accessible (can be accessed by others)
4. Comprehensible (others can understand data & processes)
5. Discoverable (can be indexed by a system)
6. Reproducible (others can redo experiments)
7. Trusted (validated/checked by reviewers)
8. Citable (able to point & track citations)
9. Usable (allow tools to run on it)

Page 30

1. Preserve: Data Rescue Challenge
• With IEDA/Lamont: award successful data rescue attempts
• Awarded at AGU 2013
• 23 submissions of data that was digitized, preserved, made available
• Winner: NIMBUS Data Rescue:

– Recovery, reprocessing and digitization of the infrared and visible observations along with their navigation and formatting.

– Over 4000 7-track tapes of global infrared satellite data were read and reprocessed.

– Nearly 200,000 visible light images were scanned, rectified and navigated.

– All the resultant data was converted to HDF-5 (NetCDF) format and freely distributed to users from NASA and NSIDC servers.

– This data was then used to calculate monthly sea ice extents for both the Arctic and the Antarctic.

• Conclusion: we (collectively) need to do more of this! How can we fund it?
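The final NIMBUS step, turning gridded ice observations into sea ice extent, can be illustrated with a convention commonly used for satellite sea-ice products: extent is the total area of grid cells whose ice concentration is at least 15%. This sketch assumes that threshold and a flat list of (concentration, cell area) pairs. The NIMBUS reprocessing may well have used its own procedure, and the grid values below are invented.

```python
from typing import Iterable, Tuple

THRESHOLD = 0.15  # assumed extent threshold (15% ice concentration)


def sea_ice_extent(cells: Iterable[Tuple[float, float]], threshold: float = THRESHOLD) -> float:
    """Sum the area (km^2) of cells whose concentration (0..1) meets the threshold."""
    return sum(area for conc, area in cells if conc >= threshold)


def sea_ice_area(cells: Iterable[Tuple[float, float]]) -> float:
    """For contrast: concentration-weighted ice area (km^2)."""
    return sum(conc * area for conc, area in cells)


# Invented 4-cell grid of 25 km x 25 km (625 km^2) cells.
cells = [(0.9, 625.0), (0.5, 625.0), (0.10, 625.0), (0.0, 625.0)]
print(sea_ice_extent(cells), sea_ice_area(cells))
```

Extent counts a half-covered cell in full, while area weights it by concentration. This is why the two quantities are usually reported separately.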

Page 31

2. Archive: Olive Project
• CMU CS & Library: funded by a grant from the IMLS, Elsevier is partner
• Goal: Preservation of executable content, nowadays a large part of intellectual output, and very fragile
• Identified a series of software packages and prepared VMs to preserve them
• Does it work? Yes, see video (1:24)

Page 32

3. Access: Urban Legend

• Part 1: Metadata acquisition
• Step through experimental process in series of dropdown menus in simple web UI
• Can be tailored to workflow of individual researcher
• Connected to shared ontologies through lookup table, managed centrally in lab
• Connect to data input console (Igor Pro)
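The centrally managed lookup table can be pictured as a mapping from the labels in each researcher's dropdown menus to shared ontology identifiers. In this sketch, the labels, the `ONTO:` identifiers and the `ExperimentRecord` layout are all hypothetical placeholders. They are not Urban Legend's actual schema or real ontology IDs.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical lab-managed lookup table: dropdown label -> shared ontology term.
# "ONTO:" identifiers are placeholders, not real ontology IDs.
LOOKUP_TABLE: Dict[str, str] = {
    "pyramidal neuron": "ONTO:0000001",
    "whole-cell patch clamp": "ONTO:0000002",
    "acute brain slice": "ONTO:0000003",
}


@dataclass
class ExperimentRecord:
    """Metadata captured one dropdown selection at a time."""

    steps: Dict[str, str] = field(default_factory=dict)        # step -> chosen label
    annotations: Dict[str, str] = field(default_factory=dict)  # step -> ontology term
    unmapped: List[str] = field(default_factory=list)          # labels needing curation

    def select(self, step: str, label: str) -> None:
        """Record a dropdown choice and annotate it via the lookup table."""
        self.steps[step] = label
        term = LOOKUP_TABLE.get(label)
        if term is None:
            self.unmapped.append(label)  # flag for the lab's central curator
        else:
            self.annotations[step] = term


record = ExperimentRecord()
record.select("preparation", "acute brain slice")
record.select("cell type", "pyramidal neuron")
record.select("recording", "sharp electrode")
print(record.annotations)
print(record.unmapped)
```

Keeping the table central means a curator can add a mapping for "sharp electrode" once, and every researcher's dropdown picks it up, rather than each record being annotated by hand.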

Page 33

4. Comprehend: Urban Legend

• Part 2: Data Dashboard
• Access, select and manipulate data (calculate properties, sort and plot)
• Final goal: interactive figures linked to data
• Plan to expand to more neuroscience labs
• Plan to build for geochemistry use case

Page 34

5. Discover: Data Indexing proposals
• Collaborated on Data Discovery Index proposal with UCSD/Carnegie Mellon
• Also worked with UIUC!
• Interested in developing distributed infrastructures for making data easier to search: what is the ‘Goldilocks Index’ where search is scalable, yet useful?

• Looking for academic/industry partners/use cases/platforms to address the next stage

• Discoverability is a key driver for metadata/data format structure!
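At its simplest, a data discovery index is an inverted index from metadata terms to dataset identifiers. The 'Goldilocks' question then becomes how much metadata to index per dataset. This minimal sketch assumes each dataset exposes a few free-text metadata fields. The dataset IDs and field contents are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> Set[str]:
    """Lower-case alphanumeric tokens."""
    return set(TOKEN.findall(text.lower()))


def build_index(datasets: Dict[str, Dict[str, str]], fields: Iterable[str]) -> Dict[str, Set[str]]:
    """Inverted index: term -> IDs of datasets whose chosen fields contain it."""
    fields = list(fields)
    index: Dict[str, Set[str]] = defaultdict(set)
    for dataset_id, metadata in datasets.items():
        for name in fields:
            for term in tokenize(metadata.get(name, "")):
                index[term].add(dataset_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Datasets matching every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


# Hypothetical datasets and metadata.
DATASETS = {
    "ds-001": {"title": "Lunar basalt geochemistry", "keywords": "moon, basalt, major elements"},
    "ds-002": {"title": "Arctic sea ice extent", "keywords": "sea ice, satellite, monthly"},
    "ds-003": {"title": "Cortical neuron recordings", "keywords": "electrophysiology, patch clamp"},
}
titles_only = build_index(DATASETS, ["title"])
richer = build_index(DATASETS, ["title", "keywords"])
print(search(titles_only, "moon"), search(richer, "moon"))
```

Indexing only titles keeps the index small but misses the "moon" query, while adding keywords finds it. Choosing which fields to index is a small-scale version of the scalable-yet-useful trade-off.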

Page 35

6. Reproduce: Resource Identification Initiative
Force11 Working Group to add data identifiers to articles that are:
– 1) Machine readable;
– 2) Free to generate and access;
– 3) Consistent across publishers and journals.

• Authors publishing in participating journals will be asked to provide RRIDs for their resources; these are added to the keyword field

• RRIDs will be drawn from:
– The Antibody Registry
– Model Organism Databases
– NIF Resource Registry

• So far, 11 journals from Springer, Wiley, Biomednet and Elsevier have signed up, with more to come

• Wide community adoption!
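RRIDs are meant to be machine readable, so pulling them out of article text takes only a pattern match. The sketch assumes identifiers of the form `RRID:` followed by a registry prefix and an accession, such as `AB_` for antibodies or `SCR_` for software tools. The regex approximates the syntax rather than implementing an official grammar, and the sample identifiers are made up.

```python
import re
from typing import List, Tuple

# Approximate RRID pattern (assumption, not the official grammar):
# "RRID:" + registry prefix + "_" + accession, e.g. RRID:AB_..., RRID:SCR_...
RRID = re.compile(r"RRID:\s?([A-Z]+)_([A-Za-z0-9_:]+)")

PREFIX_TYPES = {
    "AB": "antibody",        # Antibody Registry
    "SCR": "software/tool",  # resource registry (assumed prefix meaning)
}


def extract_rrids(text: str) -> List[Tuple[str, str]]:
    """Return (normalized RRID, resource type) pairs in order of appearance."""
    found = []
    for match in RRID.finditer(text):
        prefix, accession = match.group(1), match.group(2)
        found.append((f"RRID:{prefix}_{accession}", PREFIX_TYPES.get(prefix, "other")))
    return found


# Made-up methods text with placeholder identifiers.
methods = (
    "Cells were labeled with antibody X (RRID:AB_0000001) and analyzed with "
    "tool Y (RRID: SCR_0000002); mice were strain Z (RRID:IMSR_JAX:0000003)."
)
for rrid, kind in extract_rrids(methods):
    print(rrid, kind)
```

Normalizing away the optional space after `RRID:` lets extracted identifiers be compared and counted across journals, which matters for consistency across publishers.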

Page 36

7. Trust: Moonrocks

How can we scale up data curation? Pilot project with IEDA:
• A database for lunar geochemistry: leapfrog & improve curation time
• 1-year pilot, funded by Elsevier
• Main conclusion: if spreadsheet columns/headers map to RDB schema we can scale curation cost!
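The pilot's conclusion can be made concrete. Once each submitted spreadsheet header resolves to a schema column, loading is mechanical, and only unmapped headers need a curator. This sketch uses Python's built-in `csv` and `sqlite3`. The table, column names, header aliases and sample values are hypothetical and are not taken from the IEDA lunar database.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical target table and header aliases (not the real IEDA schema).
SCHEMA = "CREATE TABLE analyses (sample_id TEXT, sio2_wt_pct REAL, feo_wt_pct REAL)"
ALIASES: Dict[str, str] = {
    "sample": "sample_id",
    "sample id": "sample_id",
    "sio2": "sio2_wt_pct",
    "sio2 (wt%)": "sio2_wt_pct",
    "feo": "feo_wt_pct",
    "feo (wt%)": "feo_wt_pct",
}


def map_headers(headers: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split spreadsheet headers into schema-mapped ones and leftovers."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for header in headers:
        column = ALIASES.get(header.strip().lower())
        if column is None:
            unmapped.append(header)
        else:
            mapped[header] = column
    return mapped, unmapped


def load(conn: sqlite3.Connection, csv_text: str) -> List[str]:
    """Insert all mapped columns; return the headers a curator still has to map."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapped, unmapped = map_headers(list(reader.fieldnames or []))
    if not mapped:
        return unmapped
    columns = ", ".join(mapped.values())  # names come from ALIASES only
    placeholders = ", ".join("?" for _ in mapped)
    rows = [[row[header] for header in mapped] for row in reader]
    conn.executemany(f"INSERT INTO analyses ({columns}) VALUES ({placeholders})", rows)
    return unmapped


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
sheet = "Sample ID,SiO2 (wt%),FeO (wt%),Notes\nS-1,45.2,19.8,fine-grained\nS-2,38.1,22.4,\n"
print(load(conn, sheet))
print(conn.execute("SELECT sample_id, sio2_wt_pct FROM analyses ORDER BY sample_id").fetchall())
```

The numeric strings from the CSV are stored as floats because SQLite applies REAL column affinity on insert. The returned leftover list (here the `Notes` header) is the only part that needs human curation.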

Page 37

8. Cite: Force11 Data Citation Principles
• Another Force11 Working group
• Defined 8 principles:

• Now seeking endorsement/working on implementation

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: Where a specific claim rests upon data, the corresponding data citation should be provided.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Metadata describing the data, and unique identifiers should persist, even beyond the lifespan of the data they describe.

7. Versioning and granularity: Data citations should facilitate identification and access to different versions and/or subsets of data. Citations should include sufficient detail to verifiably link the citing work to the portion and version of data cited.

8. Interoperability and flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities but should not differ so much that they compromise interoperability of data citation practices across communities.
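Several of these principles (Credit and attribution, Unique Identification, Versioning and granularity) translate directly into fields a citation record must carry. The sketch assumes a citation layout loosely modeled on the DataCite-style pattern "Creator (Year): Title. Version. Publisher. Identifier". The accepted identifier schemes, the dataset and the DOI are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PERSISTENT_SCHEMES = ("doi:", "https://doi.org/", "hdl:")  # assumed accepted schemes


@dataclass(frozen=True)
class DataCitation:
    creators: Tuple[str, ...]      # principle 2: credit all contributors
    year: int
    title: str
    publisher: str
    identifier: str                # principle 4: persistent, machine-actionable ID
    version: Optional[str] = None  # principle 7: versioning
    subset: Optional[str] = None   # principle 7: granularity

    def __post_init__(self) -> None:
        if not self.identifier.startswith(PERSISTENT_SCHEMES):
            raise ValueError("a data citation needs a persistent identifier")

    def render(self) -> str:
        """Render as: Creators (Year): Title. Version. Publisher. Subset. Identifier"""
        parts = [f"{'; '.join(self.creators)} ({self.year}): {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(f"{self.publisher}.")
        if self.subset:
            parts.append(f"Subset: {self.subset}.")
        parts.append(self.identifier)
        return " ".join(parts)


# Hypothetical dataset and DOI.
citation = DataCitation(
    creators=("Doe, J.", "Roe, R."),
    year=2014,
    title="Example lunar geochemistry compilation",
    publisher="Example Data Repository",
    identifier="doi:10.1234/example",
    version="1.2",
    subset="rows 1-250",
)
print(citation.render())
```

Rejecting non-persistent identifiers at construction time enforces principle 4 early. The optional version and subset fields carry principle 7 without forcing them on every citation.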

Page 38

9. Use: Executable Papers
• Result of a challenge to come up with cyberinfrastructure components to enable executable papers
• Pilot in Computer Science journals
– See all code in the paper
– Save it, export it
– Change it and rerun on data set:

Page 39

10: Integrate data creation with data use

1. Preserved (existing in some form): Key is to have data be digital and preservable, e.g. data rescue challenge. Need funding for digitisation projects.

2. Archived (long-term & format-independent): Software standards change: need investment in updating, e.g. Olive Project to save OSs.

3. Accessible (can be accessed by others): Build and encourage electronic lab notebooks to ensure data can be shared if/when needed; follow workflow.

4. Comprehensible (others can understand data & processes): Build tools that allow researchers to interpret and reevaluate their data directly; drive adoption of ELNs.

5. Discoverable (can be indexed by a system): Collaborate on grants to develop data discovery tools; promote and use common standards, indices.

6. Reproducible (others can redo experiments): Follow Force11 Resource Identification initiative; reproducibility initiative. Support standard protocols.

7. Trusted (validated/checked by reviewers): Work with domain data repositories to develop easier ways to upload data that conforms to their schema.

8. Citable (able to point & track citations): Force11 Data Citation Principles link data to papers and v.v. Issues: need better identifiers, granularity, versioning.

9. Usable (allow tools to run on it): Content enrichment using data, e.g. executable papers, virtual microscope, database linking, and others.

10. Integrate upstream and downstream – make metadata to serve use.

Page 40

Thank you! Collaborations and discussions gratefully acknowledged:
• CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Rick Gerkin, Santosh Chandrasekaran, Matthew Geramita, Eduard Hovy
• UCSD: Phil Bourne, Brian Schottlaender, David Minor, Declan Fleming, Ilya Zaslavsky
• NIF/Force11: Maryann Martone, Anita Bandrowski
• OHSU: Melissa Haendel, Nicole Vasilevsky
• California Digital Library: Carly Strasser, John Kunze, Stephen Abrams
• IEDA: Kerstin Lehnert, Annika
• Elsevier: Mark Harviston, Jez Alder, David Marques

Page 41

Questions?

Anita de Waard
VP Research Data Collaborations

http://researchdata.elsevier.com/