what’s wrong with research papers - and (how) can we fix it?
DESCRIPTION
Talk given at DERI in Galway on May 2, 2012
TRANSCRIPT
Whatʼs wrong with research papers -
and (how) can we fix it?
Anita de Waard
Disruptive Technologies Director
Elsevier [email protected]
http://elsatglabs.com/labs/anita
The Big Problem:
1) There are too many papers
2) We have too little time to read them
To address this problem, we make:
• databases
• text mining tools
• nanopublications
• data publications
• wiki publications
• ontologies; ontology integration tools
• workflow/data integration systems
• executable components
• ...and write emails/grants/papers/blogs about this...
• ... and we end up with:
1) Even more papers!!
2) Even less time to read them!!
What problems are we solving?
• Weʼre mostly improving the format of the research article.
• This talk: aspects of the format that are being improved (and some examples of work to improve them):
  A. Issues with the paper format
  B. Issues pertaining to habits of writing
  C. Issues inherent to language
  D. Issues in trying to create connected content
• Do any of these address the Big Problem?
• What shall we do about it?
A. Issue: the paper format
A1: Paper is two-dimensional
A2: Paper is linear
A3: Paper is not interactive
A1: Issue: paper is two-dimensional
• Some experiments: allow representations of interactive figures (Wolfram Alpha), Utopia, Chem-3d
• Lack of experimentation with formats: the internet is multi-dimensional, so why do we still need page limits?
A2: Issue: paper is linear
• Read from front to back (research suggests readers actually skim quickly to the core parts, and linearity helps us do that)
• References are at the end, so your reading is not interrupted
• Headers are sequential - and not directly accessible
A2: (Old) Experiment: ABCDE
• LaTeX stylesheet:
  – Annotation
  – Background
  – Contribution
  – Discussion
  – Entities (references, projects, terms in ontologies, etc.) in RDF
  – Core sentences create structured abstract
• E.g. in proceedings: collect all core Contribution components
• I still have the stylesheets, if anyone’s interested :-)!
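The idea of core sentences forming a structured abstract, and of collecting the core Contribution components across a proceedings volume, can be sketched roughly as below. This is only an illustration in Python: ABCDE itself is a LaTeX stylesheet, and the `Sentence`, `Paper`, `structured_abstract` and `core_contributions` names are hypothetical, as is the assumption that each sentence carries a section label and a "core" flag.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model; the real ABCDE format is a LaTeX stylesheet
# and none of these names are taken from it.
SECTION_TYPES = ("Background", "Contribution", "Discussion")


@dataclass
class Sentence:
    text: str
    section: str        # assumed to be one of SECTION_TYPES
    core: bool = False  # True if the author marked it as a core sentence


@dataclass
class Paper:
    title: str
    sentences: list[Sentence] = field(default_factory=list)


def structured_abstract(paper: Paper) -> dict[str, list[str]]:
    """Group the core sentences by section type, in document order."""
    abstract = {name: [] for name in SECTION_TYPES}
    for sentence in paper.sentences:
        if sentence.core and sentence.section in abstract:
            abstract[sentence.section].append(sentence.text)
    return abstract


def core_contributions(proceedings: list[Paper]) -> dict[str, list[str]]:
    """Collect the core Contribution sentences of every paper by title."""
    return {p.title: structured_abstract(p)["Contribution"] for p in proceedings}
```

Calling `core_contributions` on a list of papers would give one entry per title, which is roughly what a proceedings-level overview of contributions would need.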
A3: Paper is not interactive
• Experiment: Executable papers:
  – Run code within a paper
  – Experiments: R, SPSS, Vistrails
  – Rerender code within a paper, change algorithm/see effect; run different dataset
  – How do you archive software? Satyanarayanan at CMU: Olive, ‘Internet ecosystem of curated virtual machine image collections’
B. Issue: habits of writing
B1: Cite a paper - not a claim
B2: No precision in describing entities
B3: We write post-mortems (stories :-)!)
B1: Citations create facts:
- Voorhoeve, 2006: “These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.”
- Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).”
- Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).”
- Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).”
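The drift from a hedged claim ("possibly") to a flat statement ("directly inhibit") can be made visible with a very crude check for hedge words. The sketch below is only an illustration: the `HEDGES` list is hand-picked and hypothetical, not a validated lexicon, and whole-word matching is a rough stand-in for real epistemic analysis.

```python
import re

# Illustrative, hand-picked hedge markers; an assumption, not a validated lexicon.
HEDGES = ("possibly", "potential", "suggests", "may", "might", "probably")


def hedges_in(sentence: str) -> list[str]:
    """Return the hedge markers that occur as whole words (case-insensitive)."""
    found = []
    for word in HEDGES:
        if re.search(rf"\b{re.escape(word)}\b", sentence, flags=re.IGNORECASE):
            found.append(word)
    return found


# Two of the sentences quoted on the slide: the original claim and a later citation.
citances = {
    "Voorhoeve, 2006": (
        "These miRNAs neutralize p53-mediated CDK inhibition, possibly through "
        "direct inhibition of the expression of the tumor suppressor LATS2."
    ),
    "Okada et al., 2011": (
        "Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the "
        "expression of Lats2, thereby allowing tumorigenic growth in the "
        "presence of p53 (Voorhoeve et al., 2006)."
    ),
}

for source, text in citances.items():
    print(source, hedges_in(text))
```

With this word list, the original sentence is flagged for "possibly" while the 2011 citation contains no hedge at all.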
B1: TAC2012: Add authorʼs text to citation
Voorhoeve, P. M.; le Sage, C. et al. (2006). A Genetic Screen Implicates miRNA-372 and miRNA-373 As Oncogenes in Testicular Germ Cell Tumors, Cell 124 (6) pp. 1169-1181
Citing goal: “To perform genetic screens for novel functions of miRNAs,”
− in order to identify miRNAs functionally associated with carcinogenesis
− to identify miRNAs that when overexpressed could substitute for p53 loss and allow continued proliferation in the context of Ras activation
Citing method: “We subsequently created a human miRNA expression library (miR-Lib) by cloning almost all annotated human miRNAs into our vector (Rfam release 6) (Figure S3).”
− Voorhoeve et al. (116) employed a novel strategy by combining an miRNA vector library and corresponding bar code array
− using a retroviral expression library of miRNAs,
− Using a novel retroviral miRNA expression library, Agami and co-workers performed a cell-based screen
Citing result: “we identified miR-372-373, each permitting proliferation and tumorigenesis of primary human cells that harbor both oncogenic RAS and active wildtype p53.”
− miR-372 and miR-373 were consequently found to permit proliferation and tumorigenesis of these primary cells carrying both oncogenic RAS and wild-type p53,
− Voorhoeve et al. (2006) identified miR-372 and miR-373
− miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53,
− miR-372 has been recently described as potential oncogene that collaborate with oncogenic RAS in cellular transformation
B2: Issue: entities in papers are not exact
• Midfrontal cortex tissue samples from neurologically unimpaired subjects (n = 9) and from subjects with AD (n = 11) were obtained from the Rapid Autopsy Program
• Immunoblot analysis and antibodies
• The following antibodies were used for immunoblotting: -actin mAb (1:10,000 dilution, Sigma-Aldrich); -tubulin mAb (1:10,000, Abcam); T46 mAb (specific to tau 404–441, 1:1000, Invitrogen); Tau-5 mAb (human tau 218–225, 1:1000, BD Biosciences) (Porzig et al., 2007); AT8 mAb (phospho-tau Ser199, Ser202, and Thr205, 1:500, Innogenetics); PHF-1 mAb (phospho-tau Ser396 and Ser404, 1:250, gift from P. Davies); 12E8 mAb (phospho-tau Ser262 and Ser356, 1:1000, gift from P. Seubert); NMDA receptors 2A, 2B and 2D goat pAbs (C terminus, 1:1000, Santa Cruz Biotechnology)…
• 95 antibodies were identified in 8 articles
• 52 did not contain enough information to determine the antibody used
Maryann Martone, Jan 2012: 2012 ACM SIGHIT International Health Informatics Symposium (IHI 2012)
B3: Issue: methods are written post-mortem
• Yolanda Gil at ISI modeled Bourne et al. paper in Wings
• Anecdotal evidence: Phil Bourne couldn’t remember most of this, even after digging through emails!
B3: So why not write the data first and wrap the paper around it??
1. Research: Each item in the system has metadata (including provenance) and relations to other data items added to it.
2. Workflow: All data items created in the lab are added to a (lab-owned) workflow system.
3. Authoring: A paper is written in an authoring tool which can pull data with provenance from the workflow tool in the appropriate representation into the document. Example text: “Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-”
4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the editors, who in turn expose it to reviewers. Reports are stored in the authoring/editing system, the paper gets updated, until it is validated.
5. Publishing and distribution: When a paper is published, a collection of validated information is exposed to the world. It remains connected to its related data item, and its heritage can be traced.
6. User applications: distributed applications run on this ‘exposed data’ universe.
[Diagram labels: metadata (on each data item); Review, Edit, Revise; Some other publisher]
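Steps 1 to 3 and the "heritage can be traced" part of step 5 can be sketched as a small data model. This is a minimal sketch under loud assumptions: `DataItem`, `Workflow`, `heritage` and `pull_into_paper` are hypothetical names invented for illustration, and a real workflow system (Wings, for instance) has its own, much richer model.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """A lab data item with metadata (including provenance) and relations."""
    item_id: str
    value: object
    metadata: dict[str, str] = field(default_factory=dict)
    related: list[str] = field(default_factory=list)  # ids of related items


class Workflow:
    """Hypothetical lab-owned store that every data item is added to."""

    def __init__(self) -> None:
        self._items: dict[str, DataItem] = {}

    def add(self, item: DataItem) -> None:
        self._items[item.item_id] = item

    def get(self, item_id: str) -> DataItem:
        return self._items[item_id]

    def heritage(self, item_id: str) -> list[str]:
        """Follow related-item links depth-first; return the ids reached."""
        seen: list[str] = []
        stack = [item_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.append(current)
            stack.extend(reversed(self._items[current].related))
        return seen


def pull_into_paper(workflow: Workflow, item_id: str) -> str:
    """Render a data item for the authoring tool, keeping its provenance."""
    item = workflow.get(item_id)
    provenance = item.metadata.get("provenance", "unknown")
    return f"{item.value} [data: {item.item_id}; provenance: {provenance}]"
```

The point of the design is that the paper text only ever holds a reference to the item, so the published figure stays connected to the data it came from.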
C. Issue: language
C1: Language is coherent
C2: Language is narrative
C3: Language is abstract
C1: Language is coherent: Adding drug-drug interactions to DIKB
• Drug-Interaction Knowledge Base: Clinically-oriented, evidence-based knowledge base designed to support adding data to product inserts
• Contains quantitative and qualitative assertions about drug mechanisms and pharmacokinetic drug-drug interactions (DDI) for over 60 drugs
• HCLS Sig: Currently working on expanding the DIKB with more content and making a “mash-up” view of package inserts adding up-to-date information
View project: http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
SPARQL endpoint: http://dbmi-icode-01.dbmi.pitt.edu:2020/directory/Drugs
C1: Coherent language is hard to parse
• Self-reference:
  R-CT and its metabolites, studied using the same procedures, had properties very similar to those of the corresponding S-enantiomers.
• Reference to external data sources:
  Average relative in vivo abundances equivalent to the relative activity factors, were estimated using methods described in detail previously (Crespi, 1995; Venkatakrishnan et al., 1998 a,c, 1999, 2000, 2001; von Moltke et al., 1999 a,b; Störmer et al., 2000).
• Ways of describing meant for human eyes:
  Based on established index reactions, S-CT and S-DCT were negligible inhibitors (IC50 > 100 µM) of CYP1A2, -2C9, -2C19, -2E1, and -3A, and weakly inhibited CYP2D6 (IC50 = 70–80 µM)
• Many statements wrapped into one:
  S-CT was transformed to S-DCT by CYP2C19 (Km = 69 µM), CYP2D6 (Km = 29 µM), and CYP3A4 (Km = 588 µM).
C2: Issue: Language is narrative
• ‘The truth can only be told in stories’
• Complex knowledge such as scientific theories, findings, conclusions have a narrative/rhetorical structure
• Typical pattern: claim/hypothesis, discussion of experimental findings, recap of claim, rebuttals, recap of claim
• Roughly the same claim appears 4 or 5 times in a paper
C2: Experiment: ʻClaimed Knowledge Updatesʼ
C3: Issue: Language is abstract
“These results are consistent with those obtained by RPA and demonstrate that AhR ligands suppress IL-6 mRNA levels by approximately 40–60%.”
“Data presented in Figure 5A extend previous studies performed with monocytes by demonstrating that LPS induces NF-κB-DNA binding in bone marrow stromal cells.”
“An added incentive for these studies was provided by the observation that the IL-6 gene promoter contains an NF-κB binding site which plays a major role in regulating LPS-induced IL-6 transcription [55-57].”
• Purple = deictic/anaphoric markers, pointing to current text
• Blue = metalanguage/epistemic evaluation
• Green = experimental method
• Red = conceptual claim
• Orange = claim referred to in other work
C3: Formal Language: Biological Expression Language (BEL)
In a screen for miRNAs that cooperate with oncogenes in cellular transformation, we identified miR-372 and miR-373, each permitting proliferation and tumorigenesis of primary human cells that harbor both oncogenic RAS and active wild-type p53.
• Increased abundance of miR-372 increases cell proliferation:
  r(MIR:miR-372) -> bp(GO:"Cell Proliferation")
• Increased abundance of miR-372 increases tumorigenesis:
  r(MIR:miR-372) -> bp(GO:Tumorgenesis)
We provide evidence that these miRNAs are potential novel oncogenes participating in the development of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in the presence of wild-type p53.
• Increased abundance of miR-372 decreases activity of TP53:
  r(MIR:miR-372) -| tscript(p(HUGO:Trp53))
• Context: cancer. Activity of TP53 decreases cell growth:
  SET Disease = "Cancer"
  tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")
C3: Experiment: add epistemic evaluation/knowledge attribution to BEL
For a Proposition P, an epistemically marked clause E is an Evaluation of P, E_{V,B,S}(P), with:
- V = Value:
  3 = Assumed true, 2 = Probable, 1 = Possible, 0 = Unknown (-1 = Possibly untrue, -2 = Probably untrue, -3 = Assumed untrue)
- B = Basis:
  Reasoning
  Data
- S = Source:
  A = speaker is author A, explicit
  IA = speaker is author A, implicit
  N = other author N, explicit
  NN = other author NN, implicit
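As a rough illustration, the Evaluation of a proposition with its Value, Basis and Source could be held in a small typed record. This is a sketch, not part of BEL: the class and member names are hypothetical, the proposition is kept as an opaque string, and the `label` format is invented for display only.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Value(IntEnum):
    """Epistemic value scale from the slide, -3 to 3."""
    ASSUMED_UNTRUE = -3
    PROBABLY_UNTRUE = -2
    POSSIBLY_UNTRUE = -1
    UNKNOWN = 0
    POSSIBLE = 1
    PROBABLE = 2
    ASSUMED_TRUE = 3


class Basis(Enum):
    REASONING = "Reasoning"
    DATA = "Data"


class Source(Enum):
    AUTHOR_EXPLICIT = "A"
    AUTHOR_IMPLICIT = "IA"
    OTHER_EXPLICIT = "N"
    OTHER_IMPLICIT = "NN"


@dataclass(frozen=True)
class Evaluation:
    """An epistemic evaluation E_{V,B,S}(P) of a proposition P."""
    proposition: str  # e.g. a BEL statement, treated as an opaque string
    value: Value
    basis: Basis
    source: Source

    def label(self) -> str:
        # Display format is an assumption made for this sketch.
        return (
            f"E[{int(self.value)},{self.basis.value},{self.source.value}]"
            f"({self.proposition})"
        )


# Illustrative only: the value, basis and source chosen here are arbitrary.
example = Evaluation(
    "r(MIR:miR-372) -| tscript(p(HUGO:Trp53))",
    Value.POSSIBLE,
    Basis.DATA,
    Source.AUTHOR_EXPLICIT,
)
print(example.label())
```

Using an integer scale for V means evaluations of the same proposition from different papers can be compared or sorted directly.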
D. Collections of papers
D1: Canʼt search papers easily
D2: Canʼt connect papers well
D3: Canʼt combine knowledge from different papers
D1: Searching collections of papers
• It is relatively easy to find a paper you are looking for: Google Scholar, Google,..., Scopus... (in that order?)
• But it is very hard to find if something was done about a certain topic (e.g. ‘citances’)
• And it’s impossible to know if nothing was done on a topic
• Why aren’t more people working on this?
• What happened to the semantic desktop??
D2: How do we connect papers?
• Papers exist within a con-text: preceding knowledge, succeeding knowledge, knowledge in your head or on your computer
• How can we annotate these relations, maintain connections, explore ones that others have made?
D2: Experiment: Annotation in SWAN using DOMEO
[Figure: RDF annotation graph with named graphs G1, G5 and G6, using the properties rdf:type, dct:title, pav:contributedBy and swanrel:referencesAsSupportiveEvidence; node labels not recoverable.]
D3: Tracing the heritage of a statement
• On paper, you can’t see whether a claim or a recommendation is valid
• E.g. required to check for clinical recommendations:
  – Is this statistically valid?
  – Was it shown for my patient?
  – Are there other things I need to know (side effects, funding, etc.)
29
D3: Experiment: Linking Clinical Guidelines to Evidence
A. Philips’ Electronic PaNent Records B. Elsevier-‐published Clinical Guideline
C. Elsevier (or other publisher’s) Research Report or Data
Step 1: PaNent data + diagnosis link to Guideline recommendaNon
Step 2: Guideline recommendaNon links to research report/data
30
D3: The reality of linking evidence:
Recommendation in Guideline | Level | Evidence (in the text) | Ref | Recommendation in Reference
5.1. Laboratory tests should include a CBC count with differential leukocyte count and platelet count; | A-III | No evidence in text | No reference |
5.2. measurement of serum levels of creatinine and blood urea nitrogen; | A-III | CBC counts and determination of the levels of serum creatinine and urea nitrogen are needed to plan supportive care and to monitor for the possible occurrence of drug toxicity. | No reference |
5.3. and measurement of electrolytes, hepatic transaminase enzymes, and total bilirubin (A-III). | A-III | No evidence in text | No reference |
Not mentioned: GET ENOUGH BLOOD, IN TWO SEPARATE BOTTLES | | The total volume of blood cultured is a crucial determinant of detecting a bloodstream infection [47]. (a ‘‘set’’ consists of 1 venipuncture or catheter access draw of 20 mL of blood divided into 1 aerobic and 1 anaerobic blood culture bottle). | [47] | Our data, together with an analysis of previous studies, show that the yield of blood cultures in adults increases approximately 3% per millilitre of blood cultured.
Not mentioned: REPEAT TESTS | | These tests should be done at least every 3 days during the course of intensive antibiotic therapy. At least weekly monitoring of serum transaminase levels is advisable for patients with complicated courses or suspected hepatocellular injury or | |
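The gaps in the first three rows of this table are the kind of thing a simple check could surface automatically. A minimal sketch, assuming each guideline recommendation has already been extracted into a record: the `Row` type, its field names and the `missing_links` helper are hypothetical, and the evidence text for 5.2 is abbreviated from the table.

```python
from typing import Dict, List, NamedTuple, Optional


class Row(NamedTuple):
    rec_id: str
    level: str
    evidence_in_text: Optional[str]  # None means "No evidence in text"
    reference: Optional[str]         # None means "No reference"


# The three numbered recommendations from the table.
ROWS = [
    Row("5.1", "A-III", None, None),
    Row("5.2", "A-III",
        "needed to plan supportive care and to monitor for drug toxicity", None),
    Row("5.3", "A-III", None, None),
]


def missing_links(rows: List[Row]) -> Dict[str, List[str]]:
    """Report, per recommendation, which supporting links are absent."""
    report = {}
    for row in rows:
        gaps = []
        if row.evidence_in_text is None:
            gaps.append("evidence")
        if row.reference is None:
            gaps.append("reference")
        if gaps:
            report[row.rec_id] = gaps
    return report


print(missing_links(ROWS))
# {'5.1': ['evidence', 'reference'], '5.2': ['reference'], '5.3': ['evidence', 'reference']}
```

None of the three recommendations can be followed through to a reference, which is the inconsistency the slide points at.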
In summary:
Type | Problems | Experiments | Issues
A. Paper format:
A1 | Two-dimensional | Utopia, Wolfram CDF | Standards, tools
A2 | Linear | ABCDE | Adoption?
A3 | Not interactive | Executable papers | Adoption
B. Writing habits
B1 | Reference to papers | TAC: Citance summaries | Need to start at author
B2 | Inexact entity references | NIF antibodies | Need mandate!
B3 | Methods post-mortem | Data-centric publishing | Change research recording!
C. Language:
C1 | Coherent | DIKB | Hard to parse!
C2 | Narrative | CKUs | Fractal nature of paper
C3 | Abstract | BEL | Formalize knowledge level
D. Collections of papers:
D1 | Can’t find | Scientific search engines? | Is anyone working on this?
D2 | Can’t compare | DOMEO/SWAN | Manual, doesn’t scale
D3 | Can’t combine | Evidence-based guidelines | Inconsistencies!
Have we solved the Big Problem?
1) Too many papers?
• Do not make publication numbers factor in evaluation
• Do not make conference attendance contingent on publication
• Write fewer papers! Limit yourself to write only what is significant and profound (and entertaining!)
2) Too little time to read?
• Collectively: change expectation of work in a day
• Make grant process less of a waste of time and talent
• Reduce burden of administration on (senior) scientists: reinstate departmental administrators!
• Teach administration as a class: Lethbridge journal incubator
• Make time to read some new (or old!) interesting work!
So how do we tackle all this?
• DERI-Elsevier collaboration - define research projects?
• Perhaps under aegis of Force11?
  • Dagstuhl Workshop in August of 2011: 35 invited attendees from different parts of science, industry, funding agencies, data centers
  • Goal: map main obstacles preventing new models of science publishing and develop ways to overcome them
  • Just received funding from Sloan foundation to:
    – Start online community
    – Hold next workshop
    – Collaboratively work on next steps
• Any thoughts?
Acknowledgements/collaborations:
1. Executable papers: Juliana Freire, NYU & Matthias Troyer, ETH Zurich (Vistrails); Micah Altman, Harvard SQSS (R), Gloriana St. Claire & Mahadev Satyanarayanan, CMU (Olive) (pending IMLS grant)
2. Citance summaries: Lucy Vanderwende, Microsoft Research; Hoa Trang, NIST; Eduard Hovy, ISI/USC
3. NIF antibodies: Maryann Martone, NIF/UCSD
4. Data-centric publishing: Phil Bourne, UCSD, Yolanda Gil, ISI/USC (funded in part by Elsevier Labs)
5. DIKB: Rich Boyce, U Pittsburgh, Jodi Schneider, DERI, Maria Liakata, EBI (looking for funding opportunities!)
6. CKUs: Agnes Sandor, Xerox Research Europe
7. BEL/knowledge attribution: Dexter Pratt, Selventa; Henk Pander Maat, University Utrecht (funded in part by NWO)
8. DOMEO/SWAN: Paolo Ciccarese & Tim Clark, Harvard/MGH (funded in part by Elsevier Labs)
9. Evidence-based guidelines: Paul Groth, Rinke Hoekstra, Frank van Harmelen, VU; Richard Vdovjak, Philips Research (funded by STW)
10. Force11: Phil Bourne, UCSD; Eduard Hovy, ISI/USC; Tim Clark, Harvard/MGH; Cameron Neylon, PLoS; Ivan Herman, W3C (funded in part by Sloan Foundation)
Anything here we can work on?
Type | Problems | Experiments | Issues
A. Paper format:
A1 | Two-dimensional | Utopia, Wolfram CDF | Standards, tools
A2 | Linear | ABCDE | Adoption?
A3 | Not interactive | Executable papers | Adoption
B. Writing habits
B1 | Reference to papers | TAC: Citance summaries | Need to start at author
B2 | Inexact entity references | NIF antibodies | Need mandate!
B3 | Methods post-mortem | Data-centric publishing | Change research recording!
C. Language:
C1 | Coherent | DIKB | Hard to parse!
C2 | Narrative | CKUs | Fractal nature of paper
C3 | Abstract | BEL | Formalize knowledge level
D. Collections of papers:
D1 | Can’t find | Scientific search engines? | Is anyone working on this?
D2 | Can’t compare | DOMEO/SWAN | Manual, doesn’t scale
D3 | Can’t combine | Evidence-based guidelines | Inconsistencies!
Writing less and reading more | Force11, perhaps? | Social/political/personal!
What about writing completely differently?
Internet of things: (Bleecker, [1])
Interact with ‘objects that blog’ or ‘Blogjects’, that:
• track where they are and where they’ve been;
• have histories of their encounters and experiences;
• have agency - an assertive voice on the social web [1]
Research Objects: (Bechhofer et al, [2])
Create semantically rich aggregations of resources, that can possess some scientific intent or support some research objective
Networked Knowledge: (Neylon, [3])
If we care about taking advantage of the web and internet for research then we must tackle the building of scholarly communication networks. These networks will have two critical characteristics: scale and a lack of friction. [3]

[1] Bleecker, J. ‘A Manifesto for Networked Objects — Cohabiting with Pigeons, Arphids and Aibos in the Internet of Things’, http://nearfuturelaboratory.com/2006/02/26/a-manifesto-for-networked-objects/
[2] Bechhofer, S., De Roure, D., Gamble, M., Goble, C. and Buchan, I. (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge. In: The Future of the Web for Collaborative Science (FWCS 2010), April 2010, Raleigh, NC, USA. http://precedings.nature.com/documents/4626/version/1
[3] Neylon, C. ‘Network Enabled Research: Maximise scale and connectivity, minimise friction’, http://cameronneylon.net/blog/network-enabled-research/
Networked science in action:
• Galaxy Zoo: citizen science: classify galaxies in the comfort of your own home – like Hanny!
• Tim Gowers, Polymath: “This is to normal research as driving is to pushing a car”
• Mathoverflow: virtual network of mathematicians working collectively to answer big/small, clear/fuzzy questions
• Jean-Claude Bradley: ‘short-form chemistry’: tweet/blog about an experiment, Storify into a narrative
• Read Cameron Neylon’s blog on networked science!
Anything here we can work on?
Type | Problems | Experiments | Issues
A. Paper format:
A1 | Two-dimensional | Utopia, Wolfram CDF | Standards, tools
A2 | Linear | ABCDE | Adoption?
A3 | Not interactive | Executable papers | Adoption
B. Writing habits
B1 | Reference to papers | TAC: Citance summaries | Need to start at author
B2 | Inexact entity references | NIF antibodies | Need mandate!
B3 | Methods post-mortem | Data-centric publishing | Change research recording!
C. Language:
C1 | Coherent | DIKB | Hard to parse!
C2 | Narrative | CKUs | Fractal nature of paper
C3 | Abstract | BEL | Formalize knowledge level
D. Collections of papers:
D1 | Can’t find | Scientific search engines? | Is anyone working on this?
D2 | Can’t compare | DOMEO/SWAN | Manual, doesn’t scale
D3 | Can’t combine | Evidence-based guidelines | Inconsistencies!
Networked science | Mathoverflow, Bradley | But is it science?
Writing less and reading more | Force11, perhaps? | Social/political/personal!