The Web-Enabled Research Commons: Applications,
Goals, and Trends
Thinh Nguyen
October 2009
Use Case #1
NeuroCommons Project:
Science Commons project using Semantic Web to link massive amounts of data
[Figure: data sources linked through the Semantic Web. Credit: W3C HCLS]
Data sources shown: NeuronDB, BAMS, Literature, Homologene, SWAN, Entrez Gene, Gene Ontology, Mammalian Phenotype, PDSPki, BrainPharm, AlzGene, Antibodies, PubChem, MESH, Reactome, Allen Brain Atlas
Paper counts shown (source labels not recoverable): 27,266; 4,563; 41,985; 10,365; 128,437
making computers understand linkages
Web page -> links to -> Web page (the WWW)
receptor -> is located in -> Cell membrane (directed, contextual links)
“URI” (unique names for things on the web)
• receptor: http://ontology.foo.org/receptor
• Cell membrane: http://ontology.foo.org/compartment
• is located in: http://ontology.foo.org/is_located_in
receptor -> is located in -> Cell membrane
channel -> is located in -> Cell membrane
neuron -> has -> Cell membrane
Cell membrane, “compartment”, “container”, “doohickey”: all named by http://ontology.foo.org/compartment
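The linking idea on these slides can be sketched in a few lines of Python. This is only an illustrative sketch, not the NeuroCommons implementation: the URIs for channel, neuron, and has are assumed by analogy with the slide's example URIs, and match is a hypothetical helper.

```python
# Namespace taken from the slide's example URIs.
ONT = "http://ontology.foo.org/"

RECEPTOR = ONT + "receptor"
COMPARTMENT = ONT + "compartment"
IS_LOCATED_IN = ONT + "is_located_in"
CHANNEL = ONT + "channel"  # assumed URI, not shown on the slide
NEURON = ONT + "neuron"    # assumed URI, not shown on the slide
HAS = ONT + "has"          # assumed URI, not shown on the slide

# Different local labels all resolve to one shared URI.
LABEL_TO_URI = {
    "cell membrane": COMPARTMENT,
    "compartment": COMPARTMENT,
    "container": COMPARTMENT,
    "doohickey": COMPARTMENT,
}

# Each statement is a directed (subject, predicate, object) link.
TRIPLES = [
    (RECEPTOR, IS_LOCATED_IN, COMPARTMENT),
    (CHANNEL, IS_LOCATED_IN, COMPARTMENT),
    (NEURON, HAS, COMPARTMENT),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every non-None field."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    # Whatever a database calls the thing locally, the URI finds all links to it.
    for triple in match(TRIPLES, o=LABEL_TO_URI["doohickey"]):
        print(triple)
```

Because every statement uses the same URI for the cell membrane, data from sources that label it differently can be queried together.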
using the web to integrate data and databases
prefix go: <http://purl.org/obo/owl/GO#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix mesh: <http://purl.org/commons/record/mesh/>
prefix sc: <http://purl.org/science/owl/sciencecommons/>
prefix ro: <http://www.obofoundry.org/ro/ro.owl#>

select ?genename ?processname
where
{
  graph <http://purl.org/commons/hcls/pubmesh>
  {
    ?paper ?p mesh:D017966 .
    ?article sc:identified_by_pmid ?paper .
    ?gene sc:describes_gene_or_gene_product_mentioned_by ?article .
  }
  graph <http://purl.org/commons/hcls/goa>
  {
    ?protein rdfs:subClassOf ?res .
    ?res owl:onProperty ro:has_function .
    ?res owl:someValuesFrom ?res2 .
    ?res2 owl:onProperty ro:realized_as .
    ?res2 owl:someValuesFrom ?process .
    graph <http://purl.org/commons/hcls/20070416/classrelations>
    {
      { ?process <http://purl.org/obo/owl/obo#part_of> go:GO_0007166 }
      union
      { ?process rdfs:subClassOf go:GO_0007166 }
    }
    ?protein rdfs:subClassOf ?parent .
    ?parent owl:equivalentClass ?res3 .
    ?res3 owl:hasValue ?gene .
  }
  graph <http://purl.org/commons/hcls/gene>
  { ?gene rdfs:label ?genename }
  graph <http://purl.org/commons/hcls/20070416>
  { ?process rdfs:label ?processname }
}
Mesh: Pyramidal Neurons
Pubmed: Journal Articles
Entrez Gene: Genes
GO: Signal Transduction
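To make the shape of the query above easier to follow, here is a minimal Python sketch of the same kind of join across four sources (MeSH term to papers, papers to genes, genes to processes, processes to a parent process). All records, labels, and the function name are invented toy data for illustration; only the identifiers mesh:D017966 and go:GO_0007166 come from the query, and the real data lives in RDF graphs rather than Python lists.

```python
# Toy stand-ins for the named graphs in the query; all records are invented.
PUBMESH = [  # (paper id, MeSH term)
    ("pmid:1", "mesh:D017966"),
    ("pmid:2", "mesh:D000001"),
]
GENE_MENTIONS = [  # (gene, paper id that mentions it)
    ("gene:A", "pmid:1"),
    ("gene:B", "pmid:2"),
]
GOA = [  # (gene, process its product takes part in)
    ("gene:A", "go:P1"),
    ("gene:A", "go:P2"),
    ("gene:B", "go:P1"),
]
# process -> parent process (stands in for part_of or subClassOf)
PROCESS_PARENT = {"go:P1": "go:GO_0007166"}
LABELS = {
    "gene:A": "toy gene A",
    "gene:B": "toy gene B",
    "go:P1": "toy process 1",
    "go:P2": "toy process 2",
}


def genes_and_processes(mesh_term, go_root):
    """Join the toy sources on shared identifiers, as the query does on URIs."""
    papers = {pmid for pmid, term in PUBMESH if term == mesh_term}
    genes = {gene for gene, pmid in GENE_MENTIONS if pmid in papers}
    rows = set()
    for gene, process in GOA:
        if gene in genes and PROCESS_PARENT.get(process) == go_root:
            rows.add((LABELS[gene], LABELS[process]))
    return sorted(rows)


if __name__ == "__main__":
    print(genes_and_processes("mesh:D017966", "go:GO_0007166"))
```

The join only works because each source refers to papers, genes, and processes by the same identifiers, which is the role URIs play in the query.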
better answers through better formats:
• reformat what we already have
• reformat into a commons, not a closed system
• get the materials into the emerging research web
What data sharing protocol (legal and policy) best enables use of Web technology?
“Licensing” Archetypes
• Public Domain: No restrictions on use or distribution, no contracts, copyright waived.
• Community Licenses: standard “open access” licenses, a range of rights, some rights reserved, available to all
• Private Licenses: custom agreements, varies by institution, privately negotiated, may be offered only to some
Goals
• Interoperable: data from many sources can be combined without restriction
• Reusable: data can be repurposed into new and interesting contexts
• Administrative Burden: low transaction costs and administrative costs over time
• Legal Certainty: users can rely on legal usability of the data
• Community Norms: consistent with community expectations and usages
Interoperability
• Public Domain **** – Can be combined with other data sources with ease
• Community Licenses *** / ** – Depends on type of license: share-alike or copyleft are unsuitable, but attribution-only licenses are less problematic
• Private Licenses * / ** – Depends on restrictions, but not scalable; permutations too large
Reusable
• Public Domain **** – No restrictions on subsequent use
• Community Licenses *** – Depends on license, but some licenses such as NC / ND can be restrictive
• Private Licenses ** – Depends on license, but typically restrictive
Administrative Burden
• Public Domain **** – No paperwork or legal review needed
• Community Licenses *** – Little paperwork, but some legal review needed (attribution stacking issues)
• Private Licenses * – Large amounts of paperwork, frequent legal review needed
Legal Certainty
• Public Domain **** / *** – Clear rights; generally irrevocable (copyright should be addressed)
• Community Licenses *** – Generally credible, good track record with open access and open source licenses
• Private Licenses ** – Must be considered individually; few private licenses tested by time
Community Norms
• Public Domain *** – Traditional method for scientific data sharing (citation)
• Community Licenses *** – Relatively new, but familiar to computer scientists and open source community (attribution)
• Private Licenses ** – Tendency to emphasize private / individual interests rather than community norms
Overall Grade
• Public Domain *** – Easiest and least restrictive form of sharing
• Community Licenses ** – Can be used to implement community expectations, but can be burdensome / restrictive
• Private Licenses * – High transaction costs, burdensome, unpredictable
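The slides do not say how the overall grades were derived. As a rough cross-check only, the sketch below averages the star ratings from the five goal slides, treating a split rating such as "*** / **" as its midpoint (an assumption made here, not something the slides state). The resulting ranking matches the order of the overall grades.

```python
from statistics import mean

# Star ratings per goal, in slide order: Interoperability, Reusable,
# Administrative Burden, Legal Certainty, Community Norms.
# A split rating like "*** / **" is recorded as (3, 2).
RATINGS = {
    "Public Domain": [(4,), (4,), (4,), (4, 3), (3,)],
    "Community Licenses": [(3, 2), (3,), (3,), (3,), (3,)],
    "Private Licenses": [(1, 2), (2,), (1,), (2,), (2,)],
}


def average_stars(ratings):
    """Average each archetype's ratings, using the midpoint of split ratings."""
    return {
        name: mean(mean(stars) for stars in per_goal)
        for name, per_goal in ratings.items()
    }


if __name__ == "__main__":
    averages = average_stars(RATINGS)
    for name in sorted(averages, key=averages.get, reverse=True):
        print(f"{name}: {averages[name]:.1f}")
```

Under these assumptions the averages come out to roughly 3.7, 2.9, and 1.7 stars for Public Domain, Community Licenses, and Private Licenses respectively.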
Convergence
CC0
• Released by Creative Commons in 2009
• Result of a 3-year policy exploration process
• Not a license but a waiver of copyright
Why is it needed?
• “Borderline” copyright
• European sui generis database rights
• Varying legal standards for copyright protection in different countries
CC0
• [deed]
CC0
• Waiver of copyright
• Waiver of sui generis database rights
• Waiver of “neighboring rights”
• Does not affect trademarks or patents
• Only affects rights of person making assertion
Use Case #2
• Coordination and Sustainability of International Mouse Informatics Resources (CASIMIR) (EU Project)
• Commentary in Letter to Nature (Sept 2009) recommends PD and use of CC0 for sharing mouse genomic data
• Recommendations endorsed by scientists, NIH representatives, Jackson Labs, and editors of top scientific journals
Use Case #3
• Personal Genome Project: personalized medicine project from the George Church lab
• Adopted CC0 to release sequence and medical data collected from volunteers
Summary
• Solving some bioinformatics problems requires the ability to integrate massive quantities of data from diverse sources
• Public Domain sharing best fits this need
• CC0 waiver can be used to enrich public domain and provide clarity