Big Data
DESCRIPTION
Barend Mons on Big Data at the SURFnet Relatiedagen 2012
TRANSCRIPT
![Page 1: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/1.jpg)
PPP
![Page 2: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/2.jpg)
DISC: the connected data departments of DTL research Hotels
DTL
DISC*
*) DISC = DTL Data Integration & Stewardship Centre
technology research
education & training
technology facilities
![Page 3: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/3.jpg)
What is bioinformatics?
• The science of storing, retrieving and analysing large amounts of biological information
• An interdisciplinary science involving biologists, biochemists, computer scientists and mathematicians
• At the heart of modern biology
![Page 4: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/4.jpg)
1 Genomes contain genes
2 Genes are transcribed
3 Transcripts translate to protein sequences
4 Proteins form three-dimensional structures
5 Proteins interact with each other and with small molecules to form pathways
6 Pathways combine to build systems
Bioinformatics underpins life-science research
![Page 5: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/5.jpg)
Life Science data: multi-omics, multi-technology, multi-organism, multi-dimensional
![Page 6: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/6.jpg)
From molecules to medicine
Molecular components Integration Translation
Genomes
Nucleotides
Transcripts
Proteins
Complexes
Pathways
Small molecules
Structures
Domains
Cells
Biobanks
Tissues and organs
Human populations
Therapies
Disease prevention
Early diagnosis
Human individuals
![Page 7: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/7.jpg)
The challenge
• Computer speed and storage capacity are doubling every 18 months, and this rate is steady
• DNA sequence data has been doubling every 6-8 months over the last 3 years and looks set to continue doing so for this decade
Guy Cochrane, ENA, EMBL-EBI
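The gap between the two doubling rates on this slide compounds quickly. The following is a minimal sketch of the arithmetic, assuming both rates hold unchanged over an illustrative 10-year horizon; the horizon and the function name `growth_factor` are assumptions made for this example and do not come from the slide.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Factor by which a quantity grows in `years` if it doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


YEARS = 10  # illustrative horizon (assumption, not stated on the slide)

compute = growth_factor(YEARS, 18)   # computer speed and storage capacity
seq_slow = growth_factor(YEARS, 8)   # DNA sequence data, slower bound of 6-8 months
seq_fast = growth_factor(YEARS, 6)   # DNA sequence data, faster bound of 6-8 months

print(f"compute/storage growth over {YEARS} years: ~{compute:,.0f}x")
print(f"sequence data growth over {YEARS} years: {seq_slow:,.0f}x to {seq_fast:,.0f}x")
print(f"data outpaces capacity by a factor of ~{seq_slow / compute:,.0f} to ~{seq_fast / compute:,.0f}")
```

Under these assumptions, capacity grows by roughly two orders of magnitude while sequence data grows by four to six, which is the mismatch the slide points at.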
![Page 8: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/8.jpg)
Europe has already paid for the science
Annual cost of generating new protein structure data in labs around the world
Annual cost of maintaining the data in a central database
![Page 9: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/9.jpg)
ELIXIR’s mission
To build a sustainable European infrastructure for biological information, supporting life science research and its translation to:
• medicine
• environment
• bioindustries
• society
![Page 10: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/10.jpg)
![Page 11: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/11.jpg)
13 ELIXIR Countries
![Page 12: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/12.jpg)
Part two >>>> eScience in LS
• The way we discover knowledge has changed fundamentally over just a decade.
BIGNORANCE
![Page 13: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/13.jpg)
The general challenge: Data has far outgrown institutional handling capacity
The amount of digital data is exploding, with a staggering 1.8 zettabytes in 2011.
The issue: the Data Deluge is everywhere, but Life Sciences is particularly challenged and complex.
More and more, we write 'about datasets' that are too large to publish in narrative.
![Page 14: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/14.jpg)
Nanopublications & Cardinal Assertions
A Nanopublication is the smallest unit of publishable information, containing:
1. Assertion: a statement of concepts in terms of one or more 'subject -> predicate -> object' (triple) relationships.
2. Provenance:
a) Attribution: who made this assertion, when and where?
b) Supporting information: any other information which is relevant to the assertion (e.g. this assertion is only valid in humans under 18).
A Cardinal Assertion aggregates all 'n' Nanopublications making the same assertion. It therefore has 1 identical assertion and 'n' different provenances, eliminating redundancy.
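The relationship between Nanopublications and Cardinal Assertions can be sketched as a small data model. This is an illustrative sketch only, not the format used by the speaker's group: the class names, field names, and the placeholder identifiers (`gene:X`, `disease:Y`, `lab A`, and so on) are all hypothetical. It groups Nanopublications by their assertion so that each distinct assertion is stored once alongside all of its provenances.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One 'subject -> predicate -> object' relationship."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Provenance:
    attribution: str           # who made the assertion, when and where
    supporting_info: str = ""  # any other relevant context


@dataclass(frozen=True)
class Nanopublication:
    assertion: frozenset       # one or more Triple objects
    provenance: Provenance


def cardinal_assertions(nanopubs):
    """Map each distinct assertion to the list of its 'n' provenances."""
    grouped = defaultdict(list)
    for nanopub in nanopubs:
        grouped[nanopub.assertion].append(nanopub.provenance)
    return dict(grouped)


# Hypothetical placeholder data, for illustration only.
claim = frozenset({Triple("gene:X", "associated_with", "disease:Y")})
other = frozenset({Triple("protein:P", "interacts_with", "protein:Q")})
nanopubs = [
    Nanopublication(claim, Provenance("lab A, 2011")),
    Nanopublication(claim, Provenance("lab B, 2012", "valid in humans under 18")),
    Nanopublication(other, Provenance("lab C, 2010")),
]

cardinal = cardinal_assertions(nanopubs)
for assertion, provenances in cardinal.items():
    print(len(assertion), "triple(s),", len(provenances), "provenance(s)")
```

Using frozen dataclasses and a `frozenset` of triples makes assertions hashable, so two Nanopublications with the same triples collapse into one dictionary key regardless of the order in which the triples were listed.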
![Page 15: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/15.jpg)
Under the hood……
![Page 16: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/16.jpg)
Managing volume & complexity
• Individual Nanopublications: > 10^14
• Individual Cardinal Assertions: > 10^11
• Individual Concept Profiles: ≈ 4 × 10^6
Combining Cardinal Assertions with Concept Profiles reduces the amount of data by ≈ 99.999996%
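The quoted reduction follows directly from the counts on this slide. The short check below is a sketch that treats the lower bounds (10^14 and 10^11) as if they were exact counts, which is an assumption; the variable names are chosen for this example.

```python
nanopublications = 10**14      # slide gives "> 10^14"; treated as exact here
cardinal_assertions = 10**11   # slide gives "> 10^11"; treated as exact here
concept_profiles = 4 * 10**6   # slide gives "≈ 4 × 10^6"

# Fraction of items removed at each stage, relative to the raw Nanopublications.
to_cardinal = 1 - cardinal_assertions / nanopublications
to_profiles = 1 - concept_profiles / nanopublications

print(f"Nanopublications -> Cardinal Assertions: {to_cardinal:.1%} fewer items")
print(f"Nanopublications -> Concept Profiles:    {to_profiles:.6%} fewer items")
```

Going from 10^14 items to 4 × 10^6 leaves a fraction of 4 × 10^-8, so the reduction is 1 − 4 × 10^-8, which matches the slide's ≈ 99.999996%.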
![Page 17: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/17.jpg)
The LS concept web: 2 × 2 × 10^6 concepts (profiles)
![Page 18: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/18.jpg)
A dynamic Concept Web versus a static Ontology
![Page 19: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/19.jpg)
More mutual information, no increase in concept overlap
Including manual curation
More concepts in common
Removal of low info paths
= Known reference pairs
= Non-co-occurrence pairs
![Page 20: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/20.jpg)
![Page 21: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/21.jpg)
eScience: in silico reasoning and in cerebro validation
Expert Skype calls
Reading up
![Page 22: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/22.jpg)
Organisation of the ecosystem
CA Space (OCS & ICS)
Providers
Original Data Owners
Global Authority Nanopublishers App & Service Providers
Users
Endorse
Assist & Certify
Application development
Reasoning services
Technical and process consultancy
Project delivery capacity
ONS/INSs
Academic & Commercial Users
Knowledge Management
Knowledge Discovery
Best Practices
![Page 23: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/23.jpg)
![Page 24: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/24.jpg)
Acceptance of Semantic Web Approach
• Over the last decade, academic research organisations developed new methodologies and tools to address the Big Data problem.
• Global agreement by leading scientists on the unique Nanopublication solution.
• Hundreds of millions already invested in the underlying technology.
• Applicable as a technology across (STM) domains and industries.
• Pharmaceutical companies are early adopters (Innovative Medicines Initiative).
![Page 25: Big Data](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5561e445d8b42aa5068b4d03/html5/thumbnails/25.jpg)
Acknowledging…
The 'Dutch Team'
• Herman van Haagen, MSc (LUMC)
• Dr. Peter Bram 't Hoen (LUMC)
• Dr. Marco Roos (LUMC)
• Dr. Erik Schultes (LUMC)
• Prof. Johan den Dunnen (LUMC)
• Prof. Gertjan van Ommen (LUMC)
• Dr. Erik van Mulligen (EMC)
• Dr. Jan Kors (EMC)
• Dr. Martijn Schuemie (EMC)
• Prof. Johan van der Lei (EMC)
• Dr. Rob Hooft (NBIC)
• Dr. Christine Chichester (NBIC)
• Dr. Leon Mei (NBIC)
• Kees Burger (NBIC)
• Bharat Singh (NBIC/EMC)
• Dr. Marc van Driel (NBIC)
• Dr. Ruben Kok (NBIC)
• Prof. Marcel Reinders (NBIC)
• Prof. Jaap Heringa (NBIC)
• Prof. Gert Vriend (NBIC)
• Dr. Morris Swertz (BBMRI, CWA)
• Dr. Andra Waagmeester (NBIC)
• Dr. Kristina Hettne (LUMC)
• Dr. Rene van Schaik (eScience Centre)
• Drs. Albert Mons (PHORTOS consultants)
• Mr. Drs. Arie Baak (PHORTOS consultants)
CWA - Open PHACTS
• Prof. Amos Bairoch (SIB, Switzerland, CWA)
• Prof. Carole Goble (Manchester, CWA, OPS)
• Prof. Katy Borner (Indiana University, CWA)
• Prof. Mark Musen (NCBO, Stanford, CWA, OPS)
• Dr. Pascale Gaudet (UniProt, ISB, CWA)
• Dr. Mike Conlon (VIVO, UF, CWA)
• Prof. Maryann Martone (Force 11, USC, CWA)
• Dr. Nigam Shah (NCBO, Stanford, CWA, OPS)
• Dr. Mark Wilkinson (Canada, CWA)
• Abel Packer (Brazil, Scielo, CWA, OPS)
• Jan Velterop (ACKnowledge, CWA, OPS)
• Albert Mons (CWA, NBIC)
• Prof. Frank van Harmelen (FUA/LARKC, CWA, OPS)
• Dr. Chris Evelo (Maastricht, CWA, OPS)
• Dr. Antony Williams (RSC/ChemSpider, CWA, OPS)
• Dr. Richard Kidd (RSC, OPS)
• Dr. Paul Groth (FUA, CWA, OPS)
• Dr. Michel Dumontier (Canada, CWA, OPS)
• Dr. Andrew Gibson (UA, CWA, OPS)
• Dr. Bryn Williams-Jones (Pfizer, OPS)
• Dr. Ian Dix (AstraZeneca, OPS)
• Dr. Niklas Blomberg (AstraZeneca, OPS)
• Dr. Mike Barnes (GSK, OPS)
• Prof. Jan-Erik Litton (CWA, BBMRI)