![Page 1: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/1.jpg)
DATA FOR SCIENCE: HOW ELSEVIER IS USING DATA SCIENCE TO EMPOWER RESEARCHERS
Paul Groth | @pgroth | pgroth.com
Disruptive Technology Director
Elsevier Labs | @elsevierlabs
European Data Forum 2016
![Page 2: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/2.jpg)
![Page 3: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/3.jpg)
12 million people per month
![Page 4: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/4.jpg)
![Page 5: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/5.jpg)
40 million reactions
75 million compounds
500 million facts
![Page 6: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/6.jpg)
3 EXAMPLES
• Personalized: what should I read?
• Actionable: who should I collaborate with?
• Consumable: how do I make my data available?
![Page 7: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/7.jpg)
RECOMMENDATIONS AT MENDELEY
• Maya Hristakeva
• Data Scientist at Mendeley
• @mayahhf
• Spark Summit 2015
• http://www.slideshare.net/SparkSummit/sparking-science-up-with-research-recommendations-by-maya-hristakeva
![Page 8: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/8.jpg)
MENDELEY BUILDS TOOLS TO HELP RESEARCHERS …
• Read & Organize
• Search & Discover
• Collaborate & Network
• Experiment & Synthesize
![Page 9: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/9.jpg)
BEING THE BEST RESEARCHER YOU CAN BE!
• Good researchers are on top of their game
• Large amount of research produced
• Takes time to get what you need
• Help researchers by recommending relevant research
![Page 10: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/10.jpg)
![Page 11: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/11.jpg)
PERSONALIZED ARTICLE RECOMMENDATION
Input: User libraries
Output: Suggested articles to read
Algorithms:
• Collaborative Filtering
– Item-based
– User-based
– Matrix Factorization
• Content-based
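A minimal sketch of the item-based collaborative filtering idea listed above, assuming user libraries are plain sets of article IDs. The function names, the sample data, and the choice of cosine similarity over co-occurring readers are illustrative assumptions, not Mendeley's actual implementation (the slides mention tuned Mahout and Spark recommenders).

```python
from collections import defaultdict
from math import sqrt


def item_similarities(libraries):
    """Cosine similarity between articles, based on which users hold both.

    `libraries` maps a user ID to the set of article IDs in their library
    (a hypothetical input shape chosen for this sketch).
    """
    readers = defaultdict(set)  # article -> users who have it
    for user, articles in libraries.items():
        for article in articles:
            readers[article].add(user)

    sims = defaultdict(dict)
    items = sorted(readers)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            shared = len(readers[a] & readers[b])
            if shared:
                score = shared / sqrt(len(readers[a]) * len(readers[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend(libraries, user, top_n=3):
    """Rank articles the user does not own by summed similarity to owned ones."""
    sims = item_similarities(libraries)
    owned = libraries[user]
    scores = defaultdict(float)
    for article in owned:
        for other, score in sims[article].items():
            if other not in owned:
                scores[other] += score
    # Sort by descending score, breaking ties by article ID for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:top_n]]


# Hypothetical sample libraries.
libraries = {
    "alice": {"paper_a", "paper_b"},
    "bob": {"paper_a", "paper_b", "paper_c"},
    "carol": {"paper_b", "paper_d"},
}
print(recommend(libraries, "alice"))  # ['paper_c', 'paper_d']
```

The pairwise loop is quadratic in the number of articles, so at the scale of real user libraries this step would likely be distributed, which is presumably where frameworks such as Mahout and Spark come in.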
![Page 12: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/12.jpg)
[Figure: quadrant chart (Costly & Good, Costly & Bad, Cheap & Good, Cheap & Bad) comparing recommenders: Tuned IB Mahout, Tuned UB Mahout, Tuned UB Spark, Tuned IB Spark, UB DimSum (Spark MLlib), ALS Matrix Factorization (Spark MLlib). Annotations: Performance +100%, +150%, ~$50]
![Page 13: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/13.jpg)
![Page 14: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/14.jpg)
CALCULATING 75 TRILLION METRICS
• Benchmark 4600 institutions & 220 countries updated weekly
• 40 terabytes of data
• HPCC massively parallel compute system – 40 node system
![Page 15: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/15.jpg)
![Page 16: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/16.jpg)
ALL DATA ISN’T CURATED
![Page 17: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/17.jpg)
60% OF TIME IS SPENT ON DATA PREPARATION
![Page 18: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/18.jpg)
10 ASPECTS OF HIGHLY EFFECTIVE RESEARCH DATA
https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
![Page 19: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/19.jpg)
http://data.mendeley.com/
Each dataset receives a versioned DOI, so it can be cited
The citation for the associated article is displayed
![Page 20: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/20.jpg)
![Page 21: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/21.jpg)
ACADEMIC COLLABORATIONS
![Page 22: Data for Science: How Elsevier is using data science to empower researchers](https://reader036.vdocuments.mx/reader036/viewer/2022070602/5875baae1a28ab8b618b80ad/html5/thumbnails/22.jpg)
CONCLUSION
• Researchers are faced with an ever-growing amount of data and content
• Data Science is key to making systems that help them
• I’ve shown three Elsevier examples. Many more!
• Antonio Gulli’s codingplayground.blogspot.nl
• labs.elsevier.com
• Of course, we’re hiring
Contact: Paul Groth @pgroth