Immersive Teaching and Research in Data Sciences via Cloud Computing
Karim Chine, Cloud Era Ltd, 13 June 2013
karim.chine@cloudera.co.uk




Outline

- Introduction
- The Rise of Data Science
- What is missing on the Cloud for Research and Education?
- Elastic-R: The Next Generation Data Science Platform
- Elastic-R: Design and Technologies Overview
- Rethinking Virtual Research and Teaching
- Demo
- Conclusion


Introduction: Science and the 4th Paradigm

The promises of Grid Computing for e-Research

- The e-Science program (UK)
- The NSF-funded cyber infrastructure (USA)
- e-Infrastructure and ICT Calls (EC)

[Figure: the four paradigms of science: Experimental Science, Theoretical Science, Computational Science, e-Science / Data-intensive Science]


Introduction

The Dancing Bear

The townspeople gather to see the wondrous sight as the massive, lumbering beast shambles and shuffles from paw to paw. The bear is really a terrible dancer, and the wonder isn't that the bear dances well but that the bear dances at all.

The tragic failure of the Grid paradigm

- No democratic access for all scientists: a tool for particle physicists and geeky scientists
- No sustainable infrastructures
- No SLA, best-effort support
- No flexibility, no isolation: only admins can install software, X.509 certificates, restrictive policies
- No interaction design, no user-centric approach


Introduction: The Data Deluge and the Challenges of Big Data

* http://www.slideshare.net/AmazonWebServices/20120620-aws-summitberlinbigdataanalyticsonaws


The Rise of Data Science

* http://en.wikipedia.org/wiki/Data_science

"Data Scientists Led by NASA Star Most Sought for Century: Jobs", by Aki Ito, Jun 11, 2013 12:01 AM ET (Bloomberg.com)

- Harvard Business Review last year called this profession “the sexiest job of the 21st century”
- One measure of demand: hours billed for work in statistical analysis grew by 522 percent in the first quarter compared with the same period in 2011


What is missing on the Cloud for Research and Education?

- Science-as-a-Service: Scientific Computing Environments (SCEs)-as-a-service, Models-as-a-service…
- Generalized real-time collaboration, including interaction with SCEs and with data
- One consistent platform for homogeneous access to Science Clouds, and new abstractions for stateful access to remote infrastructure/data
- Reproducibility and traceability of data analysis and computational research
- Science Gateways building and publishing for everyone


Elastic-R: The Next Generation Data Science Platform

- Arduino / Raspberry Pi: Democratizing Electronics
- Elastic-R: Democratizing Data Science


Elastic-R: The Next Generation Data Science Platform

Elastic-R is:

- A link between the data scientist, the educator or the student and the Infrastructure-as-a-Service
- A consistent set of concepts, architectures, frameworks and interaction design models that wrap and complete the existing public and private clouds with the missing capabilities, to radically improve data scientists' productivity
- « Super Glue » between the virtual IT capabilities, the software capabilities, the mathematical/statistical capabilities and the man-machine interaction components
- A Data Operating System enabling infinite combinations for analysing data and rapidly building Data Science-centric applications and services


Elastic-R: The Next Generation Data Science Platform

Elastic-R is:

- A hyper-Dropbox that runs a large spectrum of capabilities next to centrally stored, ubiquitously accessible data
- A platform using the cloud to introduce traceability and reproducibility to the data science arena
- A federation paradigm for scientific clouds
- A framework for building on-the-fly, user-defined software-as-a-service


Elastic-R: Design and Technologies Overview


Elastic-R: Design and Technologies Overview

A robot submarine dives to the deepest part of the ocean, controlled by a 7-mile cable as thin as a single human hair


Elastic-R: Design and Technologies Overview

- Remote Java/R processes
- Event-driven remote objects/engines
- R, Python, Mathematica, Matlab, Scilab, ...
- Collaborative spreadsheets
- Collaborative scientific graphics canvas
- Collaborative dashboard with collaborative widgets
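
The event-driven, collaborative items in this list can be illustrated with a small sketch. It is only an in-memory illustration of the general idea (shared state that notifies every subscriber when it changes), not Elastic-R code: `SharedSpreadsheet`, `subscribe`, `set_cell` and `make_view` are hypothetical names invented for this example, and no network or remote process is involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout: this illustrates the event-driven
# idea only and is not the Elastic-R API.
CellListener = Callable[[str, float], None]


@dataclass
class SharedSpreadsheet:
    """A tiny in-memory stand-in for a collaborative spreadsheet."""
    cells: Dict[str, float] = field(default_factory=dict)
    listeners: List[CellListener] = field(default_factory=list)

    def subscribe(self, listener: CellListener) -> None:
        # Each collaborator (or dashboard widget) registers a callback.
        self.listeners.append(listener)

    def set_cell(self, name: str, value: float) -> None:
        # Update the shared state, then notify every subscriber.
        self.cells[name] = value
        for listener in self.listeners:
            listener(name, value)


def make_view(log: List[str], who: str) -> CellListener:
    # Build a callback that records what this collaborator "sees".
    def on_change(name: str, value: float) -> None:
        log.append(f"{who} sees {name}={value}")
    return on_change


sheet = SharedSpreadsheet()
events: List[str] = []
sheet.subscribe(make_view(events, "alice"))
sheet.subscribe(make_view(events, "bob"))
sheet.set_cell("A1", 3.5)  # one edit, delivered to both collaborators
print(events)
```

In a real deployment the callbacks would presumably cross process and machine boundaries; here they are plain local function calls so the sketch stays self-contained.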


Elastic-R: Design and Technologies Overview


Elastic-R: Design and Technologies Overview

[Figure: Elastic-R Amazon Machine Images and Amazon Elastic Block Stores]

Elastic-R Amazon Machine Images

- Elastic-R AMI 1: R 2.10, BioC 2.5
- Elastic-R AMI 2: R 2.9, BioC 2.3
- Elastic-R AMI 3: R 2.8, BioC 2.0

Amazon Elastic Block Stores

- Elastic-R EBS 1: Data Set XXX
- Elastic-R EBS 2: Data Set YYY
- Elastic-R EBS 3: Data Set ZZZ
- Elastic-R EBS 4: Data Set VVV

The diagram shows Elastic-R AMI 2 (R 2.9, BioC 2.3) together with Elastic-R EBS 4 (Data Set VVV), and the same pair again next to Elastic-R.org.
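
The slide's diagram can be read as a catalogue of machine images and data volumes from which one pair is picked. The sketch below models that reading in plain Python. It is an assumption about what the diagram conveys, it makes no AWS calls, and `MachineImage`, `DataVolume`, `Engine` and `launch` are hypothetical names; only the image names, R/BioC versions and data set labels come from the slide.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MachineImage:
    name: str
    r_version: str
    bioc_version: str


@dataclass(frozen=True)
class DataVolume:
    name: str
    data_set: str


@dataclass(frozen=True)
class Engine:
    """One software image paired with one data volume (hypothetical model)."""
    image: MachineImage
    volume: DataVolume

    def describe(self) -> str:
        return (f"{self.image.name} (R {self.image.r_version}, "
                f"BioC {self.image.bioc_version}) + "
                f"{self.volume.name} ({self.volume.data_set})")


# Catalogue entries copied from the slide.
IMAGES: Dict[str, MachineImage] = {
    "Elastic-R AMI 1": MachineImage("Elastic-R AMI 1", "2.10", "2.5"),
    "Elastic-R AMI 2": MachineImage("Elastic-R AMI 2", "2.9", "2.3"),
    "Elastic-R AMI 3": MachineImage("Elastic-R AMI 3", "2.8", "2.0"),
}
VOLUMES: Dict[str, DataVolume] = {
    "Elastic-R EBS 1": DataVolume("Elastic-R EBS 1", "Data Set XXX"),
    "Elastic-R EBS 2": DataVolume("Elastic-R EBS 2", "Data Set YYY"),
    "Elastic-R EBS 3": DataVolume("Elastic-R EBS 3", "Data Set ZZZ"),
    "Elastic-R EBS 4": DataVolume("Elastic-R EBS 4", "Data Set VVV"),
}


def launch(image_name: str, volume_name: str) -> Engine:
    # Purely local lookup: nothing is started on AWS in this sketch.
    return Engine(IMAGES[image_name], VOLUMES[volume_name])


engine = launch("Elastic-R AMI 2", "Elastic-R EBS 4")
print(engine.describe())
```

Keeping the software environment and the data set as separate, independently chosen pieces is what makes a specific analysis setup easy to name and recreate, which fits the reproducibility theme of the talk.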


Modes of access

- Individuals with AWS accounts
  - Standard AMIs (Amazon Machine Images): paid per use to Amazon. For academic use and trial purposes
  - Paid AMI: paid-per-use software model. For business users
- Individuals without AWS accounts
  - Trial tokens, purchased tokens, tokens granted by other users
  - Resources (data science engines) shared by other users
  - Individual subscriptions
- Companies/Educational & Research Institutions
  - Dedicated platform and AMIs on an Amazon VPC (Virtual Private Cloud). Paid via subscription


Rethinking Virtual Research and Teaching

- Free researchers from the IT service dictatorship
- Give researchers self-service access to the IT resources they need
- Allow researchers to share without restrictions
- Make real-time collaboration a free and ubiquitous service
- Allow researchers to produce advanced applications/services and publish them to the web without recourse to developers/admins


Rethinking Virtual Research and Teaching

- Make the cloud an ecosystem for Open Science where all research artifacts can be produced, discovered and reused
- Allow researchers to « sell » the software/models/algorithms/techniques they invent seamlessly
- Bridge the gap between the different computational research tools: interconnect SCEs, workflow workbenches, document editors…


Rethinking Virtual Research and Teaching

- Provide affordable and reliable tools for remote education
  - High-quality voice and video chat for a large number of users
  - Self-service collaboration tools: editors, whiteboards, IDEs, etc.
  - Modules for traceability/reproducibility
- Extend existing online course platforms to include capabilities such as:
  - Companion software environments in SaaS mode
  - Collaborative problem-solving tools
  - Interactive courses
  - Tokens for ready-to-run e-Learning applications
  - Visual designers for e-Learning environments


Demo

- Register on the Elastic-R academic and trial portal (www.elastic-r.org)
- Create Data Science Engines using trial tokens
- Work with R, Python and Scientific Spreadsheets in the browser
- Share a Data Science Engine and collaborate
- Use the Visual and Collaborative Scientific Applications designer to create an interactive dashboard and publish it to the web
- Connect to the remote Data Science Engine from within a local R session, push and pull data, execute commands and show the impact on the dashboard
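
The last demo step (push data, execute a command, pull the result, and see the dashboard react) can be mimicked locally. The sketch below is a stand-in under stated assumptions: it is not the elasticR package or any Elastic-R API, it runs entirely in one Python process, and `EngineStandIn`, `watch`, `push`, `pull` and `execute` are hypothetical names chosen for this illustration.

```python
from typing import Any, Callable, Dict, List

Watcher = Callable[[Dict[str, Any]], None]


class EngineStandIn:
    """Local stand-in for a remote engine (hypothetical, not the elasticR API)."""

    def __init__(self) -> None:
        self._variables: Dict[str, Any] = {}
        self._watchers: List[Watcher] = []

    def watch(self, callback: Watcher) -> None:
        # A dashboard would register here to be told about state changes.
        self._watchers.append(callback)

    def push(self, name: str, value: Any) -> None:
        # "Push": copy a local value into the engine's workspace.
        self._variables[name] = value
        self._notify()

    def pull(self, name: str) -> Any:
        # "Pull": read a value back from the engine's workspace.
        return self._variables[name]

    def execute(self, target: str, func: Callable[..., Any], *arg_names: str) -> None:
        # "Execute": run a function on named workspace variables and
        # store the result under `target` in the workspace.
        args = [self._variables[n] for n in arg_names]
        self._variables[target] = func(*args)
        self._notify()

    def _notify(self) -> None:
        # Send each watcher a snapshot so later changes do not alter it.
        snapshot = dict(self._variables)
        for callback in self._watchers:
            callback(snapshot)


dashboard_states: List[Dict[str, Any]] = []
engine = EngineStandIn()
engine.watch(dashboard_states.append)  # the "dashboard" records each update

engine.push("x", [1, 2, 3, 4])
engine.execute("total", sum, "x")
result = engine.pull("total")
print(result, len(dashboard_states))
```

Each `push` and `execute` triggers one dashboard update, which is the behaviour the demo step describes: changes made from a local session become visible to whoever is watching the shared engine.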


Conclusion

- Elastic-R unlocks the potential of the cloud for Data Scientists and Educators
- With Elastic-R, the cloud becomes a cyberspace for collaborative research and sharing, and an ecosystem suited to open science, open innovation and open education
- Elastic-R dramatically improves the productivity of Data Scientists: the entire Data Science Factory chain, from resource acquisition to publishing services and applications, comes under their direct control
- Elastic-R provides an Analytics-as-a-Service platform that can extend any existing portal or application


What to do Next

- Register on Elastic-R and try the HTML5 Workbench and the collaboration features
- Download the R package elasticR and use it to access the cloud from local R sessions
- Download the Java SDK and try to create your first analytical application using AWS and the most advanced tools for programming with data
- Get in touch with me to explore potential partnerships and collaborations


Contact details: Karim Chine, karim.chine@cloudera.co.uk