DataMind: An e-learning platform for Data Analysis based on R. RBelgium meetup talk

DESCRIPTION

We're looking for people to give us feedback on the prototype, which contains a first introduction-to-R tutorial, at http://beta.datamind.org.

TRANSCRIPT

Page 1: DataMind: An e-learning platform for Data Analysis based on R. RBelgium meetup talk

An e-learning platform for Data Analysis based on R

Jonathan Cornelissen, Dieter De Mesmaeker, Albert Jorissen, Martijn Theuwissen

24/5/2013, RBelgium meetup, FEB, KU Leuven

Welcome!

1.  Motivation: Why e-learning with and for R?
2.  Learner experience
3.  Technical overview
4.  Course creators' experience on DataMind
5.  Submission Correctness Tests (examples)
6.  Questions and answers

Why e-learning with and for R?

Need for scalable tools to learn R and Data Analysis…

Because of the exponentially growing R user base

More than 2 million R users, growing at 40-60% yearly

Source: http://r4stats.com/articles/popularity/ and http://prezi.com/s1qrgfm9ko4i/the-r-ecosystem/

That needs to learn the basics and the specifics of R

Analysis of r-project.org

•  Number of downloads per month for:
   •  Introduction to R pdfs: 140,000
   •  Summary pdfs: 50,000
   •  Some of the "top" packages (reliability/stability of numbers below?):
      kernlab.pdf        349,780
      party.pdf          167,396
      igraph.pdf          59,969
      VennDiagram.pdf     30,889
      mclust.pdf          19,347
      KnitR.pdf           10,697
      twitteR.pdf          7,507
      randomForest.pdf     6,824
      Ggplot2              5,924
      raster.pdf           5,326

Source: Analysis based on http://cran.r-project.org/report_cran.html

Analysis of Google keywords

Keyword                         Competition  Global monthly searches
r tutorial                      0            6600
introduction to r               0            1600
online statistics course        0.98         1600
ggplot2 tutorial                0            880
statistics course               0.85         880
an introduction to r            0.01         880
r book                          0.06         590
learning statistics             0.38         590
r tutorials                     0            590
r introduction                  0.01         480
statistics courses              0.84         480
statistics introduction         0.1          480
online statistics courses       0.99         320
r course                        0.04         260
r training                      0.17         260
free online statistics course   0.56         260
statistics training             0.62         210
online statistics class         0.98         170
statistics class online         0.98         140
data analysis tutorial          0.5          110

Compare to: SAS tutorial: 4400, Eviews tutorial: 390, Stata tutorial: 1900, Matlab tutorial: 22200, Hadoop tutorial: 12100

Source: Analysis based on http://adwords.google.com/select/keywordtoolexternal

Because of the exponentially growing functionality

6,275 R packages at all major repositories, 4,315 of which were at CRAN, across a broad spectrum of domains: financial engineering, biostatistics, data mining, …

Source: http://r4stats.com/articles/popularity/

Why e-learning with and for R?

Learners: Students, Professionals, Researchers, Employees
•  Great books, tutorials, … on R
•  But coding is learned by doing
•  No online learning interface for R
•  Documentation made by experts for experts, not for beginners or intermediate users

Teachers: Data Analysis Professors, Consultants, Researchers, Book authors
•  Often give the same or similar feedback to students in exercise sessions
•  Manually correct assignments
•  Static content
•  Hard to get feedback

Two pillars of the learning experience on DataMind

•  Interactive training: learning by doing
•  In a compelling way: gamification

Benefits for students of learning R online

1.  Everything in one place: assignments, sample code, R console, …
2.  Lowering the barrier: start right away with R, with no installation or version problems, since R runs in the background on our servers
3.  Automated correction and feedback through Submission Correctness Tests (SCT)
4.  More fun through gamification of the learning process

LIVE DEMO: surf to http://beta.datamind.org

Exercises versus Challenges

Exercises:
1.  Read exercise description
2.  Read instructions
3.  Type code to solve the exercise
4.  Get personalized feedback on the correctness of your solution

Challenges:
1.  Read challenge
2.  Type code to solve the challenge
3.  Get a result on a certain metric
4.  Get ranked on the leaderboard
5.  Possibility to improve your code
6.  Learn from others' solutions

•  For example:
   •  Forecast R usage in the next month; metric = accuracy of the forecast
   •  Find the most efficient way to calculate a certain parameter of a model; metric = time to compute
   •  …
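
For challenges, a submission gets a score on a metric rather than a pass/fail verdict. The following is a minimal sketch of how a forecasting challenge could be scored and ranked. It assumes absolute forecast error as the metric (lower is better). The submissions, student names and actual value are hypothetical, and this is not DataMind's implementation.

```r
# Hypothetical forecasting challenge: the true value becomes known
# after the month is over.
actual_value <- 120

# One row per submitted forecast (invented example data)
submissions <- data.frame(
  student  = c("ann", "bob", "cas"),
  forecast = c(110, 125, 140),
  stringsAsFactors = FALSE
)

# Metric: absolute error of each forecast (lower is better)
submissions$abs_error <- abs(submissions$forecast - actual_value)

# Leaderboard: rank by the metric; tied students share the best rank
submissions$rank <- rank(submissions$abs_error, ties.method = "min")
leaderboard <- submissions[order(submissions$rank), ]
print(leaderboard)
```

For a "time to compute" challenge, the metric column could instead hold timings measured with system.time(), and the ranking logic would stay the same.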

Technical overview: DataMind IT architecture

DataMind leverages state-of-the-art open-source frameworks in the cloud

•  Scaling  •  Automated  •  Affordable
•  Scalable  •  Plug & Play  •  Easy

•  R: open-source statistical language
•  Rserve
•  Ruby on Rails: high-productivity web application framework
•  Node.js: platform for real-time, scalable network applications
•  Angular.js: MVC JavaScript framework for single-page applications, maintained by Google
•  Connections between the components: AJAX requests and a RESTful API; WebSockets

Rserve: Communication with R

•  Package of Simon Urbanek
•  Manages sessions and workspaces
•  Binary communication
•  Emulate console with capture.output()
•  Detect incomplete statements with parse()
•  Catch and print errors
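
The three console techniques listed above can be sketched in plain R. This is a minimal illustration under a few assumptions. Submitted code arrives as one string, and each learner's workspace is an R environment. run_console_input() is a hypothetical name, not part of Rserve or DataMind. The incomplete-input check relies on parse()'s English error messages.

```r
# Emulate one console submission: detect incomplete input with parse(),
# capture printed output with capture.output(), and catch errors.
# `workspace` is an environment standing in for the learner's session.
run_console_input <- function(code, workspace) {
  exprs <- tryCatch(parse(text = code), error = function(e) e)
  if (inherits(exprs, "error")) {
    msg <- conditionMessage(exprs)
    # Unfinished statements (open brackets, unclosed strings) fail to parse
    # with these messages; a console would then wait for more input.
    if (grepl("unexpected end of input|INCOMPLETE_STRING", msg)) {
      return(list(status = "incomplete", output = character(0)))
    }
    return(list(status = "error", output = paste("Error:", msg)))
  }
  output <- character(0)
  for (expr in exprs) {
    res <- tryCatch(
      capture.output({
        # Print only visible results, as the interactive console does
        value <- withVisible(eval(expr, envir = workspace))
        if (value$visible) print(value$value)
      }),
      error = function(e) e
    )
    if (inherits(res, "error")) {
      # Report the error the way the console would, then stop evaluating
      output <- c(output, paste("Error:", conditionMessage(res)))
      return(list(status = "error", output = output))
    }
    output <- c(output, res)
  }
  list(status = "ok", output = output)
}

ws <- new.env()
print(run_console_input("x <- 2; x * 3", ws)$output)
```

Keeping one environment per learner lets state persist between submissions. That is the role the slide assigns to Rserve's session and workspace management.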

RAppArmor: Security

•  Evaluation of external code → huge security risk
•  Solution:
   •  Limited access to the OS
   •  RAppArmor
      •  Package of Jeroen Ooms
      •  R interface to OS security
      •  Limit CPU, memory, spawned processes
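
RAppArmor enforces its limits at the operating-system level, which plain R cannot do. As a much weaker but portable illustration of the "limit CPU" idea only, base R's setTimeLimit() can abort an evaluation that runs too long. This is a swapped-in technique, not RAppArmor's API. It gives no protection for memory, files or processes. eval_with_time_limit() is a hypothetical helper.

```r
# Bound the CPU and elapsed time of one evaluation with base R's setTimeLimit().
# NOT a sandbox: memory, file access and process spawning are not restricted,
# and code blocked inside compiled C code may not be interrupted.
eval_with_time_limit <- function(code, env, seconds = 1) {
  # Always lift the limits again, however the evaluation ends
  on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE))
  setTimeLimit(cpu = seconds, elapsed = seconds, transient = TRUE)
  tryCatch(
    list(ok = TRUE, value = eval(parse(text = code), envir = env)),
    error = function(e) list(ok = FALSE, message = conditionMessage(e))
  )
}

workspace <- new.env()
str(eval_with_time_limit("sum(1:10)", workspace))
# An endless loop is aborted with a "time limit" error after about a second
str(eval_with_time_limit("repeat {}", workspace, seconds = 1))
```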

Course creators' experience on DataMind

Benefits for course creation

1.  Save time!
    1.  Automated correction of student exercises
    2.  Efficient way to get feedback from course takers
    3.  Scalable distribution of course content
2.  Visibility for your package / courses
3.  Insights into your course
4.  Per-student tracking
    1.  Number of attempts per exercise
    2.  Use of "hint" and "solution"
    3.  Time to complete per exercise
5.  Possibility to use courses/exercises from other creators
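
The per-student tracking metrics above can be computed from a log of attempts. This is a minimal sketch that assumes a hypothetical log with one row per submission. The column names and values are invented for illustration and are not DataMind's schema.

```r
# Hypothetical attempt log: one row per submission attempt
# (column names and values invented for illustration).
attempts <- data.frame(
  student       = c("ann", "ann", "bob", "bob", "bob", "cas"),
  exercise      = rep("ex1", 6),
  used_hint     = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE),
  used_solution = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE),
  seconds       = c(40, 35, 20, 25, 60, 30),
  stringsAsFactors = FALSE
)

# Summarise per student and exercise: attempts, hint/solution use, time spent
groups <- split(attempts, list(attempts$student, attempts$exercise), drop = TRUE)
per_student <- do.call(rbind, lapply(groups, function(d) {
  data.frame(
    student       = d$student[1],
    exercise      = d$exercise[1],
    n_attempts    = nrow(d),
    used_hint     = any(d$used_hint),
    used_solution = any(d$used_solution),
    total_seconds = sum(d$seconds),
    stringsAsFactors = FALSE
  )
}))
rownames(per_student) <- NULL
print(per_student)
```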

How to create courses (we want your feedback!)

An exercise consists of:
1.  Assignment
2.  Pre-exercise code
3.  Sample code to help the student
4.  Sample solution
5.  Submission Correctness Test

Steps:
1.  Write the assignment
2.  Provide instructions to the student
3.  Provide sample code to help the student get started
4.  Pre-exercise code is run in the background to pre-load a dataset, graphs, etc.
5.  Provide a sample solution
6.  Write a Submission Correctness Test in R that checks the input of the student and returns feedback
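
The components listed above fit together as a pipeline. Pre-exercise code runs first, then the submitted code, then the SCT, all in the same workspace. This is a minimal sketch, assuming an exercise stored as an R list and an SCT that sets DM.result as on the following slides. The field names, the run_exercise() helper and the exercise itself are hypothetical. For brevity, errors in submitted code are not caught here.

```r
# Hypothetical exercise holding the five components listed above
exercise <- list(
  assignment      = "Store the mean of the speed column of my_cars in avg_speed.",
  pre_exercise    = "my_cars <- cars",  # run in the background before the student starts
  sample_code     = "# avg_speed <- mean(___)",
  sample_solution = "avg_speed <- mean(my_cars$speed)",
  # The SCT is R code evaluated in the student's workspace; it sets DM.result
  sct = quote({
    if (!exists("avg_speed", inherits = FALSE)) {
      DM.result <- list(FALSE, "Did you create the variable avg_speed?")
    } else if (isTRUE(all.equal(avg_speed, mean(my_cars$speed)))) {
      DM.result <- list(TRUE, "Well done!")
    } else {
      DM.result <- list(FALSE, "avg_speed should be the mean of my_cars$speed.")
    }
  })
)

# Pre-exercise code, then the submission, then the SCT, in one fresh workspace
run_exercise <- function(exercise, submission) {
  workspace <- new.env(parent = globalenv())
  eval(parse(text = exercise$pre_exercise), envir = workspace)
  eval(parse(text = submission), envir = workspace)
  eval(exercise$sct, envir = workspace)
  get("DM.result", envir = workspace)
}

# Sanity check for the course creator: the sample solution must pass the SCT
stopifnot(isTRUE(run_exercise(exercise, exercise$sample_solution)[[1]]))
```

Running the sample solution through the SCT before publishing catches tests that would reject correct answers.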

Submission Correctness Tests (examples)

Submission Correctness Tests (SCT)

A Submission Correctness Test checks the input from a student and returns (i) whether the student's input was correct and (ii) feedback to the student.
•  These tests are written in R
•  They should be easy to write for a course creator
   → we started developing an R package, the DataMind (DM) package, to help course creators write simple tests*

*https://github.com/jonathancornelissen/DM

"Mistakes are not errors but partially correct solutions with underlying logic."

Simple Submission Correctness Tests (SCT)

1.  Assignment to student: x should be 5
2.  Student types:
        x <- 4
3.  Submission Correctness Test:
        if (x == 5) {
          DM.result <- list(TRUE, "Well done, you genius!")
        } else {
          DM.result <- list(FALSE, "Please assign 5 to x")
        }
4.  Output to student: "Please assign 5 to x"

When the student types x <- 5 instead, the same test shows "Well done, you genius!" to the student.
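
The quote on mistakes suggests that an SCT can do more than pass or fail: it can recognize typical mistakes and explain them. This is a minimal sketch that extends the "x should be 5" test with tailored feedback. It assumes the SCT is evaluated in a fresh workspace after the student's code. run_sct() is a hypothetical helper, not part of the DM package.

```r
# Hypothetical helper: run the student's code, then the SCT, in one fresh workspace
run_sct <- function(student_code, sct) {
  workspace <- new.env(parent = globalenv())
  eval(parse(text = student_code), envir = workspace)
  eval(sct, envir = workspace)
  get("DM.result", envir = workspace)
}

# SCT for "x should be 5" that explains typical mistakes instead of only failing
sct <- quote({
  if (!exists("x", inherits = FALSE)) {
    DM.result <- list(FALSE, "Please create a variable called x.")
  } else if (identical(x, "5")) {
    DM.result <- list(FALSE, "You assigned the text \"5\"; assign the number 5 instead (without quotes).")
  } else if (!is.numeric(x) || length(x) != 1) {
    DM.result <- list(FALSE, "x should be a single number.")
  } else if (isTRUE(x == 5)) {
    DM.result <- list(TRUE, "Well done, you genius!")
  } else {
    DM.result <- list(FALSE, paste0("x is ", x, " but should be 5. Please assign 5 to x."))
  }
})

print(run_sct("x <- 4", sct)[[2]])
```

Checking existence with inherits = FALSE prevents a variable that happens to exist elsewhere, for example in the global environment, from being mistaken for the student's work.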

Automated exercise correction with SCT

Assignment to the student: Print a matrix with 3 rows containing the numbers 1 up to 9. If the student does this correctly, DM.console.output contains:

         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6
    [3,]    7    8    9

INPUT
•  Everything in the student's workspace
•  DM.user.code: all code written by the student
•  DM.console.output: everything printed to the user console
•  DM.errors: errors generated when running the student's code

Submission Correctness Test written by the course creator (potentially using the DM package):

    DM.result <- DM.outputContains("matrix(1:9, byrow=TRUE, nrow=3)")

OUTPUT
•  Assigned to the variable DM.result
•  A list with two elements:
   1.  TRUE / FALSE
   2.  Message with feedback for the student
•  DM.result is shown to the student
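
The slide names DM.outputContains() but does not show its implementation. The following is a minimal sketch of how such an output check could work. It assumes DM.console.output is a character vector of printed lines. output_contains_sketch() is a hypothetical stand-in, not the DM package's function.

```r
# Hypothetical stand-in for an output check like DM.outputContains():
# evaluate the expected expression, print it, and look for that printed text
# in the student's console output (whitespace is ignored on both sides).
output_contains_sketch <- function(expected_code, console_output) {
  expected_value <- eval(parse(text = expected_code))
  expected_text  <- capture.output(print(expected_value))
  squash <- function(lines) gsub("[[:space:]]+", "", paste(lines, collapse = ""))
  if (grepl(squash(expected_text), squash(console_output), fixed = TRUE)) {
    list(TRUE, "Well done!")
  } else {
    list(FALSE, "Your console output does not contain the expected result yet.")
  }
}

# Console output as produced by a correct submission
DM.console.output <- capture.output(print(matrix(1:9, nrow = 3, byrow = TRUE)))
DM.result <- output_contains_sketch("matrix(1:9, byrow=TRUE, nrow=3)", DM.console.output)
print(DM.result[[2]])
```

Ignoring whitespace tolerates different console widths, but it could occasionally match unintended text. When the student's object is stored in the workspace, comparing the object itself is stricter.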

SCTs enable a wide variety of checks

•  Has the student estimated a certain model correctly?
•  Generated a transformed time series that fulfills certain conditions?
•  Generated a certain type of graph?
•  Forecasted a metric of interest within certain bounds?
•  …
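
To illustrate the first bullet, here is a minimal sketch of an SCT that checks an estimated model. It assumes a hypothetical exercise in which the student regresses mpg on wt in the built-in mtcars data and stores the result in fit. The function name and messages are invented. The check compares against a reference fit with a numeric tolerance rather than demanding exact equality.

```r
# Hypothetical SCT: did the student estimate lm(mpg ~ wt, data = mtcars)
# and store it in `fit`? Returns the two-element list used for DM.result.
sct_model_check <- function(workspace, tolerance = 1e-6) {
  if (!exists("fit", envir = workspace, inherits = FALSE)) {
    return(list(FALSE, "Please store your model in a variable called fit."))
  }
  fit <- get("fit", envir = workspace)
  if (!inherits(fit, "lm")) {
    return(list(FALSE, "fit should be a linear model created with lm()."))
  }
  reference <- lm(mpg ~ wt, data = mtcars)
  if (!identical(names(coef(fit)), names(coef(reference)))) {
    return(list(FALSE, "Regress mpg on wt only, keeping the intercept."))
  }
  if (isTRUE(all.equal(coef(fit), coef(reference), tolerance = tolerance))) {
    list(TRUE, "Well done, your model is estimated correctly!")
  } else {
    list(FALSE, "Your coefficients differ from the expected ones. Did you use the mtcars data?")
  }
}

# Example: a student workspace after running fit <- lm(mpg ~ wt, data = mtcars)
student_ws <- new.env()
evalq(fit <- lm(mpg ~ wt, data = mtcars), student_ws)
DM.result <- sct_model_check(student_ws)
print(DM.result[[2]])
```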

Want to help us build a community for learning and teaching R online?

Contact us!

Albert Jorissen
Martijn Theuwissen
Dieter De Mesmaeker
Jonathan Cornelissen

[email protected]
[email protected]
[email protected]
[email protected]

Q&A: Questions and Answers

Survey on R and education to verify the interest of the community

•  Filled out by 286 academics, professionals and students from around the globe
•  The majority of respondents are interested in free interactive courses
•  Most package authors are willing to create free interactive tutorials
•  Full data set of the survey and discussion of results at www.datamind.org/survey