DataMind: an e-learning platform for data analysis based on R. RBelgium meetup talk
DESCRIPTION
We're looking for people to give us feedback on the prototype, which contains a first "Introduction to R" tutorial, at http://beta.datamind.org.
TRANSCRIPT
An e-learning platform for Data Analysis based on R
Jonathan Cornelissen, Dieter De Mesmaeker, Albert Jorissen, Martijn Theuwissen
24/5/2013, RBelgium meetup FEB, KU Leuven
Welcome!
1. Motivation: Why e-learning with and for R?
2. Learner experience
3. Technical overview
4. Course creators experience on DataMind
5. Submission Correctness Tests (examples)
6. Questions and answers
Why e-learning with and for R?
Need for scalable tools to learn R and Data Analysis…
Because of exponentially growing R user base
More than 2 million R users, growing at 40-60% yearly
Source: http://r4stats.com/articles/popularity/ and http://prezi.com/s1qrgfm9ko4i/the-r-ecosystem/
Analysis of Google keywords
Keyword                         Competition   Global Monthly Searches
r tutorial                      0             6600
introduction to r               0             1600
online statistics course        0.98          1600
ggplot2 tutorial                0             880
statistics course               0.85          880
an introduction to r            0.01          880
r book                          0.06          590
learning statistics             0.38          590
r tutorials                     0             590
r introduction                  0.01          480
statistics courses              0.84          480
statistics introduction         0.1           480
online statistics courses       0.99          320
r course                        0.04          260
r training                      0.17          260
free online statistics course   0.56          260
statistics training             0.62          210
online statistics class         0.98          170
statistics class online         0.98          140
data analysis tutorial          0.5           110
Compare to: SAS tutorial: 4400, Eviews tutorial: 390, Stata tutorial: 1900, Matlab tutorial: 22200, Hadoop tutorial: 12100
Source: Analysis based on http://adwords.google.com/select/keywordtoolexternal
Analysis of r-project.org
Source: Analysis based on http://cran.r-project.org/report_cran.html
That needs to learn the basics and the specifics of R
• Number of downloads per month for:
  • Introduction to R pdfs: 140,000
  • Summary pdfs: 50,000
  • Some of the "top" packages (reliability/stability of numbers below?):
    kernlab.pdf        349,780
    party.pdf          167,396
    igraph.pdf          59,969
    VennDiagram.pdf     30,889
    mclust.pdf          19,347
    KnitR.pdf           10,697
    twitteR.pdf          7,507
    randomForest.pdf     6,824
    ggplot2              5,924
    raster.pdf           5,326
Source: http://r4stats.com/articles/popularity/
6,275 R packages at all major repositories, 4,315 of which were at CRAN
Across a broad spectrum of domains: financial engineering, biostatistics, data mining, …
Because of the exponentially growing functionality
Why e-learning with and for R?
• Great books, tutorials, … on R
• But coding is learned by doing
• No online learning interface for R
• Documentation made by experts for experts, not for beginners or intermediate users
Learners : Students, Professionals, Researchers, Employees
Teachers: Data Analysis Professors, Consultants, Researchers, Book authors
• Often give the same or similar feedback to students in exercise sessions
• Manually correct assignments
• Static content
• Hard to get feedback
Two pillars of learning experience on DataMind
• Interactive training: learning by doing
• Gamification: in a compelling way
Benefits for students of learning R online
1. Everything in one place: assignments, sample code, R console, …
2. Lowering the barrier: start right away with R, with no installation or version problems, since R runs in the background on our servers
3. Automated correction and feedback through Submission Correctness Tests (SCT)
4. More fun through gamification of the learning process
LIVE DEMO
Surf to http://beta.datamind.org
Exercises versus Challenges
Exercises:
1. Read exercise description
2. Read instructions
3. Type code to solve the exercise
4. Get personalized feedback on the correctness of your solution
Challenges:
1. Read challenge
2. Type code to solve the challenge
3. Get result on certain metric
4. Get ranked on the leaderboard
5. Possibility to improve your code
6. Learn from others' solutions
• For example:
  • Forecast R usage in next month. Metric = accuracy of forecast
  • Find most efficient way to calculate certain parameter of a model. Metric = time to compute
  • …
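As a rough illustration of the challenge flow above (get a result on a metric, then get ranked), the following sketch ranks three made-up forecast submissions by absolute error. The data, column names, and choice of metric are assumptions for illustration only, not DataMind's actual scoring.

```r
# Hypothetical data: the value that was actually observed, and three forecasts.
actual <- 120
submissions <- data.frame(
  student  = c("ann", "bob", "cem"),
  forecast = c(110, 125, 150),
  stringsAsFactors = FALSE
)

# Metric: absolute forecast error (lower is better).
submissions$abs_error <- abs(submissions$forecast - actual)

# Leaderboard: sort by the metric and attach a rank.
leaderboard <- submissions[order(submissions$abs_error), ]
leaderboard$rank <- rank(leaderboard$abs_error, ties.method = "min")
rownames(leaderboard) <- NULL
print(leaderboard)
```

A student who improves their code would simply produce a new row, and the same ordering step would move them up the board.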
Technical overview DataMind IT architecture
DataMind leverages state-of-the-art open-source frameworks in the cloud
• R: open-source statistical language
• Rserve
• Ruby on Rails: high-productivity web application framework
• Node.js: platform for real-time scalable network applications
• Angular.js: MVC JavaScript framework for single-page applications, maintained by Google
• Communication between components: WebSockets, AJAX requests, RESTful API
• Cloud: Scaling, Automated, Affordable
• Cloud: Scalable, Plug & Play, Easy
Rserve: Communication with R
• Package by Simon Urbanek
• Manages sessions and workspaces
• Binary communication
• Emulate console with capture.output()
• Detect incomplete statements with parse()
• Catch and print errors
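The console-emulation ideas listed above can be sketched in base R, without Rserve itself. The helper names `is_incomplete()` and `run_like_console()` are hypothetical and this is not DataMind's code. The incomplete-statement check relies on the English wording of parse() error messages, which is an assumption that may not hold in other locales.

```r
# Hypothetical helper: TRUE when the code is a valid prefix of a statement
# (e.g. an unclosed bracket or string), FALSE otherwise.
is_incomplete <- function(code) {
  tryCatch({
    parse(text = code)
    FALSE
  }, error = function(e) {
    # Assumption: English parse() messages.
    grepl("unexpected end of input|INCOMPLETE_STRING", conditionMessage(e))
  })
}

# Hypothetical helper: evaluate code and return what a console would show.
run_like_console <- function(code, envir = new.env(parent = globalenv())) {
  if (is_incomplete(code)) {
    return(list(status = "incomplete", output = character(0)))
  }
  status <- "ok"
  output <- capture.output(
    status <- tryCatch({
      for (expr in parse(text = code)) {
        # Print only visible results, as the R console does.
        result <- withVisible(eval(expr, envir = envir))
        if (result$visible) print(result$value)
      }
      "ok"
    }, error = function(e) {
      # Catch the error and print it instead of aborting.
      cat("Error:", conditionMessage(e), "\n")
      "error"
    })
  )
  list(status = status, output = output)
}

res <- run_like_console("x <- 2\nx * 3")
print(res$output)
```

Keeping tryCatch() inside capture.output() means any output printed before an error is still returned, which is closer to what a learner would see in a real console.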
RAppArmor: Security
• Evaluation of external code → huge security risk
• Solution:
  • Limited access to OS
  • RAppArmor
    • Package by Jeroen Ooms
    • R interface to OS security
    • Limit CPU, memory, spawned processes
Course creators experience on DataMind
Benefits for course creation
1. Save time!
   1. Automated correction of student exercises
   2. Efficient way to get feedback from course takers
   3. Scalable distribution of course content
2. Visibility for your package / courses
3. Insights into your course
4. Per-student tracking
   1. Number of attempts per exercise
   2. Use of "hint" and "solution"
   3. Time to complete per exercise
5. Possibility to use courses/exercises from other creators
How to create courses
We want your feedback!
Exercise components: 1. Assignment 2. Pre-exercise code 3. Sample code to help the student 4. Sample solution 5. Submission Correctness Test
1. Write the assignment
2. Provide instructions to the student
3. Provide sample code to help the student get started
4. Pre-exercise code is run in the background to pre-load a dataset, graphs, etc.
5. Provide a sample solution
6. Write a Submission Correctness Test, written in R, that checks the input of the student and returns feedback
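To make the pieces of an exercise concrete, here is a minimal sketch that stores them in an R list and runs them in order: pre-exercise code, then the student's code, then the test. The list layout, field names, and `run_exercise()` are hypothetical. Only the convention of assigning a two-element list to `DM.result` comes from the talk.

```r
# Hypothetical exercise definition; field names are illustrative only.
exercise <- list(
  assignment   = "Store the mean of 'scores' in a variable called avg.",
  pre_exercise = "scores <- c(12, 15, 9, 18)",
  sample_code  = "# avg <- ...",
  solution     = "avg <- mean(scores)",
  sct          = "
    if (exists('avg') && isTRUE(all.equal(avg, mean(scores)))) {
      DM.result <- list(TRUE, 'Well done!')
    } else {
      DM.result <- list(FALSE, 'Store the mean of scores in avg.')
    }"
)

# Hypothetical runner: one fresh workspace per submission.
run_exercise <- function(exercise, student_code) {
  workspace <- new.env(parent = globalenv())
  eval(parse(text = exercise$pre_exercise), envir = workspace)  # pre-load data
  eval(parse(text = student_code), envir = workspace)           # student's code
  eval(parse(text = exercise$sct), envir = workspace)           # correctness test
  get("DM.result", envir = workspace)
}

print(run_exercise(exercise, exercise$solution)[[2]])
print(run_exercise(exercise, "avg <- 1")[[2]])
```

Running the sample solution through the same path as a student submission is a cheap way for a course creator to check that the test accepts at least one correct answer.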
Submission Correctness Tests (examples)
Submission Correctness Tests (SCT)
A Submission Correctness Test checks the input from a student and returns (i) whether the student's input was correct and (ii) feedback to the student.
• These tests are written in R
• Should be easy for a course creator
  -> started developing an R package (the DataMind package) to aid course creators in writing simple tests*
*https://github.com/jonathancornelissen/DM
"Mistakes are not errors but partially correct solutions with underlying logic."
Simple Submission Correctness Tests (SCT)
1. Assignment to student: x should be 5
2. Student types: x <- 4
3. Submission Correctness Test:
   if (x == 5) {
     DM.result <- list(TRUE, "Well done, you genius!")
   } else {
     DM.result <- list(FALSE, "Please assign 5 to x")
   }
4. Output to student: "Please assign 5 to x"
Simple Submission Correctness Tests (SCT)
1. Assignment to student: x should be 5
2. Student types: x <- 5
3. Submission Correctness Test:
   if (x == 5) {
     DM.result <- list(TRUE, "Well done, you genius!")
   } else {
     DM.result <- list(FALSE, "Please assign 5 to x")
   }
4. Output to student: "Well done, you genius!"
Automated exercise correction with SCT
Assignment to the student: Print a matrix with 3 rows containing the numbers 1 up to 9.
If the student does this correctly, DM.console.output contains:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
INPUT
• Everything in the student's workspace
• DM.user.code: all code written by the student
• DM.console.output: everything printed to the user console
• DM.errors: errors generated when running the student's code
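The inputs named above can be combined in one hand-written test. The sketch below simulates them so that it runs on its own. Treating `DM.user.code`, `DM.console.output`, and `DM.errors` as character vectors is an assumption, since the talk does not state their types.

```r
# Simulated platform inputs (assumption: all three are character vectors).
DM.user.code      <- "m <- matrix(1:9, byrow = TRUE, nrow = 3)\nprint(m)"
DM.errors         <- character(0)
DM.console.output <- capture.output(print(matrix(1:9, byrow = TRUE, nrow = 3)))

# Check errors first, then how the student worked, then what was printed.
if (length(DM.errors) > 0) {
  DM.result <- list(FALSE, paste("Your code produced an error:", DM.errors[1]))
} else if (!any(grepl("matrix(", DM.user.code, fixed = TRUE))) {
  DM.result <- list(FALSE, "Hint: use the matrix() function.")
} else if (!any(grepl("[3,]", DM.console.output, fixed = TRUE))) {
  DM.result <- list(FALSE, "Did you print a matrix with 3 rows?")
} else {
  DM.result <- list(TRUE, "Well done!")
}
print(DM.result[[2]])
```

Ordering the checks from most to least basic lets the feedback message point at the first thing that went wrong.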
Submission Correctness Test written by the course creator (potentially using the DM package):
   DM.result <- DM.outputContains("matrix(1:9, byrow=TRUE, nrow=3)")
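How a helper such as DM.outputContains() works internally is not shown in the talk. One plausible approach, sketched here under that caveat, is to print the expected expression and look for those lines in the captured console output. The function `output_contains_sketch()` and its messages are hypothetical and are not the DM package's implementation.

```r
# Hypothetical stand-in, NOT the DM package's implementation.
# Returns a two-element list in the same shape as DM.result.
output_contains_sketch <- function(expected_expr, console_output) {
  # Lines the console would show for the expected expression.
  expected <- capture.output(print(eval(parse(text = expected_expr))))
  n <- length(expected)
  m <- length(console_output)
  found <- FALSE
  if (n > 0 && m >= n) {
    # Slide a window over the console output and compare line by line.
    for (start in seq_len(m - n + 1)) {
      if (identical(console_output[start:(start + n - 1)], expected)) {
        found <- TRUE
        break
      }
    }
  }
  if (found) {
    list(TRUE, "Well done!")
  } else {
    list(FALSE, "The expected output was not found in the console.")
  }
}

console <- c("[1] 1", capture.output(print(matrix(1:9, byrow = TRUE, nrow = 3))))
print(output_contains_sketch("matrix(1:9, byrow=TRUE, nrow=3)", console)[[2]])
```

Comparing printed output rather than the student's code accepts any approach that produces the right matrix, for example rbind(1:3, 4:6, 7:9).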
OUTPUT
• Assigned to the variable DM.result
• List with two elements:
  1. TRUE / FALSE
  2. Message providing feedback to the student
DM.result is shown to the student
SCTs enable a wide variety of options
• Has the student estimated a certain model correctly?
• Generated a transformed time series that fulfills certain conditions?
• Generated a certain type of graph?
• Forecasted a metric of interest within certain bounds?
• …
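Two of the checks above (a correctly estimated model, and a forecast within bounds) could look roughly like this in base R. The variable names `fit` and `student_forecast`, the use of the built-in `cars` data, and the 95% prediction interval as the bound are all assumptions chosen to keep the sketch self-contained.

```r
# Simulated student workspace (assumption: the student was asked to create
# 'fit' and 'student_forecast').
fit <- lm(dist ~ speed, data = cars)
student_forecast <- unname(predict(fit, newdata = data.frame(speed = 21)))

# Reference values computed by the course creator.
reference <- lm(dist ~ speed, data = cars)
bounds <- predict(reference, newdata = data.frame(speed = 21),
                  interval = "prediction", level = 0.95)

if (!exists("fit") || !inherits(fit, "lm")) {
  DM.result <- list(FALSE, "Store a linear model in 'fit'.")
} else if (!isTRUE(all.equal(coef(fit), coef(reference)))) {
  DM.result <- list(FALSE, "The coefficients differ from the expected model.")
} else if (student_forecast < bounds[1, "lwr"] ||
           student_forecast > bounds[1, "upr"]) {
  DM.result <- list(FALSE, "Your forecast is outside the expected range.")
} else {
  DM.result <- list(TRUE, "Model and forecast look right!")
}
print(DM.result[[2]])
```

Using all.equal() rather than == tolerates small floating-point differences between equivalent ways of fitting the same model.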
Albert Jorissen
Martijn Theuwissen
Dieter De Mesmaeker
Jonathan Cornelissen
Want to help us build a community for learning and teaching R online?
Contact us!!
Q&A: Questions and Answers
Survey on R and education to verify interest of community
Filled out by 286 academics, professionals and students from around the globe.
Majority of respondents interested in free interactive courses
Most package authors willing to create free interactive tutorials
Full data set of the survey and discussion of results at www.datamind.org/survey