

Using R for Predictive Analytics

Szilárd Pafka

Predictive Analytics World / DC useR Group

October 20, 2009


Introduction and agenda

Introduction:

• physics, finance, statistical modeling (PhD)

• data mining at a credit card processor (Chief Scientist)

• co-organizing the Los Angeles area R users group

Agenda:

• using R for predictive analytics

• benefits from R users groups


R and predictive analytics

R:

• data manipulation

• statistical analysis, exploratory data analysis

• computations, simulations

• modeling

• visualization

Predictive analytics (data mining for prediction problems):

• understanding the domain problem and related questions

• data exploration and cleaning

• building predictive models (supervised learning)

• model diagnostics, selection and evaluation

• deployment


R vs alternatives

R:

• open source
  ◦ packages
  ◦ latest cutting-edge methods

• command line interaction (vs GUI)
  ◦ flexible
  ◦ results easier to reproduce

• learning curve
  ◦ books
  ◦ R-help

• cost, multiple platforms

Alternatives:

• spreadsheets

• programming languages

• data mining software

• similar software environments


Data exploration

• data frames

• import from various sources (csv files, database connection)

• flexible data manipulation functionalities
  ◦ subscripting
  ◦ numerous transformations
  ◦ text manipulation (e.g. regexp)
  ◦ data aggregation
  ◦ specialized packages (reshape, plyr etc.)

• graphics

• sample R code
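The sample code shown in the talk is not part of this transcript. As a stand-in, here is a minimal base R sketch of the steps listed above (import, subscripting, transformation, regexp-based text cleanup, aggregation). The data set, its column names, and the variable names are hypothetical and exist only for illustration.

```r
# Hypothetical transaction data, read from inline CSV text so the
# example is self-contained (a file path would work the same way).
csv_text <- "id,merchant,amount,country
1,ACME Store #12,25.50,US
2,acme store #7,310.00,US
3,Beta Shop,12.25,CA
4,Beta Shop,980.00,CA
5,ACME Store #3,45.00,US"
tx <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Subscripting: select rows by a logical condition, columns by name
large <- tx[tx$amount > 100, c("id", "amount")]

# Transformation: add a derived column
tx$log_amount <- log(tx$amount)

# Text manipulation with a regular expression: drop the trailing
# store number and normalize case
tx$merchant_clean <- toupper(sub(" #[0-9]+$", "", tx$merchant))

# Aggregation: total amount per cleaned merchant name
totals <- aggregate(amount ~ merchant_clean, data = tx, FUN = sum)
print(totals)
```

Packages such as reshape and plyr, mentioned above, offer more compact ways to express the aggregation step; base R is used here only to keep the sketch dependency-free.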


Visualization

• powerful tool for data exploration, diagnostics etc.

• R graphics: base, lattice, ggplot2

• visual perception, principles of visual design

• ggplot2
  ◦ based on the grammar of graphics
  ◦ nice defaults
  ◦ elements: mappings, geoms, stats, scales, facets
  ◦ very expressive (sample code)
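The talk's own ggplot2 sample is not included in the transcript. The sketch below is an illustrative substitute that uses R's built-in `mtcars` data set (an assumption, not the data from the talk) and touches each element named above once: a mapping, a geom, a stat, a scale, and a facet. It assumes the ggplot2 package is installed.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +  # mapping: weight -> x, mpg -> y
  geom_point() +                             # geom: one point per car
  stat_smooth(method = "lm") +               # stat: fitted linear trend with band
  scale_x_log10() +                          # scale: logarithmic x axis
  facet_wrap(~ cyl)                          # facets: one panel per cylinder count

print(p)  # draws the plot on the active graphics device
```

Each layer is added with `+`, so swapping the geom or the faceting variable changes one line rather than the whole plotting call.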


Building predictive models

• supervised learning: $y = f(x_1, x_2, \ldots, x_p)$

• regression (numeric y), classification (categorical y)

• spam detection example: classification (2 classes, Y/N)

• numerous methods: logistic regression, LDA, nearest neighbors, decision trees, bagging, boosting, random forests, neural networks, support vector machines etc.

• example: neural networks with nnet (sample code)
  ◦ weight decay as regularization
  ◦ helps convergence and guards against overfitting
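The nnet sample from the talk is not reproduced in the transcript. The following is a minimal sketch of a two-class neural network with weight decay. It uses simulated data rather than the spam data, and the choices `size = 5`, `decay = 0.01`, and the train/test split are arbitrary assumptions made for illustration.

```r
library(nnet)  # a recommended package, shipped with standard R distributions

set.seed(1)
n <- 400

# Simulated stand-in for a two-class (Y/N) problem; NOT the spam data
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- factor(ifelse(x1^2 + x2^2 + rnorm(n, sd = 0.5) > 2, "Y", "N"))
d  <- data.frame(x1, x2, y)

train <- d[1:300, ]
test  <- d[301:400, ]

# One hidden layer with 5 units; 'decay' is the weight-decay penalty
fit <- nnet(y ~ x1 + x2, data = train, size = 5, decay = 0.01,
            maxit = 500, trace = FALSE)

# Predicted class labels on held-out data, and a confusion table
pred <- predict(fit, newdata = test, type = "class")
conf <- table(predicted = pred, actual = test$y)
print(conf)
cat("test accuracy:", mean(pred == test$y), "\n")
```

Setting `decay = 0` switches the penalty off; comparing held-out accuracy across a few decay values is one simple way to see its regularizing effect.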


Model evaluation

99% accuracy?
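The transcript does not record how this question was answered in the talk. One common reading is that accuracy alone can mislead when one class is rare. The sketch below illustrates that reading with simulated labels, where the 1% positive rate is a hypothetical figure, and a "model" that always predicts the majority class.

```r
set.seed(42)
n <- 10000

# Simulated labels with roughly 1% positives (hypothetical rate)
actual <- factor(ifelse(runif(n) < 0.01, "Y", "N"), levels = c("N", "Y"))

# A useless "model" that always predicts the majority class
pred <- factor(rep("N", n), levels = c("N", "Y"))

conf <- table(predicted = pred, actual = actual)
accuracy    <- sum(diag(conf)) / sum(conf)
sensitivity <- conf["Y", "Y"] / sum(conf[, "Y"])  # share of true Y detected

print(conf)
cat("accuracy:", accuracy, " sensitivity:", sensitivity, "\n")
```

The accuracy comes out close to 99% even though no positive case is ever detected, which is why a confusion table and per-class rates are worth checking alongside the headline number.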

• package caret: unified interface/wrapper for
  ◦ training models, tuning (complexity parameters)
  ◦ diagnosing, selecting, evaluating models

• model deployment

• Conclusion: R is a viable tool for predictive analytics
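As a rough illustration of the caret workflow mentioned on this slide, the sketch below tunes the size and weight-decay parameters of an nnet model by 5-fold cross-validation. The simulated data, the tuning grid, and the number of folds are assumptions chosen for illustration, and the caret and nnet packages are assumed to be installed.

```r
library(caret)  # method = "nnet" also requires the nnet package

set.seed(1)
n <- 300

# Simulated two-class data; NOT the data used in the talk
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- factor(ifelse(x1^2 + x2^2 + rnorm(n, sd = 0.5) > 2, "Y", "N"))
d  <- data.frame(x1, x2, y)

# Resampling scheme: 5-fold cross-validation
ctrl <- trainControl(method = "cv", number = 5)

# Candidate complexity parameters for nnet
grid <- expand.grid(size = c(2, 5), decay = c(0, 0.01, 0.1))

# train() fits every grid combination on every fold and keeps the best;
# trace and maxit are passed through to nnet()
fit <- train(y ~ x1 + x2, data = d, method = "nnet",
             trControl = ctrl, tuneGrid = grid,
             trace = FALSE, maxit = 300)

print(fit$results[, c("size", "decay", "Accuracy")])
print(fit$bestTune)
```

Changing `method` to another model supported by caret keeps the rest of the call the same, which is the "unified interface" aspect of the package.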


R users groups

• R users groups around the world, in the US: SF, NY, LA, DC...

• Los Angeles: 150+ members, 4 meetings so far

• talks, Q&A, discussions, exchange of knowledge
  ◦ pointing people to where to look (packages, docs, etc.)
  ◦ also, more generally, help with statistical methodology issues

• networking opportunities