Introduction to Data Analysis with R Dani Solà

Upload: dani-sola-lagares

Post on 14-Jul-2015

Page 1: Introduction to Data Analysis with R

Introduction to Data Analysis with R

Dani Solà

What is R?

● “R is a language and environment for statistical computing and graphics”

● Paradigms: array, object-oriented, imperative, functional, procedural, reflective

● All data is held in memory, so datasets must fit in RAM (not suited to big data)

● Easy to get started!
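
As a rough illustration of a few of the paradigms listed above, the sketch below writes small computations in array, functional, object-oriented (S3), and reflective style using base R only. The variable names and the "person" class are made up for this example and do not come from the slides.

```r
# Array (vectorized) style: arithmetic applies element-wise to whole vectors.
heights_cm <- c(150, 160, 170, 180)
heights_m <- heights_cm / 100

# Functional style: functions are values and can be passed to other functions.
squares <- sapply(heights_m, function(h) h^2)

# Object-oriented style (S3): format() dispatches on the class attribute.
# The "person" class is hypothetical, defined only for this sketch.
person <- structure(list(name = "Ada", height_m = 1.7), class = "person")
format.person <- function(x, ...) paste0(x$name, " (", x$height_m, " m)")
person_label <- format(person)

# Reflective style: code can inspect objects at run time.
person_class <- class(person)

print(heights_m)
print(squares)
print(person_label)
```

Pasting these lines into an R console is enough to run them; no packages need to be installed.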

Why R?

● Free Software (GNU General Public License)

● Mature: v1.0 was released in 2000

● Widely used

● Good documentation and manuals

● Lots of freely available packages

● Excellent graphic capabilities

Getting the data (CSV)

● MySQL

● Hive + sed

● Consider sampling!

SELECT * INTO OUTFILE '/path/to/file.csv'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  ESCAPED BY '\\'
  LINES TERMINATED BY '\n'
FROM table WHERE <condition>;

INSERT OVERWRITE LOCAL DIRECTORY '/tmp_path/'
SELECT * FROM table WHERE <condition>;

cat /tmp_path/* | sed 's/[Ctrl-V][Ctrl-A]/\t/g' > out.txt
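
Once a file has been exported, it still has to be loaded into R. The sketch below is one possible way to do that with base R; it writes a small stand-in CSV to a temporary file so that it runs on its own (the column names `id` and `value` are invented for the example), reads it back, and keeps a random 10% of the rows. Sampling can equally be done in the query itself; this only shows the R side.

```r
# Write a small CSV to a temporary file so the sketch is self-contained;
# in practice this would be the file exported from MySQL or Hive.
# The columns "id" and "value" are hypothetical.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, value = (1:1000) * 0.5),
          csv_path, row.names = FALSE)

# Load the CSV into a data frame.
events <- read.csv(csv_path, stringsAsFactors = FALSE)

# Keep a random sample of 100 rows (10%); set.seed() makes it reproducible.
set.seed(42)
sample_rows <- sample(nrow(events), size = 100)
events_sample <- events[sample_rows, ]

# Inspect the structure of the sampled data frame.
str(events_sample)
```

For the tab-separated file produced by the Hive and sed commands, which has no header row, `read.delim(path, header = FALSE)` would be the corresponding call.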

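Linear Regression

$$y = \alpha + \beta x$$

$$\alpha = \bar{y} - \beta \bar{x}$$

$$\beta = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\mathrm{Cov}[x, y]}{\mathrm{Var}[x]}$$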

Just use lm() in R!

(But check the assumptions)
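
To make the link between the formulas and `lm()` concrete, here is a minimal sketch on simulated data. The true intercept (2), slope (3), and noise level are arbitrary choices for the example. It computes the closed-form estimates and fits the same model with `lm()`, which should agree up to floating-point error.

```r
# Simulated data with a known intercept (2) and slope (3) plus Gaussian noise.
# These values are arbitrary and chosen only for illustration.
set.seed(1)
x <- runif(100, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(100, sd = 1)

# Closed-form estimates: beta = Cov[x, y] / Var[x], alpha = mean(y) - beta * mean(x).
beta_hat <- cov(x, y) / var(x)
alpha_hat <- mean(y) - beta_hat * mean(x)

# The same fit with lm(); coef() returns the intercept first, then the slope.
fit <- lm(y ~ x)
print(coef(fit))
print(c(alpha_hat, beta_hat))

# A first sanity check on the residuals: their mean is zero by construction
# when the model includes an intercept.
residuals_mean <- mean(resid(fit))
```

For the assumption checks, `summary(fit)` reports standard errors and R-squared, and `plot(fit)` draws the standard diagnostic plots (residuals vs. fitted values, normal Q-Q, and others), which help spot non-linearity, non-constant variance, or heavy-tailed residuals.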

Want more?

● Computing for Data Analysis – Roger D. Peng

www.coursera.org/course/compdata

● Statistics One – Andrew Conway

www.coursera.org/course/stats1

● An Introduction to R – The R Core Team

cran.r-project.org/doc/manuals/r-release/R-intro.pdf