Part I: Introductory Materials Introduction to R

Download Part I: Introductory Materials Introduction to R

Post on 14-Jan-2016




0 download

Embed Size (px)


Part I: Introductory Materials Introduction to R. Dr. Nagiza F. Samatova Department of Computer Science North Carolina State University and Computer Science and Mathematics Division Oak Ridge National Laboratory. What is R and why do we use it?. - PowerPoint PPT Presentation


<ul><li><p>Part I: Introductory MaterialsIntroduction to RDr. Nagiza F. SamatovaDepartment of Computer ScienceNorth Carolina State UniversityandComputer Science and Mathematics DivisionOak Ridge National Laboratory</p></li><li><p>*What is R and why do we use it? Open source, most widely used for statistical analysis and graphics Extensible via dynamically loadable add-on packages &gt;1,800 packages on CRAN&gt; &gt; dyn.load( &gt; .C( foobar )&gt; dyn.unload( )&gt; v = rnorm(256)&gt; A = as.matrix (v,16,16)&gt; summary(A)&gt; library (fields)&gt; image.plot (A)</p></li><li><p>*Why R?</p></li><li><p>*The Programmers DilemmaWhat programming language to use &amp; why?</p></li><li><p>Features of R</p><p>R is an integrated suite of software for data manipulation, calculation, and graphical display</p><p>Effective data handlingVarious operators for calculations on arrays/matricesGraphical facilities for data analysisWell-developed language including conditionals, loops, recursive functions and I/O capabilities.</p></li><li><p>You can use R as a calculatorTyped expressions will be evaluated and printed outMain operations: +, -, *, /, ^Obeys order of operationsUse parentheses to group expressionsMore complex operations appear as functionssqrt(2)sin(pi/4), cos(pi/4), tan(pi/4), asin(1), acos(1), atan(1)exp(1), log(2), log10(10)Basic usage: arithmetic in R</p></li><li><p>*Getting helphelp(function_name)help(prcomp)?function_name? or ??topicSearch CRANhttp://www.r-project.orgFrom R GUI: Help Search helpCRAN Task Views (for individual packages)</p></li><li><p>Use variables to store valuesThree ways to assign variablesa = 6a aUpdate variables by using the current value in an assignmentx = x + 1Naming rulesCan include letters, numbers, ., and _Names are case sensitiveMust start with . or a letterVariables and assignment</p></li><li><p>R CommandsCommands can be expressions or assignmentsSeparate by semicolon or new lineCan split across multiple linesR will change prompt to + if command not finishedUseful commands for variablesls(): List all stored variablesrm(x): Delete one or more variablesclass(x): Describe what type of data a variable storessave(x,file=filename): Store variable(s) to a binary fileload(filename): Load all variables from a binary fileSave/load in current directory or My Documents by default</p></li><li><p>*Vectors and vector operations # c() command to create vector x x=c(12,32,54,33,21,65)# c() to add elements to vector x x=c(x,55,32)# seq() command to create sequence of number years=seq(1990,2003)# to contain in steps of .5 a=seq(3,5,.5)# can use : to step by 1years=1990:2003; # rep() command to create data that follow a regular pattern b=rep(1,5) c=rep(1:2,4)To create a vector:</p></li><li><p>*Matrices &amp; matrix operations </p></li><li><p>Find # of elements or dimensionslength(v), length(A), dim(A)Transposet(v), t(A)Matrix inversesolve(A)Sort vector valuessort(v)Statisticsmin(), max(), mean(), median(), sum(), sd(), quantile()Treat matrices as a single vector (same with sort())Useful functions for vectors and matrices</p></li><li><p>Most common plotting function is plot()plot(x,y) plots y vs xplot(x) plots x vs 1:length(x)plot() has many options for labels, colors, symbol, size, etc.Check help with ?plotUse points(), lines(), or text() to add to an existing plotUse x11() to start a new output windowSave plots with png(), jpeg(), tiff(), or bmp()Graphical display and plotting</p></li><li><p>R functions and datasets are organized into packagesPackages base and stats include many of the built-in functions in RCRAN provides thousands of packages contributed by R usersPackage contents are only available when loadedLoad a package with library(pkgname)Packages must be installed before they can be loadedUse library() to see installed packagesUse install.packages(pkgname) and update.packages(pkgname) to install or update a packageCan also run R CMD INSTALL pkgname.tar.gz from command line if you have downloaded package sourceR Packages</p></li><li><p>*Exploring the iris dataLoad iris data into your R session: data (iris);help (data);Check that iris was indeed loaded:ls ();Check the class that the iris object belongs to:class (iris);Read Sections 3.4 and 6.3 in Introduction to R Print the content of iris data:iris;Check the dimensions of the iris data:dim (iris);Check the names of the columns:names (iris);</p></li><li><p>*Exploring the iris data (cont.)Plot Petal.Length vs. Petal.Width: plot (iris[ , 3], iris[ , 4]);example(plot)Exercise: create a plot similar to this figure:</p><p>Src: Figure is from Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar </p></li><li>Large data sets are better loaded through the file input interface in RReading a table of data can be done using the read.table() command:a </li><li><p>Programming in RThe following slides assume a basic understanding of programming concepts</p><p>For more information, please see chapters 9 and 10 of the R manual:</p><p>Additional resourcesBeginning R: An Introduction to Statistical Programming by Larry PaceIntroduction to R webpage on APSnet: R Inferno:*</p></li><li><p>Perform different commands in different situationsif (condition) command_if_trueCan add else command_if_false to endGroup multiple commands together with braces {}if (cond1) {cmd1; cmd2;} else if (cond2) {cmd3; cmd4;}Conditions use relational operators==, !=, , =Do not confuse = (assignment) with == (equality)= is a command, == is a questionCombine conditions with and (&amp;&amp;) and or (||)Use &amp; and | for vectors of length &gt; 1 (element-wise)Conditional statements</p></li><li><p>Most common type of loop is the for loopfor (x in v) { loop_commands; }v is a vector, commands repeat for each value in vVariable x becomes each value in v, in orderExample: adding the numbers 1-10total = 0; for (x in 1:10) total = total + x;Other type of loop is the while loopwhile (condition) { loop_commands; }Condition is identical to if statementCommands are repeated until condition is falseMight execute commands 0 times if already falsewhile loops are useful when you dont know number of iterationsLoops</p></li><li><p>Scripting in RA script is a sequence of R commands that perform some common taskE.g., defining a specific function, performing some analysis routine, etc.Save R commands in a plain text fileUsually have extension of .RRun scripts with source() :source(filename.R)To save command output to a file, use sink():sink(output.Rout)sink() restores output to consoleCan be used with or outside of a script</p></li><li>Objects containing an ordered collection of objectsComponents do not have to be of same typeUse list() to create a list:a </li><li>Writing functions in R is defined by an assignment like:a </li><li><p>*How do I get started with R (Linux)?Step 1: Download Rmkdir for RHOME; cd $RHOMEwget 2: Install Rtar zxvf R-2.9.1.tar.g./configure --prefix= --enable-R-shlib makemake installStep 3: Run RUpdate env. variables in $HOME/.bash_profile:export PATH=/bin:$PATHexport R_HOME= R</p></li><li><p>*Useful R linksR Home: Rs CRAN package distribution: Introduction to R manual: Writing R extensions: Other R documentation: </p><p>*Go to CRAN web-site and show the libraries**Do the head count of who uses each tool routinely or more preferentially.</p><p>There are other commercial software: S-PLUS, SAS</p><p>Much limited data mining tools: Weka, DAP, gretl</p><p>Matlab for matricesIDL for visualizationR for statistics, and open, extensible, large community</p><p>------------------------------------------Other Open Source Projects: Dap, gretl, Weka</p><p>Other Proprietary Projects: IDL, Matlab, Mathematica</p><p>Dap: a small statistics and graphics package based on C. Anyone familiar with the basic syntax of C programs can learn to use the C-style features of Dap quickly and easily; advanced features of C are not necessary, although they are available. </p><p>Gretl: Gnu Regression, Econometrics and Time-series Library. for econometric analysis </p><p>Weka: a collection of machine learning algorithms for data mining tasks. </p><p>*Low level languages can yield good performance, but often the productivity suffers. In contrast, high-level languages can offer good productivity, at the expense of performance. </p><p>Since many scientists use scripting environments, such as R, MATLAB and IDL, pR strives to bring the performance of low-level languages into high-level languages, specifically R, by enabling third-party compiled routines to execute transparently within R.</p><p>Examples of parallel third-party libraries: ScaLAPACK, PETSc</p><p>******Take examples from R tutorial(s)*Take examples from R tutorial(s)**********Update this slide</p></li></ul>