r-intro croped white border

Upload: hicquavidemur

Post on 07-Apr-2018


  • 8/4/2019 R-Intro Croped White Border

    1/101

    An Introduction to R

    Notes on R: A Programming Environment for Data Analysis and Graphics

    Version 2.13.1 (2011-07-08)


    Copyright © 1990 W. N. Venables
    Copyright © 1992 W. N. Venables & D. M. Smith
    Copyright © 1997 R. Gentleman & R. Ihaka
    Copyright © 1997, 1998 M. Maechler
    Copyright © 1997 R Core Development Team
    Copyright © 1999-2010 R Development Core Team

    Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

    Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

    Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Development Core Team.

    ISBN 3-900051-12-7


    Table of Contents

    Preface . . . 1

    1 Introduction and preliminaries . . . 2
    1.1 The R environment . . . 2
    1.2 Related software and documentation . . . 2
    1.3 R and statistics . . . 2
    1.4 R and the window system . . . 3
    1.5 Using R interactively . . . 3
    1.6 An introductory session . . . 4
    1.7 Getting help with functions and features . . . 4
    1.8 R commands, case sensitivity, etc. . . . 4
    1.9 Recall and correction of previous commands . . . 5
    1.10 Executing commands from or diverting output to a file . . . 5
    1.11 Data permanency and removing objects . . . 5

    2 Simple manipulations; numbers and vectors . . . 7
    2.1 Vectors and assignment . . . 7


    5.7.2 Linear equations and inversion . . . 22
    5.7.3 Eigenvalues and eigenvectors . . . 23
    5.7.4 Singular value decomposition and determinants . . . 23
    5.7.5 Least squares fitting and the QR decomposition . . . 23
    5.8 Forming partitioned matrices, cbind() and rbind() . . . 24
    5.9 The concatenation function, c(), with arrays . . . 24
    5.10 Frequency tables from factors . . . 25

    6 Lists and data frames . . . 26
    6.1 Lists . . . 26
    6.2 Constructing and modifying lists . . . 26
    6.2.1 Concatenating lists . . . 27
    6.3 Data frames . . . 27
    6.3.1 Making data frames . . . 27
    6.3.2 attach() and detach() . . . 27
    6.3.3 Working with data frames . . . 28
    6.3.4 Attaching arbitrary lists . . . 28
    6.3.5 Managing the search path . . . 29

    7 Reading data from files . . . 30
    7.1 The read.table() function . . . 30
    7.2 The scan() function . . . 31
    7.3 Accessing builtin datasets . . . 31


    11 Statistical models in R . . . 50
    11.1 Defining statistical models; formulae . . . 50
    11.1.1 Contrasts . . . 52
    11.2 Linear models . . . 53
    11.3 Generic functions for extracting model information . . . 53
    11.4 Analysis of variance and model comparison . . . 54
    11.4.1 ANOVA tables . . . 54
    11.5 Updating fitted models . . . 54
    11.6 Generalized linear models . . . 55
    11.6.1 Families . . . 56
    11.6.2 The glm() function . . . 56
    11.7 Nonlinear least squares and maximum likelihood models . . . 58
    11.7.1 Least squares . . . 58
    11.7.2 Maximum likelihood . . . 59
    11.8 Some non-standard models . . . 60

    12 Graphical procedures . . . 62
    12.1 High-level plotting commands . . . 62
    12.1.1 The plot() function . . . 62
    12.1.2 Displaying multivariate data . . . 63
    12.1.3 Display graphics . . . 63
    12.1.4 Arguments to high-level plotting functions . . . 64
    12.2 Low-level plotting commands . . . 65


    Appendix C The command-line editor . . . 88
    C.1 Preliminaries . . . 88
    C.2 Editing actions . . . 88
    C.3 Command-line editor summary . . . 88

    Appendix D Function and variable index . . . 90

    Appendix E Concept index . . . 93

    Appendix F References . . . 95


    Preface

    This introduction to R is derived from an original set of notes describing the S and S-Plus environments written in 1990-2 by Bill Venables and David M. Smith when at the University of Adelaide. We have made a number of small changes to reflect differences between the R and S programs, and expanded some of the material.

    We would like to extend warm thanks to Bill Venables (and David Smith) for granting permission to distribute this modified version of the notes in this way, and for being a supporter of R from way back.

    Comments and corrections are always welcome. Please address email correspondence to [email protected].

    Suggestions to the reader

    Most R novices will start with the introductory session in Appendix A. This should give some familiarity with the style of R sessions and, more importantly, some instant feedback on what actually happens.

    Many users will come to R mainly for its graphical facilities. In this case, Chapter 12 [Graphics], page 62 on the graphics facilities can be read at almost any time and need not wait until all the preceding sections have been digested.


    1 Introduction and preliminaries

    1.1 The R environment

    R is an integrated suite of software facilities for data manipulation, calculation and graphical display. Among other things it has

    • an effective data handling and storage facility,
    • a suite of operators for calculations on arrays, in particular matrices,
    • a large, coherent, integrated collection of intermediate tools for data analysis,
    • graphical facilities for data analysis and display either directly at the computer or on hardcopy, and
    • a well developed, simple and effective programming language (called S) which includes conditionals, loops, user defined recursive functions and input and output facilities. (Indeed most of the system supplied functions are themselves written in the S language.)

    The term "environment" is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.

    R is very much a vehicle for newly developing methods of interactive data analysis. It has ...


    There is an important difference in philosophy between S (and hence R) and the other main statistical systems. In S a statistical analysis is normally done as a series of steps, with intermediate results being stored in objects. Thus whereas SAS and SPSS will give copious output from a regression or discriminant analysis, R will give minimal output and store the results in a fit object for subsequent interrogation by further R functions.
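Here is a minimal illustration of this style. It is our own example and uses the built-in cars data set only for convenience. lm() stores a regression in a fit object, printing that object gives only brief output, and further functions extract more from it on demand:

```r
# Fit a simple linear regression to the built-in 'cars' data set.
fit <- lm(dist ~ speed, data = cars)

fit              # printing shows only the call and the coefficients

# Further functions interrogate the stored fit object on demand:
summary(fit)     # fuller regression summary
coef(fit)        # the two estimated coefficients
head(residuals(fit))
```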

    1.4 R and the window system

    The most convenient way to use R is at a graphics workstation running a windowing system. This guide is aimed at users who have this facility. In particular we will occasionally refer to the use of R on an X window system although the vast bulk of what is said applies generally to any implementation of the R environment.

    Most users will find it necessary to interact directly with the operating system on their computer from time to time. In this guide, we mainly discuss interaction with the operating system on UNIX machines. If you are running R under Windows or Mac OS you will need to make some small adjustments.

    Setting up a workstation to take full advantage of the customizable features of R is a straightforward if somewhat tedious procedure, and will not be considered further here. Users in difficulty should seek local expert help.

    1.5 Using R interactively


    1.6 An introductory session

    Readers wishing to get a feel for R at a computer before proceeding are strongly advised to work through the introductory session given in Appendix A [A sample session], page 78.

    1.7 Getting help with functions and features

    R has an inbuilt help facility similar to the man facility of UNIX. To get more information on any specific named function, for example solve, the command is

    > help(solve)

    An alternative is

    > ?solve

    For a feature specified by special characters, the argument must be enclosed in double or single quotes, making it a "character string". This is also necessary for a few words with syntactic meaning including if, for and function.

    > help("[[")

    Either form of quote mark may be used to escape the other, as in the string "It's important". Our convention is to use double quote marks for preference.
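A short sketch of the quoting rule (the strings are arbitrary examples). Each quote form can enclose the other, and a backslash escape gives the same result:

```r
s1 <- "It's important"     # single quote inside a double-quoted string
s2 <- 'He said "stop"'     # double quotes inside a single-quoted string
s3 <- 'It\'s important'    # backslash escape instead: same string as s1
identical(s1, s3)          # TRUE
print(s2)                  # printed with double quotes, inner ones escaped
```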

    On most R installations help is available in HTML format by running

    > help.start()


    An assignment also evaluates an expression and passes the value to a variable but the result is not automatically printed.

    Commands are separated either by a semi-colon (;), or by a newline. Elementary commands can be grouped together into one compound expression by braces ({ and }). Comments can be put almost anywhere, starting with a hashmark (#); everything to the end of the line is a comment.
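The rules above can be seen in a few lines. The variable names are arbitrary. An assignment prints nothing, a bare expression prints its value, a semicolon separates commands, and braces group commands so that the group's value is that of its last command:

```r
x <- 5              # an assignment: evaluated but not printed
x                   # an expression: its value is printed
a <- 1; b <- 2      # two commands on one line, separated by ';'
total <- {          # braces group commands into one compound expression;
  tmp <- a + b      # the value of the group is that of its last command
  tmp * x
}
total               # 15
```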

    If a command is not complete at the end of a line, R will give a different prompt, by default

    +

    on second and subsequent lines and continue to read input until the command is syntactically complete. This prompt may be changed by the user. We will generally omit the continuation prompt and indicate continuation by simple indenting.
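One way to change the continuation prompt is through the continue option, which is read with getOption() and set with options(). The replacement string "... " is only an example:

```r
getOption("continue")               # the current continuation prompt
old <- options(continue = "... ")   # change it; options() returns the old setting
getOption("continue")               # now "... "
options(old)                        # restore the previous prompt
```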

    Command lines entered at the console are limited to about 4095 bytes (not characters).

    1.9 Recall and correction of previous commands

    Under many versions of UNIX and on Windows, R provides a mechanism for recalling and re-executing previous commands. The vertical arrow keys on the keyboard can be used to scroll forward and backward through a command history. Once a command is located in this way, the cursor can be moved within the command using the horizontal arrow keys, and characters can be removed with the DEL key or added with the other keys. More details are provided later: see Appendix C [The command-line editor], page 88.


    > objects()

    (alternatively, ls()) can be used to display the names of (most of) the objects which are currently stored within R. The collection of objects currently stored is called the workspace.

    To remove objects the function rm is available:

    > rm(x, y, z, ink, junk, temp, foo, bar)
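Here is a short sketch of listing and then removing objects. The object names are arbitrary:

```r
x <- 1:3; y <- "a"; temp <- TRUE   # create a few objects
objects()                          # same as ls(): names of objects in the workspace
rm(x, y, temp)                     # remove them again
exists("temp", inherits = FALSE)   # FALSE: 'temp' is gone from the workspace
```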

    All objects created during an R session can be stored permanently in a file for use in future R sessions. At the end of each R session you are given the opportunity to save all the currently available objects. If you indicate that you want to do this, the objects are written to a file called .RData in the current directory, and the command lines used in the session are saved to a file called .Rhistory.

    When R is started at a later time from the same directory it reloads the workspace from this file. At the same time the associated command history is reloaded.

    It is recommended that you use separate working directories for analyses conducted with R. It is quite common for objects with names x and y to be created during an analysis. Names like this are often meaningful in the context of a single analysis, but it can be quite hard to decide what they might be when several analyses have been conducted in the same directory.
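The end-of-session save writes .RData in the working directory. You can run the same mechanism by hand with save() and load(). The sketch below writes to a temporary file so that no existing .RData is touched. The file name example.RData is hypothetical:

```r
z <- c(1.5, 2.5)                               # an object worth keeping
f <- file.path(tempdir(), "example.RData")     # hypothetical file in a temporary directory
save(z, file = f)                              # write the object in .RData format
rm(z)                                          # it is now gone from the workspace
load(f)                                        # restores 'z' from the file
z
```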


    2 Simple manipulations; numbers and vectors

    2.1 Vectors and assignment

    R operates on named data structures. The simplest such structure is the numeric vector, which is a single entity consisting of an ordered collection of numbers. To set up a vector named x, say, consisting of five numbers, namely 10.4, 5.6, 3.1, 6.4 and 21.7, use the R command

    > x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
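Besides the <- operator, the same vector can be assigned with assign() or with the rightward arrow ->. The three forms below all leave the same five numbers in x:

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)          # the usual form
assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))  # equivalent, via assign()
c(10.4, 5.6, 3.1, 6.4, 21.7) -> x          # rightward assignment also works
length(x)                                  # 5
```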


    and so on, all have their usual meaning. max and min select the largest and smallest elements of a vector respectively. range is a function whose value is a vector of length two, namely c(min(x), max(x)). length(x) is the number of elements in x, sum(x) gives the total of the elements in x, and prod(x) their product.

    Two statistical functions are mean(x) which calculates the sample mean, which is the same as sum(x)/length(x), and var(x) which gives

    sum((x-mean(x))^2)/(length(x)-1)

    or sample variance. If the argument to var() is an n-by-p matrix the value is a p-by-p sample covariance matrix got by regarding the rows as independent p-variate sample vectors.
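The identities above can be checked directly on the vector x. The random 5-by-4 matrix is our own example of the n-by-p case:

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
all.equal(mean(x), sum(x) / length(x))                      # TRUE
all.equal(var(x), sum((x - mean(x))^2) / (length(x) - 1))   # TRUE
range(x)                                                    # 3.1 21.7

set.seed(1)                                  # reproducible random data
m <- matrix(rnorm(20), nrow = 5, ncol = 4)   # n = 5 rows, p = 4 columns
dim(var(m))                                  # 4 4: a p-by-p covariance matrix
```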

    sort(x) returns a vector of the same size as x with the elements arranged in increasing order; however there are other more flexible sorting facilities available (see order() or sort.list() which produce a permutation to do the sorting).
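To see the difference, note that sort() returns the sorted values, while order() returns the permutation of indices that sorts them. Indexing with that permutation reproduces sort():

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
sort(x)          # 3.1 5.6 6.4 10.4 21.7
o <- order(x)    # 3 2 4 1 5: position of the smallest, next smallest, ...
x[o]             # same values as sort(x)
```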

    Note that max and min select the largest and smallest values in their arguments, even if they are given several vectors. The parallel maximum and minimum functions pmax and pmin return a vector (of length equal to their longest argument) that contains in each element the largest (smallest) element in that position in any of the input vectors.
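The contrast between max and pmax, using two small example vectors:

```r
a <- c(1, 7, 3)
b <- c(4, 2, 9)
max(a, b)     # 9: a single overall maximum across both vectors
pmax(a, b)    # 4 7 9: element-by-element maxima
pmin(a, b)    # 1 2 3: element-by-element minima
pmax(a, 5)    # 5 7 5: the shorter argument is recycled
```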

    For most purposes the user will not be concerned if the numbers in a numeric vector are integers, reals or even complex. Internally calculations are done as double precision real numbers, or double precision complex numbers if the input data are complex.


    The fifth parameter may be named along=vector, which if used must be the only parameter, and creates a sequence 1, 2, ..., length(vector), or the empty sequence if the vector is empty (as it can be).

    A related function is rep() which can be used for replicating an object in various complicated ways. The simplest form is
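For example, with a short character vector. In current R the full argument name is along.with, and seq_along() is a convenient shorthand for the same thing:

```r
v <- c("a", "b", "c")
seq(along.with = v)              # 1 2 3
seq(along.with = character(0))   # integer(0): the empty sequence
seq_along(v)                     # shorthand, also 1 2 3
```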

    > s5 s6 temp 13
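The exact commands for s5 and s6 cannot be recovered here, so the following is a hedged sketch of typical rep() usage. It assumes those names are meant to hold replicated copies of x. The times and each arguments behave as follows:

```r
x  <- c(10.4, 5.6, 3.1, 6.4, 21.7)
s5 <- rep(x, times = 5)   # five copies of x placed end to end (length 25)
s6 <- rep(x, each = 5)    # each element repeated five times before the next
head(s5, 6)               # 10.4 5.6 3.1 6.4 21.7 10.4
head(s6, 6)               # 10.4 10.4 10.4 10.4 10.4 5.6
```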


    > Inf - Inf

    which both give NaN since the result cannot be defined sensibly.

    In summary, is.na(xx) is TRUE both for NA and NaN values. To differentiate these, is.nan(xx) is only TRUE for NaNs.
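A small vector containing both kinds of missing value shows the difference:

```r
xx <- c(1, NA, NaN, 0/0, Inf - Inf)
is.na(xx)    # FALSE  TRUE  TRUE  TRUE  TRUE
is.nan(xx)   # FALSE FALSE  TRUE  TRUE  TRUE
```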

    Missing values are sometimes printed as <NA> when character vectors are printed without quotes.
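For instance, with an arbitrary character vector containing a missing value:

```r
v <- c("apple", NA, "pear")
print(v)                  # NA appears unquoted among the quoted strings
print(v, quote = FALSE)   # without quotes the missing value is shown as <NA>
```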

    2.6 Character vectors

    Character quantities and character vectors are used frequently in R, for example as plot labels. Where needed they are denoted by a sequence of characters delimited by the double quote character, e.g., "x-values", "New iteration results".

    Character strings are entered using either matching double (") or single (') quotes, but are printed using double quotes (or sometimes without quotes). They use C-style escape sequences, using \ as the escape character, so \\ is entered and printed as \\, and inside double quotes " is entered as \". Other useful escape sequences are \n, newline, \t, tab and \b, backspace; see ?Quotes for a full list.

    Character vectors may be concatenated into a vector by the c() function; examples of their use will emerge frequently.

    The paste() function takes an arbitrary number of arguments and concatenates them one by one ...
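A few escape sequences in action. The strings are our own examples. nchar() counts the characters actually stored, and cat() shows them without escapes:

```r
nchar("\\")               # 1: the two typed characters denote one backslash
s <- "She said \"hi\""    # double quotes escaped inside a double-quoted string
print(s)                  # printed form shows the escapes again
cat(s, "\n")              # cat() writes the raw characters: She said "hi"
cat("a\tb\n")             # \t is a tab, \n a newline
```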
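A sketch of paste() behaviour. The shorter argument is recycled, non-character arguments are coerced to strings, and the separator defaults to a single space unless sep is given:

```r
labs <- paste(c("X", "Y"), 1:10, sep = "")   # "X1" "Y2" "X3" ... "Y10"
labs
paste("a", 1, TRUE)                          # "a 1 TRUE"
```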


    2. A vector of positive integral quantities. In this case the values in the index vector must lie in the set {1, 2, ..., length(x)}. The corresponding elements of the vector are selected and concatenated, in that order, in the result. The index vector can be of any length and the result is of the same length as the index vector. For example x[6] is the sixth component of x and

    > x[1:10]

    selects the first 10 elements of x (assuming length(x) is not less than 10). Also

    > c("x","y")[rep(c(1,2,2,1), times=4)]

    (an admittedly unlikely thing to do) produces a character vector of length 16 consisting of "x", "y", "y", "x" repeated four times.

    3. A vector of negative integral quantities. Such an index vector specifies the values to be excluded rather than included. Thus

    > y fruit names(fruit)
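Positive and negative index vectors side by side, using a hypothetical seven-element vector:

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7, 7.2, 8.8)   # hypothetical 7-element vector
x[6]                    # sixth component: 7.2
x[c(3, 1)]              # positive indices select, in the order given
x[-(1:5)]               # negative indices exclude: drops the first five
c("x", "y")[rep(c(1, 2, 2, 1), times = 4)]    # length-16 character vector
```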


    data frames are matrix-like structures, in which the columns can be of different types. Think of data frames as "data matrices" with one row per observational unit but with (possibly) both numerical and categorical variables. Many experiments are best described by data frames: the treatments are categorical but the response is numeric. See Section 6.3 [Data frames], page 27.

    functions are themselves objects in R which can be stored in the project's workspace. This provides a simple and convenient way to extend R. See Chapter 10 [Writing your own functions], page 42.
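Here is a minimal data frame of the kind described, with a categorical treatment and a numeric response. The column names and values are hypothetical:

```r
df <- data.frame(treatment = factor(c("A", "B", "A", "B")),  # categorical
                 response  = c(2.1, 3.4, 1.9, 3.8))            # numeric
sapply(df, class)   # treatment: "factor", response: "numeric"
nrow(df)            # one row per observational unit: 4
```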
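A sketch showing that a user-written function is an ordinary object. The function sq is hypothetical:

```r
sq <- function(v) v^2   # a hypothetical user-defined function
sq(3)                   # 9
is.function(sq)         # TRUE: it is an object like any other
exists("sq")            # stored in the workspace
f <- sq                 # it can be copied and passed around
f(4)                    # 16
```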


    3 Objects, their modes and attributes

    3.1 Intrinsic attributes: mode and length

    The entities R operates on are technically known as objects. Examples are vectors of numeric (real) or complex values, vectors of logical values and vectors of character strings. These are known as "atomic" structures since their components are all of the same type, or mode, namely numeric, complex, logical, character and raw.

    Vectors must have their values all of the same mode. Thus any given vector must be unambiguously either logical, numeric, complex, character or raw. (The only apparent exception to this rule is the special "value" listed as NA for quantities not available, but in fact there are several types of NA.) Note that a vector can be empty and still have a mode. For example the empty character string vector is listed as character(0) and the empty numeric vector as numeric(0).

    R also operates on objects called lists, which are of mode list. These are ordered sequences of objects which individually can be of any mode. Lists are known as "recursive" rather than atomic structures since their components can themselves be lists in their own right.
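The modes listed above can be inspected with mode(), and the different types of NA with typeof(). The example values are arbitrary:

```r
mode(c(1.5, 2))             # "numeric"
mode(2i)                    # "complex"
mode(TRUE)                  # "logical"
mode("a")                   # "character"
mode(as.raw(255))           # "raw"
mode(numeric(0))            # "numeric": an empty vector still has a mode
c(typeof(NA), typeof(NA_real_), typeof(NA_character_))  # several types of NA
mode(list(1, "a", TRUE))    # "list": components may differ in mode
```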

    The other recursive structures are those of mode function and expression. Functions are the objects that form part of the R system along with similar user written functions, which we discuss in some detail later. Expressions as objects form an advanced part of R which will not ...
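A brief look at the two other recursive modes. The expression x + 1 is an arbitrary example:

```r
mode(mean)                # "function"
e <- expression(x + 1)    # an unevaluated expression object
mode(e)                   # "expression"
x <- 2
eval(e)                   # 3: evaluated once x exists
```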


    3.2 Changing the length of an object

    An "empty" object may still have a mode. For example

    > e e[3] alpha length(alpha)
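The original commands are illegible here. The following is an illustrative sketch that reuses the surviving names e and alpha, with values of our own choosing. It shows an empty vector growing on assignment and a vector being truncated through length<-:

```r
e <- numeric()             # an empty vector of mode numeric
e[3] <- 17                 # assigning beyond the end extends it: NA NA 17
alpha <- 1:10
alpha <- alpha[2 * 1:5]    # keep every second element: 2 4 6 8 10
length(alpha) <- 3         # truncate to the first three: 2 4 6
```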


    > winter

    will print it in data frame form, which is rather like a matrix, whereas

    > unclass(winter)

    will print it as an ordinary list. Only in rather special situations do you need to use this facility, but one is when you are learning to come to terms with the idea of class and generic functions.

    Generic functions and classes will be discussed further in Section 10.9 [Object orientation], page 48, but only briefly.
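A sketch with a hypothetical winter data frame. unclass() removes the class attribute, so the same data prints and behaves as a plain list:

```r
winter <- data.frame(temp = c(-3, 1, 4), snow = c(TRUE, TRUE, FALSE))  # hypothetical data
winter                    # printed in data frame form
unclass(winter)           # the same data printed as an ordinary list
class(winter)             # "data.frame"
class(unclass(winter))    # "list"
```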


    4 Ordered and unordered factors

    A factor is a vector object used to specify a discrete classification (grouping) of the components of other vectors of the same length. R provides both ordered and unordered factors. While the "real" application of factors is with model formulae (see Section 11.1.1 [Contrasts], page 52), we here look at a specific example.

    4.1 A specific example

    Suppose, for example, we have a sample of 30 tax accountants from all the states and territories of Australia and their individual state of origin is specified by a character vector of state mnemonics as

    > state statef
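The full 30-element vector is not legible here. As a sketch, a short hypothetical sample shows how factor() turns state mnemonics into a factor whose levels are the sorted distinct values:

```r
state  <- c("tas", "sa", "qld", "nsw", "nsw", "sa")   # short hypothetical sample
statef <- factor(state)
levels(statef)    # "nsw" "qld" "sa" "tas"
table(statef)     # counts per level
```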


    as if they were separate vector structures. The result is a structure of the same length as the levels attribute of the factor containing the results. The reader should consult the help document for more details.

    Suppose further we needed to calculate the standard errors of the state income means. To do this we need to write an R function to calculate the standard error for any given vector. Since there is a built-in function var() to calculate the sample variance, such a function is a very simple one-liner, specified by the assignment:

    > stderr incster incster

    act nsw nt qld sa tas vic wa

    1.5 4.3102 4.5 4.1061 2.7386 0.5 5.244 2.6575
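A self-contained sketch of the computation, using a small hypothetical income sample. The function is named std_err here rather than stderr, because base R already has a stderr() connection function that the shorter name would mask:

```r
std_err <- function(x) sqrt(var(x) / length(x))    # standard error of the mean
incomes <- c(60, 64, 49, 61, 55)                   # hypothetical incomes
statef  <- factor(c("sa", "sa", "nsw", "nsw", "nsw"))
incster <- tapply(incomes, statef, std_err)        # one value per factor level
incster                                            # nsw: sqrt(12), sa: 2
```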

    As an exercise you may care to find the usual 95% confidence limits for the state mean incomes. To do this you could use tapply() once more with the length() function to find the sample sizes, and the qt() function to find the percentage points of the appropriate t-distributions. (You could also investigate R's facilities for t-tests.)
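One possible solution, sketched on the same hypothetical data. Each interval is the mean plus or minus qt(0.975, n - 1) standard errors, which matches what t.test() reports for a single group:

```r
std_err <- function(x) sqrt(var(x) / length(x))
incomes <- c(60, 64, 49, 61, 55)                   # hypothetical incomes
statef  <- factor(c("sa", "sa", "nsw", "nsw", "nsw"))

n  <- tapply(incomes, statef, length)              # sample size per state
m  <- tapply(incomes, statef, mean)                # state means
se <- tapply(incomes, statef, std_err)             # standard errors
tq <- qt(0.975, df = n - 1)                        # upper 2.5% points of t
lower <- m - tq * se
upper <- m + tq * se
cbind(mean = m, lower = lower, upper = upper)
```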


    5 Arrays and matrices

    5.1 Arrays

    An array can be considered as a multiply subscripted collection of data entries, for example numeric. R allows simple facilities for creating and handling arrays, and in particular the special case of matrices.

    A dimension vector is a vector of non-negative integers. If its length is k then the array is k-dimensional, e.g. a matrix is a 2-dimensional array. The dimensions are indexed from one up to the values given in the dimension vector.

    A vector can be used by R as an array only if it has a dimension vector as its dim attribute. Suppose, for example, z is a vector of 1500 elements. The assignment

    > dim(z)
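The assignment itself is cut off here. As an illustrative assumption, take dimensions c(3, 5, 100); any vector whose product is 1500 would do. Setting the dim attribute turns the plain vector into a 3-dimensional array, filled first index fastest:

```r
z <- 1:1500                  # a plain vector of 1500 elements
dim(z) <- c(3, 5, 100)       # assumed dimensions; 3 * 5 * 100 = 1500
dim(z)                       # 3 5 100
z[2, 4, 10]                  # element 2 + (4-1)*3 + (10-1)*15 = 146
```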


    5.3 Index matrices

    As well as an index vector in any subscript position, a matrix may be used with a single index matrix in order either to assign a vector of quantities to an irregular collection of elements in the array, or to extract an irregular collection as a vector.

    A matrix example makes the process clear. In the case of a doubly indexed array, an index matrix may be given consisting of two columns and as many rows as desired. The entries in the index matrix are the row and column indices for the doubly indexed array. Suppose for example we have a 4 by 5 array X and we wish to do the following:

    • Extract elements X[1,3], X[2,2] and X[3,1] as a vector structure, and
    • Replace these entries in the array X by zeroes.

    In this case we need a 3 by 2 subscript array, as in the ollowing example.

    > x
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    5    9   13   17
    [2,]    2    6   10   14   18
    [3,]    3    7   11   15   19
    [4,]    4    8   12   16   20

    > i <- array(c(1:3,3:1), dim=c(3,2))
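Putting the pieces together gives the runnable sketch below. The 4 by 5 array is built from 1:20, which reproduces the printout of x. Each row of the index matrix i is one (row, column) pair:

```r
x <- array(1:20, dim = c(4, 5))          # 4 by 5 array filled column by column
i <- array(c(1:3, 3:1), dim = c(3, 2))   # rows (1,3), (2,2), (3,1)
vals <- x[i]                             # extract: 9 6 3
x[i] <- 0                                # replace those entries by zeroes
x
```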


    However a simpler direct way of producing this matrix is to use table():

    > N Z Z Z


    5.5 The outer product of two arrays

    An important operation on arrays is the outer product. If a and b are two numeric arrays, their outer product is an array whose dimension vector is obtained by concatenating their two dimension vectors (order is important), and whose data vector is got by forming all possible products of elements of the data vector of a with those of b. The outer product is formed by the special operator %o%:

    > ab ab f z
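A small example of the dimension rule, with arrays of our own choosing. A 2 by 3 array combined with a length-4 vector gives a 2 by 3 by 4 result, and each entry is a product of one element from each:

```r
a  <- array(1:6, dim = c(2, 3))
b  <- 1:4
ab <- a %o% b                 # same as outer(a, b)
dim(ab)                       # 2 3 4: the two dimension vectors concatenated
ab[2, 3, 4] == a[2, 3] * b[4] # TRUE (6 * 4 = 24)
```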


    5.7 Matrix facilities

    As noted above, a matrix is just an array with two subscripts. However it is such an important special case it needs a separate discussion. R contains many operators and functions that are available only for matrices. For example t(X) is the matrix transpose function, as noted above. The functions nrow(A) and ncol(A) give the number of rows and columns in the matrix A respectively.

    5.7.1 Matrix multiplication

    The operator %*% is used for matrix multiplication. An n by 1 or 1 by n matrix may of course be used as an n-vector if in the context such is appropriate. Conversely, vectors which occur in matrix multiplication expressions are automatically promoted either to row or column vectors, whichever is multiplicatively coherent, if possible (although this is not always unambiguously possible, as we see later).

    If, for example, A and B are square matrices of the same size, then

    > A * B

    is the matrix o element by element products and

    > A %*% B

    is the matrix product. If x is a vector, then

    > x %*% A %*% x
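A small numeric check of the three products, using matrices of our own choosing. A * B multiplies elementwise, A %*% B is the matrix product, and x %*% A %*% x is a quadratic form, returned as a 1 by 1 matrix:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # symmetric 2 by 2 matrix
B <- diag(2)                           # 2 by 2 identity
x <- c(1, 2)
A * B               # elementwise: keeps only the diagonal of A
A %*% B             # matrix product: equals A
x %*% A %*% x       # quadratic form x'Ax = 18, as a 1 by 1 matrix
```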


    5.7.3 Eigenvalues and eigenvectors

    The function eigen(Sm) calculates the eigenvalues and eigenvectors of a symmetric matrix Sm. The result of this function is a list of two components named values and vectors. The assignment

    > ev evals eigen(Sm)

    is used by itself as a command the two components are printed, with their names. For large matrices it is better to avoid computing the eigenvectors if they are not needed by using the expression

    > evals
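A sketch on a small symmetric matrix of our own choosing, whose eigenvalues are 3 and 1. The only.values argument skips the eigenvectors, which are then returned as NULL:

```r
Sm <- matrix(c(2, 1, 1, 2), nrow = 2)          # symmetric 2 by 2 matrix
ev <- eigen(Sm)                                # list with $values and $vectors
ev$values                                      # 3 1 (decreasing order)
evals <- eigen(Sm, only.values = TRUE)$values  # eigenvectors not computed
evals
```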


    > Xplus b fit res X < bi d( 1 2 3 )


    5.10 Frequency tables from factors

    Recall that a factor defines a partition into groups. Similarly a pair of factors defines a two way cross classification, and so on. The function table() allows frequency tables to be calculated from equal length factors. If there are k factor arguments, the result is a k-way array of frequencies.

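    Suppose, for example, that statef is a factor giving the state code for each entry in a data vector. The assignment

    > statefr <- table(statef)

    gives in statefr a table of frequencies of each state in the sample. This simple case is equivalent to, but more convenient than,

    > statefr <- tapply(statef, statef, length)

    Further suppose that incomef is a factor giving an income class for each entry in the data vector, formed for example with the cut() function:

    > factor(cut(incomes, breaks = 35+10*(0:7))) -> incomef

    Then to calculate a two-way table of frequencies: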

    > table(incomef,statef)

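Here is a small sketch of table() with hypothetical data: six made-up state codes and incomes. Here cut() is used without the outer factor() call, which would drop unused levels. All seven income classes are therefore kept, including empty ones, and the two-way table is 7 by 4.

```r
# Hypothetical records: a state code and an income for each of six people
statef  <- factor(c("tas", "sa", "qld", "nsw", "nsw", "qld"))
incomes <- c(60, 49, 40, 61, 64, 60)

statefr <- table(statef)                            # one-way table of frequencies
incomef <- cut(incomes, breaks = 35 + 10 * (0:7))   # classes (35,45], (45,55], ...
tab <- table(incomef, statef)                       # 7 income classes by 4 states
tab["(55,65]", "nsw"]                               # both nsw incomes fall in (55,65]
```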


    6 Lists and data frames

    6.1 Lists

    An R list is an object consisting of an ordered collection of objects known as its components.

    There is no particular need for the components to be of the same mode or type, and, for example, a list could consist of a numeric vector, a logical value, a matrix, a complex vector, a character array, a function, and so on. Here is a simple example of how to make a list:

    > Lst
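A minimal sketch of building and indexing a list follows; the component names and values are illustrative. Note the difference between Lst[[2]], which extracts the component itself, and Lst[2], which is a sublist of length one.

```r
# Illustrative list whose components have different modes
Lst <- list(name = "Ann", ages = c(4, 7, 9), married = TRUE,
            m = matrix(1:4, nrow = 2))
length(Lst)          # number of components
Lst[[2]]             # the second component itself: a numeric vector
Lst$ages             # the same component, selected by name
Lst[["ages"]][3]     # third element of that component
Lst[2]               # a sublist of length one, not the vector
```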


    > Lst Lst[5] list.ABC


    The attach() function takes a database such as a list or data frame as its argument. Thus suppose lentils is a data frame with three variables lentils$u, lentils$v, lentils$w. The attach

    > attach(lentils)

    places the data frame in the search path at position 2, and provided there are no variables u, v or w in position 1, u, v and w are available as variables from the data frame in their own right. At this point an assignment such as

    > u lentils$u detach()

    More precisely, this statement detaches from the search path the entity currently at position 2. Thus in the present context the variables u, v and w would no longer be visible except through list notation such as lentils$u.


    6.3.5 Managing the search path

    The function search shows the current search path and so is a very useful way to keep track of which data frames and lists (and packages) have been attached and detached. Initially it gives

    > search()

    [1] ".GlobalEnv" "Autoloads" "package:base"

    where .GlobalEnv is the workspace.

    After lentils is attached we have

    > search()

    [1] ".GlobalEnv" "lentils" "Autoloads" "package:base"

    > ls(2)

    [1] "u" "v" "w"

    and as we see ls (or objects) can be used to examine the contents of any position on the search path.

    Finally, we detach the data frame and confirm it has been removed from the search path.

    > detach("lentils")

    > search()

    [1] ".GlobalEnv" "Autoloads" "package:base"
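The sketch below illustrates the attach()/detach() behaviour described above, using a hypothetical lentils data frame with made-up values. An assignment to u after attaching creates a new u at position 1. That new u masks lentils$u but does not alter it.

```r
# A hypothetical data frame with variables u, v and w
lentils <- data.frame(u = 1:3, v = 4:6, w = 7:9)
attach(lentils)
search()[2]                 # "lentils" now sits at position 2
u <- v + w                  # creates a new u in the workspace (position 1) ...
lentils$u                   # ... which masks, but does not change, lentils$u
masked <- u
rm(u)                       # remove the workspace copy
detach("lentils")
"lentils" %in% search()     # FALSE once detached
```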


    7 Reading data from files

    Large data objects will usually be read as values from external files rather than entered during an R session at the keyboard. R input facilities are simple and their requirements are fairly strict and even rather inflexible. There is a clear presumption by the designers of R that you will be able to modify your input files using other tools, such as file editors or Perl, to fit in with the requirements of R. Generally this is very simple.

    If variables are to be held mainly in data frames, as we strongly suggest they should be, an entire data frame can be read directly with the read.table() function. There is also a more primitive input function, scan(), that can be called directly.

    For more details on importing data into R and also exporting data, see the R Data Import/Export manual.

    7.1 The read.table() function

    To read an entire data frame directly, the external file will normally have a special form.

    The first line of the file should have a name for each variable in the data frame.

    Each additional line of the file has as its first item a row label and the values for each variable.

    If the file has one fewer item in its first line than in its second, this arrangement is presumed to be in force.


    The data frame may then be read as

    > HousePrice inp label
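Here is a hedged sketch of reading a file in the layout described above. To keep it self-contained, the "file" is supplied through textConnection(); the column names and values are illustrative assumptions. The header line has one fewer item than the data lines, so the first column becomes the row names.

```r
# Hypothetical file contents: the header has one fewer item than each data line
txt <- c("     Price  Floor  Area",
         "01   52.00  111.0   830",
         "02   54.75  128.0   710",
         "03   57.50  101.0  1000")
con <- textConnection(txt)
HousePrice <- read.table(con, header = TRUE)   # first column becomes row names
close(con)                                     # read.table leaves an already-open connection open
HousePrice
```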


    7.4 Editing data

    When invoked on a data frame or matrix, edit brings up a separate spreadsheet-like environment for editing. This is useful for making small changes once a data set has been read. The command

    > xnew


    8 Probability distributions

    8.1 R as a set of statistical tables

    One convenient use of R is to provide a comprehensive set of statistical tables. Functions are provided to evaluate the cumulative distribution function P(X ≤ x), the probability density function and the quantile function (given q, the smallest x such that P(X ≤ x) > q), and to simulate from the distribution.

    Distribution        R name    additional arguments
    beta                beta      shape1, shape2, ncp
    binomial            binom     size, prob
    Cauchy              cauchy    location, scale
    chi-squared         chisq     df, ncp
    exponential         exp       rate
    F                   f         df1, df2, ncp
    gamma               gamma     shape, scale
    geometric           geom      prob
    hypergeometric      hyper     m, n, k
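The sketch below shows the naming convention for these functions. The R name is prefixed by d for the density, p for the cumulative distribution function, q for the quantile function and r for simulation. The particular arguments are illustrative.

```r
pnorm(1.96)                        # P(X <= 1.96) for X ~ N(0, 1), about 0.975
qnorm(0.975)                       # the quantile function inverts pnorm: about 1.96
dbinom(3, size = 10, prob = 0.5)   # P(X = 3) for X ~ binomial(10, 0.5), i.e. 120/1024
set.seed(42)
rexp(3, rate = 2)                  # three simulated exponential values

# Two-sided p-value for a t statistic of -2.43 on 13 degrees of freedom
2 * pt(-2.43, df = 13)
# Upper 1% point of an F(2, 7) distribution
qf(0.01, 2, 7, lower.tail = FALSE)
```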


    8.2 Examining the distribution of a set of data

    Given a (univariate) set of data we can examine its distribution in a large number of ways. The simplest is to examine the numbers. Two slightly different summaries are given by summary and fivenum and a display of the numbers by stem (a "stem and leaf" plot).

    > attach(faithful)

    > summary(eruptions)

    Min. 1st Qu. Median Mean 3rd Qu. Max.

    1.600 2.163 4.000 3.488 4.454 5.100

    > fivenum(eruptions)

    [1] 1.6000 2.1585 4.0000 4.4585 5.1000

    > stem(eruptions)

    The decimal point is 1 digit(s) to the left of the |

    16 | 070355555588

    18 | 000022233333335577777777888822335777888

    20 | 00002223378800035778
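The sketch below reproduces these summaries with the built-in faithful data set. It then draws a histogram on the density scale with a kernel density estimate overlaid. The bandwidth bw = 0.1 is an illustrative choice. Output goes to a temporary PDF file so the code runs non-interactively.

```r
erupt <- faithful$eruptions          # built-in Old Faithful data (272 eruptions)
summary(erupt)
fivenum(erupt)                       # Tukey's five-number summary
stem(erupt)                          # stem-and-leaf display

f <- tempfile(fileext = ".pdf")
pdf(f)
# Histogram on the density scale, a kernel density estimate, and a rug of the data
hist(erupt, breaks = seq(1.6, 5.2, by = 0.2), freq = FALSE)
lines(density(erupt, bw = 0.1))
rug(erupt)
dev.off()
```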


    too much smoothing (it usually does for "interesting" densities). (Better automated methods of bandwidth choice are available, and in this example bw = "SJ" gives a good result.)

    [Figure: Histogram of eruptions; horizontal axis: eruptions, vertical axis: Relative Frequency.]


    which shows a reasonable fit but a shorter right tail than one would expect from a normal distribution. Let us compare this with some simulated data from a t distribution

    [Figure: Normal Q-Q Plot; horizontal axis: Theoretical Quantiles, vertical axis: Sample Quantiles.]

    x
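Here is a sketch of normal Q-Q plots for the longer eruptions and for simulated t data. The sample size 250, the 5 degrees of freedom and the seed are illustrative assumptions. qqnorm(plot.it = FALSE) returns the plotted coordinates without drawing.

```r
f <- tempfile(fileext = ".pdf")
pdf(f)                                        # draw into a temporary file
long <- faithful$eruptions[faithful$eruptions > 3]
qqnorm(long); qqline(long)                    # long eruptions against a normal

set.seed(1)
x <- rt(250, df = 5)                          # simulated t data
qqnorm(x); qqline(x)                          # heavier tails show up at both ends
dev.off()

q <- qqnorm(x, plot.it = FALSE)               # coordinates only, no plotting
```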


    Method A: 79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97 80.05 80.03 80.02 80.00 80.02

    Method B: 80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97

    Boxplots provide a simple graphical comparison of the two samples.

    A


    data: A and B

    F = 0.5837, num df = 12, denom df = 7, p-value = 0.3938

    alternative hypothesis: true ratio of variances is not equal to 1

    95 percent confidence interval:

    0.1251097 2.1052687

    sample estimates:

    ratio of variances

    0.5837405

    which shows no evidence of a significant difference, and so we can use the classical t-test that assumes equality of the variances.

    > t.test(A, B, var.equal=TRUE)

    Two Sample t-test

    data: A and B

    t = 3.4722, df = 19, p-value = 0.002551

    alternative hypothesis: true difference in means is not equal to 0

    95 percent confidence interval:

    0.01669058 0.06734788


    alternative hypothesis: two-sided

    Warning message:

    cannot compute correct p-values with ties in: ks.test(A, B)
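The sketch below reproduces the comparison of methods A and B with the data listed above. The test checks the statistics against the printed output: F = 0.5837, and t = 3.4722 on 19 df with p = 0.002551. Calling t.test() without var.equal = TRUE gives the Welch test instead.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97,
       80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

f <- tempfile(fileext = ".pdf")
pdf(f)
boxplot(A, B)                            # simple graphical comparison
dev.off()

vt <- var.test(A, B)                     # F test for equality of variances
tt <- t.test(A, B, var.equal = TRUE)     # classical pooled-variance t-test
tw <- t.test(A, B)                       # Welch test, the default
```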


    9 Grouping, loops and conditional execution

    9.1 Grouped expressions

    R is an expression language in the sense that its only command type is a function or expression which returns a result. Even an assignment is an expression whose result is the value assigned, and it may be used wherever any expression may be used; in particular multiple assignments are possible.

    Commands may be grouped together in braces, {expr_1; ...; expr_m}, in which case the value of the group is the result of the last expression in the group evaluated. Since such a group is also an expression it may, for example, be itself included in parentheses and used as part of an even larger expression, and so on.
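The short sketch below illustrates these points. Assignments are expressions and can be chained. A braced group takes the value of its last expression, so it can appear inside a larger expression. Variable names are illustrative.

```r
a <- b <- 10              # an assignment is an expression, so assignments chain

total <- {                # a braced group has the value of its last expression
  x <- 2
  y <- 3
  x * y
}

z <- ({ u <- 4; u + 1 }) * 2   # a group used as part of a larger expression
```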

    9.2 Control statements

    9.2.1 Conditional execution: if statements

    The language has available a conditional construction of the form

    > if (expr_1) expr_2 else expr_3

    where expr_1 must evaluate to a single logical value; the result of the entire expression is then the value of expr_2 if expr_1 is TRUE, and the value of expr_3 otherwise.


    Warning: for() loops are used in R code much less often than in compiled languages. Code that takes a "whole object" view is likely to be both clearer and faster in R.
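Here is a small sketch contrasting an explicit for() loop with the equivalent whole-object expression. Both compute the sum of squares of an illustrative vector.

```r
x <- c(3, 1, 4, 1, 5, 9)

total <- 0
for (xi in x) total <- total + xi^2   # explicit loop over the elements

total_vec <- sum(x^2)                 # whole-object equivalent
```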

    Other looping facilities include the

    > repeat expr

    statement and the

    > while (condition ) expr

    statement.

    The break statement can be used to terminate any loop, possibly abnormally. This is the only way to terminate repeat loops.

    The next statement can be used to discontinue one particular cycle and skip to the next.
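The sketch below uses repeat, while, break and next; the stopping rules are illustrative.

```r
# repeat has no condition of its own, so break is the only way out
p <- 1
repeat {
  p <- 2 * p
  if (p > 1000) break
}
p          # the first power of 2 above 1000

# while with next: add up the odd numbers from 1 to 9
i <- 0
odd_sum <- 0
while (i < 10) {
  i <- i + 1
  if (i %% 2 == 0) next   # skip the rest of this cycle for even i
  odd_sum <- odd_sum + i
}
odd_sum
```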

    Control statements are most often used in connection with functions which are discussed in Chapter 10 [Writing your own functions], page 42, and where more examples will emerge.


    10 Writing your own functions

    As we have seen informally along the way, the R language allows the user to create objects of mode function. These are true R functions that are stored in a special internal form and may be used in further expressions and so on. In the process, the language gains enormously in power, convenience and elegance, and learning to write useful functions is one of the main ways to make your use of R comfortable and productive.

    It should be emphasized that most of the functions supplied as part of the R system, such as mean(), var(), postscript() and so on, are themselves written in R and thus do not differ materially from user written functions.

    A function is defined by an assignment of the form

    > name <- function(arg_1, arg_2, ...) expression
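Here is a sketch of a function defined this way. It computes a pooled two-sample t statistic, is named twosam here for illustration, and is called on illustrative inputs. The value of the function is the value of the last expression in its body, and it agrees with t.test(var.equal = TRUE).

```r
# Illustrative function: pooled two-sample t statistic
twosam <- function(y1, y2) {
  n1 <- length(y1); n2 <- length(y2)
  yb1 <- mean(y1);  yb2 <- mean(y2)
  s1 <- var(y1);    s2 <- var(y2)
  s <- ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)   # pooled variance
  (yb1 - yb2) / sqrt(s * (1 / n1 + 1 / n2))               # value returned
}

y1 <- c(1, 2, 3, 4)
y2 <- c(2, 4, 6)
twosam(y1, y2)
```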


    The classical R function lsfit() does this job quite well, and more. It in turn uses the functions qr() and qr.coef() in the slightly counterintuitive way above to do this part of the calculation. Hence there is probably some value in having just this part isolated in a simple to use function if it is going to be in frequent use. If so, we may wish to make it a matrix binary operator for even more convenient use.

    10.2 Defining new binary operators

    Had we given the bslash() function a different name, namely one of the form

    %anything%

    it could have been used as a binary operator in expressions rather than in function form. Suppose, for example, we choose ! for the internal character. The function definition would then start as

    > "%!%" <- function(X, y) { ... }
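Here is a sketch of such an operator, assuming it simply returns the least squares coefficients of y on X via qr() and qr.coef(). The data lie exactly on a straight line, so the coefficients are known to be 1 and 2.

```r
# Hypothetical operator: X %!% y gives least squares coefficients of y on X
"%!%" <- function(X, y) {
  X <- qr(X)          # QR decomposition of the model matrix
  qr.coef(X, y)       # coefficients solving the least squares problem
}

X <- cbind(1, 1:5)    # intercept column and one regressor
y <- 1 + 2 * (1:5)    # an exact straight line
X %!% y
```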


    10.4 The ... argument

    Another frequent requirement is to allow one function to pass on argument settings to another. For example many graphics functions use the function par() and functions like plot() allow the user to pass on graphical parameters to par() to control the graphical output. (See Section 12.4.1 [The par() function], page 67, for more details on the par() function.) This can be done by including an extra argument, literally ..., of the function, which may then be passed on. An outline example is given below.

    fun1
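The sketch below shows the ... mechanism with hypothetical function names (describe, n_extra). Arguments that are not matched by name are collected in ... and passed on unchanged. Inside the function, list(...) gives access to them.

```r
# Hypothetical helper: extra arguments in ... are passed straight on to mean()
describe <- function(x, ...) {
  c(n = length(x), mean = mean(x, ...))
}
describe(c(1, 2, NA, 10), na.rm = TRUE)   # na.rm reaches mean()

# Inside a function, list(...) collects whatever extra arguments were supplied
n_extra <- function(...) length(list(...))
n_extra(1, "a", TRUE)
```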


    }

    It is numerically slightly better to work with the singular value decomposition on this occasion rather than the eigenvalue routines.

    The result of the function is a list giving not only the efficiency factors as the first component, but also the block and variety canonical contrasts, since sometimes these give additional useful qualitative information.

    10.6.2 Dropping all names in a printed array

    For printing purposes with large matrices or arrays, it is often useful to print them in close block form without the array names or numbers. Removing the dimnames attribute will not achieve this effect, but rather the array must be given a dimnames attribute consisting of empty strings. For example to print a matrix, X

    > temp <- X

    > dimnames(temp) <- list(rep("", nrow(X)), rep("", ncol(X)))

    > temp; rm(temp)

    This can be much more conveniently done using a function, no.dimnames(), shown below, as a "wrap around" to achieve the same result. It also illustrates how some effective and useful user functions can be quite short.

    no.dimnames
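One possible implementation of a no.dimnames() helper follows. It is offered as a sketch, not as the definition the text refers to. It assigns one vector of empty strings per dimension, so the array prints as a bare block of numbers.

```r
# A sketch of a no.dimnames()-style helper
no.dimnames <- function(a) {
  # one vector of empty strings per dimension, so print() shows no labels
  dimnames(a) <- lapply(dim(a), function(n) rep("", n))
  a
}

X <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b", "c")))
no.dimnames(X)
```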


    Dumped

    S> n <- 3

    S> cube(2)

    [1] 18

    ## then the same function evaluated in R

    R> cube(2)

    [1] 8
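Here is a sketch of the cube() example in R. It assumes a definition in which sq() is created inside cube() and uses the free variable n. Lexical scoping finds n in the environment of cube(), so the global n is ignored.

```r
cube <- function(n) {
  sq <- function() n * n   # n is found in cube()'s environment (lexical scope)
  n * sq()
}
n <- 3                     # a global n, which sq() does not see
cube(2)                    # 8 in R
```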

    Lexical scope can also be used to give functions mutable state. In the following example we show how R can be used to mimic a bank account. A functioning bank account needs to have a balance or total, a function for making withdrawals, a function for making deposits and a function for stating the current balance. We achieve this by creating the three functions within account and then returning a list containing them. When account is invoked it takes a numerical argument total and returns a list containing the three functions. Because these functions are defined in an environment which contains total, they will have access to its value.
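Here is a sketch of such a bank account, with hypothetical names (open.account, ross, robert) and amounts. Each call to open.account() creates a new environment that holds total. The superassignment operator <<- updates that total rather than creating a local copy.

```r
open.account <- function(total) {
  list(
    deposit = function(amount) {
      if (amount <= 0) stop("Deposits must be positive")
      total <<- total + amount   # updates total in the enclosing environment
      invisible(total)
    },
    withdraw = function(amount) {
      if (amount > total) stop("Insufficient funds")
      total <<- total - amount
      invisible(total)
    },
    balance = function() total
  )
}

ross <- open.account(100)
robert <- open.account(200)
ross$withdraw(30)
ross$deposit(50)
ross$balance()     # 100 - 30 + 50
robert$balance()   # unchanged: each account keeps its own total
```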

    The special assignment operator, <<-, is used to change the value associated with total.


    ross$deposit(50)

    ross$balance()

    ross$withdraw(500)

    10.8 Customizing the environment

    Users can customize their environment in several different ways. There is a site initialization file and every directory can have its own special initialization file. Finally, the special functions .First and .Last can be used.

    The location of the site initialization file is taken from the value of the R_PROFILE environment variable. If that variable is unset, the file Rprofile.site in the R home subdirectory etc is used. This file should contain the commands that you want to execute every time R is started under your system. A second, personal, profile file named .Rprofile can be placed in any directory. If R is invoked in that directory then that file will be sourced. This file gives individual users control over their workspace and allows for different startup procedures in different working directories. If no .Rprofile file is found in the startup directory, then R looks for a .Rprofile file in the user's home directory and uses that (if it exists). If the environment variable R_PROFILE_USER is set, the file it points to is used instead of the .Rprofile files.
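Here is a sketch of what a personal .Rprofile might contain; every option shown is illustrative. A function named .First() is run at the start of a session and .Last() at the end.

```r
# Possible contents of a personal .Rprofile (all settings are illustrative)
.First <- function() {
  options(prompt = "$ ", continue = "+\t")  # a "$ " prompt and a tab after "+"
  options(digits = 5)                       # print fewer significant digits
}
.Last <- function() {
  graphics.off()                            # close any open graphics devices
  cat(paste(date(), "\nAdios\n"))           # a farewell message on exit
}
```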

    Any function named .First() in either of the two profile files or in the .RData image has a special status. It is automatically performed at the beginning of an R session and may be used to initialize the environment, for example to alter the prompt and set up other options that can then be taken for granted in the rest of the session.


    for displaying objects graphically, summary() for summarizing analyses of various types, and anova() for comparing statistical models.

    The number of generic functions that can treat a class in a specific way can be quite large. For example, the functions that can accommodate in some fashion objects of class "data.frame" include

    [ [[

    > methods(plot)

    For many generic functions the function body is quite short, for example

    > coef

    function (object, ...)

    UseMethod("coef")
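Here is a sketch of a user-defined generic with a default method and a method for one class; the names label_of and temp are hypothetical. UseMethod() dispatches on the class of the first argument.

```r
# Hypothetical generic with a default method and a method for class "temp"
label_of <- function(x, ...) UseMethod("label_of")
label_of.default <- function(x, ...) "some object"
label_of.temp <- function(x, ...) paste(unclass(x), "degrees")

t1 <- structure(21, class = "temp")
label_of(t1)     # dispatches to label_of.temp
label_of(1:3)    # no integer method, so label_of.default is used
methods(coef)    # methods currently available for the coef() generic
```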

    The presence of UseMethod indicates this is a generic function. To see what methods are available we can use methods().


    11 Statistical models in R

    This section presumes the reader has some familiarity with statistical methodology, in particular with regression analysis and the analysis of variance. Later we make some rather more ambitious presumptions, namely that something is known about generalized linear models and nonlinear regression.

    The requirements for fitting statistical models are sufficiently well defined to make it possible to construct general tools that apply in a broad spectrum of problems.

    R provides an interlocking suite of facilities that make fitting statistical models very simple. As we mention in the introduction, the basic output is minimal, and one needs to ask for the details by calling extractor functions.

    11.1 Defining statistical models; formulae

    The template for a statistical model is a linear regression model with independent, homoscedastic errors

    $$ y_i = \sum_{j=0}^{p} \beta_j x_{ij} + e_i, \qquad e_i \sim \mathrm{NID}(0, \sigma^2), \qquad i = 1, \ldots, n $$
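Here is a small sketch of this model in R on simulated data; the coefficients and seed are illustrative. model.matrix() builds the matrix whose first column of ones corresponds to x_0. Solving the normal equations reproduces the coefficients from lm().

```r
set.seed(2)
d <- data.frame(x1 = runif(30), x2 = runif(30))            # illustrative regressors
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(30, sd = 0.05)

X <- model.matrix(y ~ x1 + x2, data = d)                   # columns (Intercept), x1, x2
beta_hat <- solve(crossprod(X), crossprod(X, d$y))         # solves X'X b = X'y
fm <- lm(y ~ x1 + x2, data = d)
```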

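    In matrix terms this would be written

    $$ y = X\beta + e $$

    where y is the response vector and X is the model matrix, whose columns x_0, x_1, ..., x_p are the determining variables; very often x_0 is a column of ones defining an intercept term.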


    y ~ A   Single classification analysis of variance model of y, with classes determined by A.

    y ~ A + x   Single classification analysis of covariance model of y, with classes determined by A, and with covariate x.

    y ~ A*B

    y ~ A + B + A:B

    y ~ B %in% A

    y ~ A/B   Two factor non-additive model of y on A and B. The first two specify the same crossed classification and the second two specify the same nested classification. In abstract terms all four specify the same model subspace.

    y ~ (A + B + C)^2

    y ~ A*B*C - A:B:C

    Three factor experiment but with a model containing main effects and two factor interactions only. Both formulae specify the same model.

    y ~ A*x

    y ~ A/x

    y ~ A/(1 + x) - 1

    Separate simple linear regression models of y on x within the levels of A, with different codings. The last form produces explicit estimates of as many different intercepts and slopes as there are levels in A.
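The sketch below checks some of these equivalences by comparing the term labels produced by terms(). The small data frame used for the last formula is illustrative.

```r
labels_of <- function(f) attr(terms(f), "term.labels")   # helper for this sketch

labels_of(y ~ A*B)                       # "A" "B" "A:B"
labels_of(y ~ (A + B + C)^2)             # main effects and two-factor interactions
setequal(labels_of(y ~ (A + B + C)^2),
         labels_of(y ~ A*B*C - A:B:C))   # the two formulae give the same terms

# Separate intercepts and slopes within the levels of a three-level factor
d <- data.frame(A = factor(rep(c("a", "b", "c"), each = 4)), x = rep(1:4, 3))
colnames(model.matrix(~ A/(1 + x) - 1, data = d))
```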


    M_1 - M_2   Include M_1 leaving out terms of M_2.

    M_1 : M_2   The tensor product of M_1 and M_2. If both terms are factors, then the "subclasses" factor.

    M_1 %in% M_2

    Similar to M_1:M_2, but with a different coding.

    M_1 * M_2   M_1 + M_2 + M_1:M_2.

    M_1 / M_2   M_1 + M_2 %in% M_1.

    M^n   All terms in M together with "interactions" up to order n.

    I(M)   Insulate M. Inside M all operators have their normal arithmetic meaning, and that term appears in the model matrix.

    Note that inside the parentheses that usually enclose function arguments all operators have their normal arithmetic meaning. The function I() is an identity function used to allow terms in model formulae to be defined using arithmetic operators.

    Note particularly that the model formulae specify the columns of the model matrix, the specification of the parameters being implicit. This is not the case in other contexts, for example in specifying nonlinear models.
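Here is a sketch of the difference I() makes, on an illustrative data set constructed so that y = 1 + x^2 exactly. Inside a formula, x^2 means crossing x with itself and reduces to x. Wrapped in I(), it is the arithmetic square.

```r
d <- data.frame(x = 1:5, y = c(2, 5, 10, 17, 26))    # y = 1 + x^2 exactly

colnames(model.matrix(y ~ x^2, data = d))            # "^" as a formula operator: just x
colnames(model.matrix(y ~ x + I(x^2), data = d))     # I() makes x^2 arithmetic
fm <- lm(y ~ x + I(x^2), data = d)
coef(fm)                                             # close to 1, 0, 1
```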


    11.2 Linear models

    The basic function for fitting ordinary multiple models is lm(), and a streamlined version of the call is as follows:

    > fitted.model <- lm(formula, data = data.frame)

    For example

    > fm2 <- lm(y ~ x1 + x2, data = production)


    summary(object )

    Print a comprehensive summary of the results of the regression analysis.

    vcov(object )

    Returns the variance-covariance matrix of the main parameters of a fitted model object.
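Here is a sketch of these and other extractor functions on a simple fitted model; the data are illustrative.

```r
d <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                y = c(1.1, 1.9, 3.2, 3.9, 5.1, 5.8))   # illustrative data
fm <- lm(y ~ x, data = d)

coef(fm)                                 # intercept and slope
residuals(fm)                            # y minus the fitted values
fitted(fm)
summary(fm)                              # comprehensive summary
vcov(fm)                                 # variance-covariance matrix of coef(fm)
predict(fm, newdata = data.frame(x = 7)) # prediction at a new x
```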

    11.4 Analysis of variance and model comparison

    The model fitting function aov(formula, data=data.frame) operates at the simplest level in a very similar way to the function lm(), and most of the generic functions listed in the table in Section 11.3 [Generic functions for extracting model information], page 53 apply.

    It should be noted that in addition aov() allows an analysis of models with multiple error strata such as split plot experiments, or balanced incomplete block designs with recovery of inter-block information. The model formula

    response ~ mean.formula + Error(strata.formula)

    specifies a multi-stratum experiment with error strata defined by the strata.formula. In the simplest case, strata.formula is simply a factor, when it defines a two strata experiment, namely between and within the levels of the factor.

    For example, with all determining variables factors, a model formula such as that in:

    > fm
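Here is a sketch of a two-stratum analysis using the npk data set from the datasets package, with block as the error stratum. npk records a 2^3 factorial experiment on peas laid out in 6 blocks. This is a stand-in chosen for illustration, not the example referred to in the text.

```r
# npk: N, P, K factorial experiment on peas in 6 blocks (datasets package)
fm_strata <- aov(yield ~ N * P * K + Error(block), data = npk)
summary(fm_strata)            # one ANOVA table per error stratum

fm_single <- aov(yield ~ block + N * P * K, data = npk)
summary(fm_single)            # a single stratum, for comparison
```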


    would fit a five variate multiple regression with variables (presumably) from the data frame production, fit an additional model including a sixth regressor variable, and fit a variant on the model where the response had a square root transform applied.

    Note especially that if the data= argument is specified on the original call to the model fitting function, this information is passed on through the fitted model object to update() and its allies.

    The name . can also be used in other contexts, but with slightly different meaning. For example

    > fmfull
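Here is a sketch of update() on a simulated data frame named production; the data and the regressors x1, x2 and x3 are illustrative assumptions. Because data= was given in the original call, the updated fits use the same data frame.

```r
set.seed(3)
production <- data.frame(x1 = runif(20), x2 = runif(20), x3 = runif(20))
production$y <- 5 + production$x1 + 2 * production$x2 + runif(20)  # positive response

fm05   <- lm(y ~ x1 + x2, data = production)
fm6    <- update(fm05, . ~ . + x3)      # add a regressor; data= is carried along
smf6   <- update(fm6, sqrt(.) ~ .)      # same right-hand side, square root response
fmfull <- lm(y ~ ., data = production)  # "." here means all other columns
```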


    11.6.1 Families

    The class of generalized linear models handled by facilities supplied in R includes gaussian, binomial, poisson, inverse gaussian and gamma response distributions and also quasi-likelihood models where the response distribution is not explicitly specified. In the latter case the variance function must be specified as a function of the mean, but in other cases this function is implied by the response distribution.

    Each response distribution admits a variety of link functions to connect the mean with the linear predictor. Those automatically available are shown in the following table:

    Family name         Link functions
    binomial            logit, probit, log, cloglog
    gaussian            identity, log, inverse
    Gamma               identity, inverse, log
    inverse.gaussian    1/mu^2, identity, inverse, log
    poisson             identity, log, sqrt
    quasi               logit, probit, cloglog, identity, inverse, log, 1/mu^2, sqrt

    The combination of a response distribution, a link function and various other pieces of information that are needed to carry out the modeling exercise is called the family of the generalized linear model.
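Here is a sketch of family objects. Each family function returns a list that contains, among other things, the family name, the link name, and the link and inverse link functions.

```r
fam <- binomial(link = "probit")
fam$family          # "binomial"
fam$link            # "probit"
fam$linkinv(0)      # inverse link at 0: pnorm(0) = 0.5
binomial()$link     # default binomial link: "logit"
poisson()$link      # default poisson link: "log"
Gamma()$link        # default Gamma link: "inverse"
```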


    On the Aegean island of Kalythos the male inhabitants suffer from a congenital eye disease, the effects of which become more marked with increasing age. Samples of islander males of various ages were tested for blindness and the results recorded. The data is shown below:

    Age:         20  35  45  55  70
    No. tested:  50  50  50  50  50
    No. blind:    6  17  26  37  44

    The problem we consider is to fit both logistic and probit models to this data, and to estimate for each model the LD50, that is the age at which the chance of blindness for a male inhabitant is 50%.

    If y is the number of blind at age x and n the number tested, both models have the form

    $$ y \sim B(n, F(\beta_0 + \beta_1 x)) $$

    where for the probit case, $F(z) = \Phi(z)$ is the standard normal distribution function, and in the logit case (the default), $F(z) = e^z/(1 + e^z)$. In both cases the LD50 is

    $$ \mathrm{LD50} = -\beta_0/\beta_1 $$

    that is, the point at which the argument of the distribution function is zero.

    The first step is to set the data up as a data frame

    > kalythos <- data.frame(x = c(20,35,45,55,70), n = rep(50,5), y = c(6,17,26,37,44))
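Here is a sketch of the two fits and their LD50 estimates, following the formula above. The binomial response is given as a two-column matrix of blind and not-blind counts. The test checks only that both estimates fall between the ages where the observed proportion crosses one half. It also checks that the fitted probability at the logit LD50 is 0.5.

```r
kalythos <- data.frame(x = c(20, 35, 45, 55, 70), n = rep(50, 5),
                       y = c(6, 17, 26, 37, 44))

# Response: successes (blind) and failures (not blind) for each age group
fmp <- glm(cbind(y, n - y) ~ x, family = binomial(link = "probit"), data = kalythos)
fml <- glm(cbind(y, n - y) ~ x, family = binomial, data = kalythos)  # logit default

ld50 <- function(b) -b[1] / b[2]        # LD50 = -beta_0 / beta_1
ldp <- unname(ld50(coef(fmp)))
ldl <- unname(ld50(coef(fml)))
c(probit = ldp, logit = ldl)
```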


    > fmod


    > fn plot(x, y)

    > xfit yfit lines(spline(xfit, yfit))

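    We could do better, but these starting values of 200 and 0.1 seem adequate. Now do the fit:

    > out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)

    After the fit, out$minimum is the residual sum of squares and out$estimate holds the least squares estimates of the parameters. Approximate standard errors (SE) of the estimates are given by

    > sqrt(diag(2*out$minimum/(length(y) - 2) * solve(out$hessian)))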

    The 2 in the line above represents the number of parameters. A 95% confidence interval would be the parameter estimate ± 1.96 SE. We can superimpose the least squares fit on a new plot:

    > plot(x, y)

    > xfit yfit
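Here is a hedged sketch of this kind of fit with nlm(). The data (the treated half of the built-in Puromycin data set) and the Michaelis-Menten curve y = p1 x/(p2 + x) are assumptions. They were chosen because the starting values 200 and 0.1 suit them. The test cross-checks the estimates against nls().

```r
treated <- subset(Puromycin, state == "treated")   # built-in enzyme kinetics data
x <- treated$conc
y <- treated$rate

# Residual sum of squares for the curve y = p1 * x / (p2 + x)
fn <- function(p) sum((y - (p[1] * x) / (p[2] + x))^2)

out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)
out$estimate                                          # least squares estimates
se <- sqrt(diag(2 * out$minimum / (length(y) - 2) * solve(out$hessian)))
cbind(estimate = out$estimate, se = se)               # approximate standard errors
```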


    equivalently which minimize the negative log-likelihood. Here is an example from Dobson (1990), pp. 108-111. This example fits a logistic model to dose-response data, which clearly could also be fit by glm(). The data are:

    > x y n fn out

    > sqrt(diag(solve(out$hessian)))

    A 95% confidence interval would be the parameter estimate ± 1.96 SE.
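Here is a sketch of maximum likelihood with nlm(). It reuses the Kalythos blindness data given earlier rather than the Dobson data. The function minimized is the negative log-likelihood of the logistic model, with the constant choose(n, y) term dropped. The starting values are illustrative, and the estimates and standard errors are cross-checked against glm().

```r
x <- c(20, 35, 45, 55, 70)
n <- rep(50, 5)
y <- c(6, 17, 26, 37, 44)

# Negative log-likelihood of the logistic model (constant term omitted)
fn <- function(p) {
  eta <- p[1] + p[2] * x
  -sum(y * eta - n * log(1 + exp(eta)))
}
out <- nlm(fn, p = c(-3, 0.05), hessian = TRUE)
out$estimate                                   # maximum likelihood estimates
se <- sqrt(diag(solve(out$hessian)))           # approximate standard errors
cbind(lower = out$estimate - 1.96 * se, upper = out$estimate + 1.96 * se)

ref <- glm(cbind(y, n - y) ~ x, family = binomial)   # the same model via glm()
```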

    11.8 Some non-standard models


    Tree models are available in R via the user-contributed packages rpart and tree.


    12 Graphical procedures

    Graphical facilities are an important and extremely versatile component of the R environment. It is possible to use the facilities to display a wide variety of statistical graphs and also to build entirely new types of graph.

    The graphics facilities can be used in both interactive and batch modes, but in most cases, interactive use is more productive. Interactive use is also easy because at startup time R initiates a graphics device driver which opens a special graphics window for the display of interactive graphics. Although this is done automatically, it is useful to know that the command used is X11() under UNIX, windows() under Windows and quartz() under Mac OS X.

    Once the device driver is running, R plotting commands can be used to produce a variety of graphical displays and to create entirely new kinds of display.

    Plotting commands are divided into three basic groups:

    High-level plotting functions create a new plot on the graphics device, possibly with axes, labels, titles and so on.

    Low-level plotting functions add more information to an existing plot, such as extra points, lines and labels.

    Interactive graphics functions allow you to interactively add information to, or extract information from, an existing plot, using a pointing device such as a mouse.
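Here is a sketch of one high-level call followed by low-level additions. It draws on a pdf() device, one of the non-interactive devices useful in batch mode. The temporary file name is illustrative.

```r
f <- tempfile(fileext = ".pdf")
pdf(f)                       # a non-interactive (batch) device writing to a file
x <- 1:10
y <- x^2
plot(x, y, type = "n", main = "Squares")   # high-level: new plot, axes and title, no points
points(x, y)                 # low-level: add points ...
lines(x, y, lty = 2)         # ... and a dashed line to the existing plot
dev.off()                    # close the device, completing the file
```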


    plot(df )

    plot(~ expr)

    plot(y ~ expr)

    df is a data frame, y is any object, expr is a list of object names separated by '+' (e.g., a + b + c). The first two forms produce distributional plots of the variables in a data frame (first form) or of a number of named objects (second form). The third form plots y against every object named in expr.

    12.1.2 Displaying multivariate data

    R provides two very useful functions for representing multivariate data. If X is a numeric matrix or data frame, the command

    > pairs(X)

    produces a pairwise scatterplot matrix of the variables defined by the columns of X, that is, every column of X is plotted against every other column of X and the resulting n(n-1) plots are arranged in a matrix with plot scales constant over the rows and columns of the matrix.

    When three or four variables are involved a coplot may be more enlightening. If a and b are numeric vectors and c is a numeric vector or factor object (all of the same length), then the command

    > coplot(a ~ b | c)

    produces a number of scatterplots of a against b for given values of c. If c is a factor, this


    If the probability=TRUE argument is given, the bars represent relative frequencies divided by bin width instead of counts.

    dotchart(x, ...)

    Constructs a dotchart of the data in x. In a dotchart the y-axis gives a labelling of the data in x and the x-axis gives its value. For example it allows easy visual selection of all data entries with values lying in specified ranges.

    image(x, y, z, ...)

    contour(x, y, z, ...)

    persp(x, y, z, ...)

    Plots of three variables. The image plot draws a grid of rectangles using different colours to represent the value of z, the contour plot draws contour lines to represent the value of z, and the persp plot draws a 3D surface.
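Here is a sketch of the three functions on a grid of values. It uses cos(y)/(1 + x^2) as an illustrative surface and sends output to a temporary PDF file.

```r
x <- seq(-pi, pi, length.out = 50)
y <- x
z <- outer(x, y, function(x, y) cos(y) / (1 + x^2))   # z[i, j] at (x[i], y[j])

f <- tempfile(fileext = ".pdf")
pdf(f)
image(x, y, z)                        # coloured grid of rectangles
contour(x, y, z, add = TRUE)          # contour lines drawn on top of the image
persp(x, y, z, theta = 30, phi = 30)  # 3D surface viewed from an angle
dev.off()
```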

    12.1.4 Arguments to high-level plotting functions

    There are a number of arguments which may be passed to high-level graphics functions, as follows:

    add=TRUE   Forces the function to act as a low-level graphics function, superimposing the plot on the current plot (some functions only).

    axes=FALSE


    12.2 Low-level plotting commands

    Sometimes the high-level plotting functions don't produce exactly the kind of plot you desire. In this case, low-level plotting commands can be used to add extra information (such as points, lines or text) to the current plot.

    Some of the more useful low-level plotting functions are:

    points(x, y)

    lines(x, y)

    Adds points or connected lines to the current plot. plot()'s type= argument can also be passed to these functions (and defaults to "p" for points() and "l" for lines().)

    text(x, y, labels, ...)

    Add text to a plot at points given by x, y. Normally labels is an integer or character vector in which case labels[i] is plotted at point (x[i], y[i]). The default is 1:length(x).

    Note: This function is often used in the sequence

    > plot(x, y, type="n"); text(x, y, names)

    The graphics parameter type="n" suppresses the points but sets up the axes, and the text() function supplies special characters, as specified by the character vector names for the points.
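Here is a sketch of the plot(..., type = "n") plus text() idiom, with illustrative points and labels.

```r
x <- c(1, 2, 3, 4)
y <- c(2, 3, 1, 4)
labs <- c("A", "B", "C", "D")   # illustrative point labels

f <- tempfile(fileext = ".pdf")
pdf(f)
plot(x, y, type = "n")          # set up the axes only, no points drawn
text(x, y, labs)                # labs[i] is drawn at (x[i], y[i])
dev.off()
```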


    title(main, sub)

    Adds a title main to the top of the current plot in a large font and (optionally) a sub-title sub at the bottom in a smaller font.

    axis(side, ...)

    Adds an axis to the current plot on the side given by the first argument (1 to 4, counting clockwise from the bottom.) Other arguments control the positioning of the axis within or beside the plot, and tick positions and labels. Useful for adding custom axes after calling plot() with the axes=FALSE argument.

    Low-level plotting functions usually require some positioning information (e.g., x and y coordinates) to determine where to place the new plot elements. Coordinates are given in terms of user coordinates which are defined by the previous high-level graphics command and are chosen based on the supplied data.

    Where x and y arguments are required, it is also sufficient to supply a single argument being a list with elements named x and y. Similarly a matrix with two columns is also valid input. In this way functions such as locator() (see below) may be used to specify positions on a plot interactively.
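Here is a sketch that combines axes = FALSE, axis(), title() and a list with x and y components used as coordinates; the monthly figures are hypothetical.

```r
sales <- c(3, 5, 4, 6, 8, 7, 9, 11, 10, 12, 13, 12)    # hypothetical monthly data

f <- tempfile(fileext = ".pdf")
pdf(f)
plot(1:12, sales, axes = FALSE, xlab = "Month", ylab = "Sales")
axis(1, at = 1:12, labels = month.abb)   # side 1 = bottom, with custom labels
axis(2)                                  # side 2 = left, default ticks
box()
title(main = "Monthly sales", sub = "illustrative data")
pts <- list(x = c(2, 10), y = c(12, 4))  # a list with x and y serves as coordinates
text(pts, labels = c("early", "late"))
dev.off()
```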

    12.2.1 Mathematical annotation

    In some cases, it is useful to add mathematical symbols and formulae to a plot. This can be


    locator(n, type)

    Waits or the user to select locations on the current plot using the let mouse button.

    This continues until n (deault 512) points have been selected, or another mousebutton is pressed. The type argument allows or plotting at the selected points andhas the same efect as or high-level graphics commands; the deault is no plotting.locator() returns the locations o the points selected as a list with two componentsx and y.

    locator() is usually called with no arguments. It is particularly useul or interactively

    selecting positions or graphic elements such as legends or labels when it is dicult to calculatein advance where the graphic should be placed. For example, to place some inormative textnear an outlying point, the command

    > text(locator(1), "Outlier", adj=0)

    may be useful. (locator() will be ignored if the current device, such as postscript, does not support interactive pointing.)

    identify(x, y, labels)

    Allow the user to highlight any of the points defined by x and y (using the left mouse button) by plotting the corresponding component of labels nearby (or the index number of the point if labels is absent). Returns the indices of the selected points when another button is pressed.


    par()     Without arguments, returns a list of all graphics parameters and their values for the current device.

    par(c("col", "lty"))

    With a character vector argument, returns only the named graphics parameters (again, as a list).

    par(col=4, lty=2)

    With named arguments (or a single list argument), sets the values of the named graphics parameters, and returns the original values of the parameters as a list.

    Setting graphics parameters with the par() function changes the value of the parameters permanently, in the sense that all future calls to graphics functions (on the current device) will be affected by the new value. You can think of setting graphics parameters in this way as setting default values for the parameters, which will be used by all graphics functions unless an alternative value is given.

    Note that calls to par() always affect the global values of graphics parameters, even when par() is called from within a function. This is often undesirable behavior: usually we want to set some graphics parameters, do some plotting, and then restore the original values so as not to affect the user's R session. You can restore the initial values by saving the result of par() when making changes, and restoring the initial values when plotting is complete.

    > oldpar <- par(col=4, lty=2)
      . . . plotting commands . . .
    > par(oldpar)
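    Inside a function, a common way to guarantee the restore step is on.exit(), which runs even if plotting fails part-way. A minimal sketch, assuming an off-screen pdf(NULL) device (any device would do); plot_dashed() is a hypothetical name.

```r
pdf(NULL)   # off-screen device for the sketch

plot_dashed <- function(x) {
  oldpar <- par(col = 4, lty = 2)   # set new values, keep the old ones
  on.exit(par(oldpar))              # restore them on exit, even after an error
  plot(x, type = "l")
  par(c("col", "lty"))              # the values in force while plotting
}

before <- par(c("col", "lty"))
during <- plot_dashed(1:10)
after  <- par(c("col", "lty"))      # same as before: the change did not leak out
dev.off()
```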


    12.5.1 Graphical elements

    R plots are made up of points, lines, text and polygons (filled regions). Graphical parameters exist which control how these graphical elements are drawn, as follows:

    pch="+"   Character to be used for plotting points. The default varies with graphics drivers, but it is usually a small open circle. Plotted points tend to appear slightly above or below the appropriate position unless you use "." as the plotting character, which produces centered points.

    pch=4     When pch is given as an integer between 0 and 25 inclusive, a specialized plotting symbol is produced. To see what the symbols are, use the command

    > legend(locator(1), as.character(0:25), pch = 0:25)

    Those from 21 to 25 may appear to duplicate earlier symbols, but can be coloured in different ways: see the help on points and its examples.

    In addition, pch can be a character or a number in the range 32:255 representing a character in the current font.
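    Because legend(locator(1), ...) needs an interactive device, here is a non-interactive sketch that draws symbols 0 to 25 into a temporary PDF file instead. The colours and page size are arbitrary; the bg colour only fills symbols 21 to 25, which is what distinguishes them from the earlier look-alikes.

```r
chart <- tempfile(fileext = ".pdf")
pdf(chart, width = 7, height = 2)
par(mar = c(4, 1, 1, 1))                 # room for the axis, little elsewhere

plot(0:25, rep(1, 26), pch = 0:25, cex = 1.5,
     col = "blue", bg = "yellow",        # bg is used only by symbols 21 to 25
     axes = FALSE, xlab = "pch value", ylab = "")
axis(1, at = 0:25, cex.axis = 0.6)       # label each symbol with its number

dev.off()                                # closing the device completes the file
```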

    lty=2     Line types. Alternative line styles are not supported on all graphics devices (and vary on those that do) but line type 1 is always a solid line, line type 0 is always invisible, and line types 2 and onwards are dotted or dashed lines, or some combination of both.


    cex.axis

    cex.lab

    cex.main

    cex.sub   The character expansion to be used for axis annotation, x and y labels, main and sub-titles, respectively.

    12.5.2 Axes and tick marks

    Many of R's high-level plots have axes, and you can construct axes yourself with the low-level axis() graphics function. Axes have three main components: the axis line (line style controlled by the lty graphics parameter), the tick marks (which mark off unit divisions along the axis line) and the tick labels (which mark the units). These components can be customized with the following graphics parameters.

    lab=c(5, 7, 12)

    The first two numbers are the desired number of tick intervals on the x and y axes respectively. The third number is the desired length of axis labels, in characters (including the decimal point). Choosing a too-small value for this parameter may result in all tick labels being rounded to the same number!
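    A short sketch of setting lab temporarily, using the values from the entry above. They are requests rather than guarantees, so the tick positions actually chosen may differ slightly.

```r
pdf(NULL)   # off-screen device for the sketch

oldpar <- par(lab = c(5, 7, 12))   # ~5 x intervals, ~7 y intervals, labels up to 12 chars
plot(1:50, sin(1:50))
lab_in_force <- par("lab")         # read the setting back while it is active
par(oldpar)                        # restore the previous value

dev.off()
```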


    A typical figure is

    [Figure: the plot region surrounded by the figure margins; two of the margin sizes are labelled mai[2] and mar[3].]


    12.5.4 Multiple figure environment

    R allows you to create an n by m array of figures on a single page. Each figure has its own margins, and the array of figures is optionally surrounded by an outer margin, as shown in the following figure.

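    [Figure: a 3 by 2 array of figures, with the current figure selected by mfg=c(3,2,3,2), surrounded by an outer margin; margin sizes are labelled omi[4] and oma[3].]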


    oma=c(2, 0, 3, 0)

    omi=c(0, 0, 0.8, 0)

    Size of outer margins. Like mar and mai, the first measures in text lines and the second in inches, starting with the bottom margin and working clockwise.

    Outer margins are particularly useful for page-wise titles, etc. Text can be added to the outer margins with the mtext() function with argument outer=TRUE. There are no outer margins by default, however, so you must create them explicitly using oma or omi.
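    A sketch combining a multiple figure array with an outer-margin title. The 3 by 2 array and oma=c(2, 0, 3, 0) echo the values used in this section; the panel contents, file and page size are arbitrary. mfrow is the standard graphics parameter that fills the array row by row.

```r
out <- tempfile(fileext = ".pdf")
pdf(out, width = 7, height = 9)

oldpar <- par(mfrow = c(3, 2), oma = c(2, 0, 3, 0))  # 3x2 array, outer margins
for (i in 1:6) {
  plot(1:10, (1:10)^(i / 3), main = paste("Panel", i))
}
# outer = TRUE writes in the outer margin, so the title spans the whole page.
mtext("Six panels on one page", side = 3, outer = TRUE, line = 1)

layout_used <- par("mfrow")
oma_used    <- par("oma")
par(oldpar)
dev.off()
```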

    More complicated arrangements of multiple figures can be produced by the split.screen() and layout() functions, as well as by the grid and lattice packages.
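    A minimal layout() sketch with one wide panel over two narrow ones; the matrix and row heights are illustrative choices. Each cell of the matrix holds the number of the figure that occupies it, and layout() returns the number of figures.

```r
pdf(NULL)   # off-screen device for the sketch

# Figure 1 spans the top row; figures 2 and 3 share the bottom row.
n_figs <- layout(matrix(c(1, 1,
                          2, 3), nrow = 2, byrow = TRUE),
                 heights = c(2, 1))   # top row twice as tall as the bottom

plot(1:10, main = "Wide top panel")
hist(c(2, 3, 3, 4, 4, 4, 5), main = "Bottom left")
boxplot(c(2, 3, 3, 4, 4, 4, 5), main = "Bottom right")

dev.off()
```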

    12.6 Device drivers

    R can generate graphics (of varying levels of quality) on almost any type of display or printing device. Before this can begin, however, R needs to be informed what type of device it is dealing with. This is done by starting a device driver. The purpose of a device driver is to convert graphical instructions from R ("draw a line," for example) into a form that the particular device can understand.

    Device drivers are started by calling a device driver function. There is one such function for every device driver: type help(Devices) for a list of them all. For example, issuing the command


    > postscript("file.ps", horizontal=FALSE, height=5, pointsize=10)

    will produce a file containing PostScript code for a figure five inches high, perhaps for inclusion in a document. It is important to note that if the file named in the command already exists, it will be overwritten. This is the case even if the file was only created earlier in the same R session.

    Many usages of PostScript output will be to incorporate the figure in another document. This works best when encapsulated PostScript is produced: R always produces conformant output, but only marks the output as such when the onefile=FALSE argument is supplied. This unusual notation stems from S-compatibility: it really means that the output will be a single page (which is part of the EPSF specification). Thus to produce a plot for inclusion use something like

    > postscript("plot1.eps", horizontal=FALSE, onefile=FALSE,
                 height=8, width=6, pointsize=10)
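    A sketch of producing an EPS file this way, written to a temporary file rather than plot1.eps. Reading back the first line shows the EPSF marker that onefile=FALSE asks R to write.

```r
eps_file <- tempfile(fileext = ".eps")

postscript(eps_file, horizontal = FALSE, onefile = FALSE,
           height = 8, width = 6, pointsize = 10)
plot(1:10, main = "For inclusion in a document")
dev.off()                            # the file is complete only after closing

header <- readLines(eps_file, n = 1) # first line of the PostScript file
```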

    12.6.2 Multiple graphics devices

    In advanced use of R it is often useful to have several graphics devices in use at the same time. Of course only one graphics device can accept graphics commands at any one time, and this is known as the current device. When multiple devices are open, they form a numbered sequence with names giving the kind of device at any position.

    The main commands used for operating with multiple devices, and their meanings, are as follows:


    dev.off(k)

    Terminate the graphics device at point k of the device list. For some devices, such as postscript devices, this will either print the file immediately or correctly complete the file for later printing, depending on how the device was initiated.

    dev.copy(device, ..., which=k)

    dev.print(device, ..., which=k)

    Make a copy of the device k. Here device is a device function, such as postscript, with extra arguments, if needed, specified by '...'. dev.print is similar, but the copied device is immediately closed, so that end actions, such as printing hardcopies, are immediately performed.

    graphics.off()

    Terminate all graphics devices on the list, except the null device.
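    A sketch of juggling two devices. dev.cur(), dev.set() and dev.list() are the standard grDevices functions for querying and switching the current device; pdf(NULL) is used only so that no windows or files are created.

```r
graphics.off()                  # start clean: only the null device remains

pdf(NULL); first  <- dev.cur()  # opening a device makes it the current one
pdf(NULL); second <- dev.cur()  # devices are numbered in sequence

dev.set(first)                  # make the first device current again
current <- dev.cur()

dev.off(second)                 # close a specific device by its number
remaining <- dev.list()         # open devices, excluding the null device

graphics.off()                  # close everything that is left
```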

    12.7 Dynamic graphics

    R does not have builtin capabilities for dynamic or interactive graphics, e.g. rotating point clouds or brushing (interactively highlighting) points. However, extensive dynamic graphics facilities are available in the system GGobi by Swayne, Cook and Buja available from

    http://www.ggobi.org/

    and these can be accessed from R via the package rggobi, described at


    13 Packages

    All R functions and datasets are stored in packages. Only when a package is loaded are its contents available. This is done both for efficiency (the full list would take more memory and would take longer to search than a subset), and to aid package developers, who are protected from name clashes with other code. The process of developing packages is described in Section "Creating R packages" in Writing R Extensions. Here, we will describe them from a user's point of view.

    To see which packages are installed at your site, issue the command

    > library()

    with no arguments. To load a particular package (e.g., the boot package containing functions from Davison & Hinkley (1997)), use a command like

    > library(boot)

    Users connected to the Internet can use the install.packages() and update.packages() functions (available through the Packages menu in the Windows and RAqua GUIs, see Section "Installing packages" in R Installation and Administration) to install and update packages.

    To see which packages are currently loaded, use

    > search()

    to display the search list. Some packages may be loaded but not available on the search list (see Section 13.3 [Namespaces]): these will be included in the list given by

    > loadedNamespaces()


    For example, t() is the transpose function in R, but users might define their own function named t. Namespaces prevent the user's definition from taking precedence, and breaking every function that tries to transpose a matrix.

    There are two operators that work with namespaces. The double-colon operator :: selects definitions from a particular namespace. In the example above, the transpose function will always be available as base::t, because it is defined in the base package. Only functions that are exported from the package can be retrieved in this way.
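    A small sketch of the masking problem and the :: escape hatch; the replacement t() is a deliberately silly hypothetical definition.

```r
t <- function(x) "my own t"   # a user definition that masks base::t
m <- matrix(1:6, nrow = 2)    # a 2 x 3 matrix

masked   <- t(m)              # finds the global t first
original <- base::t(m)        # :: goes straight to the base namespace

rm(t)                         # remove the user definition again
```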

    The triple-colon operator ::: may be seen in a few places in R code: it acts like the double-colon operator but also allows access to hidden objects. Users are more likely to use the getAnywhere() function, which searches multiple packages.

    Packages are often inter-dependent, and loading one may cause others to be automatically loaded. The colon operators described above will also cause automatic loading of the associated package. When packages with namespaces are loaded automatically they are not added to the search list.
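    A sketch of automatic loading via ::, assuming the tools package (shipped with R) has not been attached in the current session, as is the case in a default session. file_ext() is an exported tools function that returns a file name's extension.

```r
ext <- tools::file_ext("plot1.eps")          # loads the tools namespace if needed

loaded   <- "tools" %in% loadedNamespaces()  # the namespace is now loaded ...
attached <- "package:tools" %in% search()    # ... but not added to the search list
```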


    Appendix A A sample session

    The following session is intended to introduce to you some features of the R environment by using them. Many features of the system will be unfamiliar and puzzling at first, but this puzzlement will soon disappear.

    Login, start your windowing system.

    $ R       Start R as appropriate for your platform.

    The R program begins, with a banner.

    (Within R, the prompt on the left hand side will not be shown to avoid confusion.)

    help.start()

    Start the HTML interface to on-line help (using a web browser available at your machine). You should briefly explore the features of this facility with the mouse.

    Iconify the help window and move on to the next part.

    x


    abline(coef(fm1), col = "red")

    Weighted regression line.

    detach()  Remove data frame from the search path.

    plot(fitted(fm), resid(fm),

    xlab="Fitted values",

    ylab="Residuals",

    main="Residuals vs Fitted")

    A standard regression diagnostic plot to check for heteroscedasticity. Can you see it?

    qqnorm(resid(fm), main="Residuals Rankit Plot")

    A normal scores plot to check for skewness, kurtosis and outliers. (Not very useful here.)

    rm(fm, fm1, lrf, x, dummy)

    Clean up again.

    The next section will look at data from the classical experiment of Michelson and Morley to measure the speed of light. This dataset is available in the morley object, but we will read it to illustrate the read.table function.

    filepath


    f <- outer(x, y, function(x, y) cos(y)/(1 + x^2))

    f is a square matrix, with rows and columns indexed by x and y respectively, of values of the function cos(y)/(1 + x^2).
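    As a check on this description, a sketch that builds f with outer() and compares one entry with the formula. The grid x (50 points from -pi to pi), with y equal to x, is an assumption for illustration.

```r
x <- seq(-pi, pi, length.out = 50)   # assumed grid; any numeric vectors work
y <- x

# outer() evaluates the function at every (x[i], y[j]) pair.
f <- outer(x, y, function(x, y) cos(y) / (1 + x^2))

# Entry [i, j] holds the value at x[i], y[j].
entry_ok <- isTRUE(all.equal(f[3, 7], cos(y[7]) / (1 + x[3]^2)))
```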

    oldpar

    Appendix B Invoking R

    Users of R on Windows or Mac OS X should read the OS-specific section first, but command-line use is also supported.

    B.1 Invoking R from the command line

    When working at a command line on UNIX or Windows, the command R can be used both for starting the main R program in the form

    R [options] [<infile] [>outfile],

    or, via the R CMD interface, as a wrapper to various R tools (e.g., for processing files in R documentation format or manipulating add-on packages) which are not intended to be called directly.

    At the Windows command-line, Rterm.exe is preferred to R.

    You need to ensure that either the environment variable TMPDIR is unset or it points to a valid place to create temporary files and directories.

    Most options control what happens at the beginning and at the end of an R session. The startup mechanism is as follows (see also the on-line help for topic Startup for more information, and the section below for some Windows-specific details).

    Unless --no-environ was given, R searches for user and site files to process for setting environment variables.


    --encoding=enc


    Specify the encoding to be assumed for input from the console or stdin. This needs to be an encoding known to iconv: see its help page. (--encoding enc is also accepted.)

    RHOME     Print the path to the R home directory to standard output and exit successfully. Apart from the front-end shell script and the man page, R installation puts everything (executables, packages, etc.) into this directory.

    --save

    --no-save

    Control whether data sets should be saved or not at the end of the R session. If neither is given in an interactive session, the user is asked for the desired behavior when ending the session with q(); in non-interactive use one of these must be specified or implied by some other option (see below).

    --no-environ

    Do not read any user file to set environment variables.

    --no-site-file

    Do not read the site-wide profile at startup.

    --no-init-file

    Do not read the user's profile at startup.


    Appendix C [The command-line editor], for more information. Command-line editing is enabled by default in interactive use (see --interactive). This option also affects tilde-expansion: see the help for path.expand.

    --min-vsize=N
    --max-vsize=N

    Specify the minimum or maximum amount of memory used for variable size objects by setting the vector heap size to N bytes. Here, N must either be an integer or an integer ending with G, M, K, or k, meaning Giga (2^30), Mega (2^20), (computer) Kilo (2^10), or regular kilo (1000).

    --min-nsize=N
    --max-nsize=N

    Specify the amount of memory used for fixed size objects by setting the number of cons cells to N. See the previous option for details on N. A cons cell takes 28 bytes on a 32-bit machine, and usually 56 bytes on a 64-bit machine.
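    To make the size rules concrete, a hypothetical helper (not part of R) that converts an N value to bytes as described for --min-vsize and --max-vsize, plus the cons-cell arithmetic using the 56-byte figure quoted above.

```r
# Hypothetical helper, not an R function: interpret N as described above.
parse_mem_size <- function(n) {
  multipliers <- c(G = 2^30, M = 2^20, K = 2^10, k = 1000)  # suffixes are case-sensitive
  suffix <- substring(n, nchar(n))                          # last character
  if (suffix %in% names(multipliers)) {
    as.numeric(substring(n, 1, nchar(n) - 1)) * multipliers[[suffix]]
  } else {
    as.numeric(n)                                           # plain integer
  }
}

# Approximate bytes taken by N cons cells (56 bytes each on a typical 64-bit machine).
cons_cell_bytes <- function(n, bytes_per_cell = 56) {
  parse_mem_size(n) * bytes_per_cell
}
```

    For example, --max-nsize=1M would correspond to about 56 MiB of cons cells on a typical 64-bit machine.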

    --max-ppsize=N

    Specify the maximum size of the pointer protection stack as N locations. This defaults to 10000, but can be increased to allow large and complicated calculations to be done. Currently the maximum value accepted is 100000.

    --max-mem-size=N

    --gui=type

    -g type   (UNIX only) Use type as graphical user interface (note that this also includes interactive graphics). Currently, possible values for type are X11 (the default) and, provided that Tcl/Tk support is available, Tk. (For back-compatibility, x11 and tk are accepted.)

    --arch=name (UNIX only) Run the specified sub-architecture. Most commonly used on Mac OS X, where the possible values are i386, x86_64 and ppc.

    --args    This flag does nothing except cause the rest of the command line to be skipped: this can be useful to retrieve values from it with commandArgs(TRUE).
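    commandArgs(TRUE) returns the arguments that follow --args as a character vector. A minimal sketch of how a script might use this; get_n() and the script name sim.R are hypothetical, and the arguments are passed explicitly in the checks so the function can be tried without a command line.

```r
# Hypothetical script body, run e.g. as:  R --no-save --args 25 < sim.R
get_n <- function(args = commandArgs(trailingOnly = TRUE), default = 10L) {
  if (length(args) == 0) return(default)      # nothing after --args
  n <- suppressWarnings(as.integer(args[1]))  # NA if not a number
  if (is.na(n) || n <= 0) {
    stop("first argument after --args must be a positive integer")
  }
  n
}
```

    Rscript sim.R 25 passes 25 the same way, since Rscript forwards arguments after the script name as trailing arguments.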

    Note that input and output can be redirected in the usual way (using < and >), but the line length limit of 4095 bytes still applies. Warning and error messages are sent to the error channel (stderr).

    The command R CMD allows the invocation of various tools which are useful in conjunction with R, but not intended to be called directly. The general form is

    R CMD command args

    where command is the name of the tool and args the arguments passed on to it.

    Currently, the following tools are available.

    BATCH     Run R in batch mode. Runs R --restore --save with possibly further options.

    texify    (Windows only) Process (La)TeX files with R's style files


    Use

    R CMD command --help

    to obtain usage information for each of the tools accessible via the R CMD interface.

    In addition, you can use options --arch=, --no-environ, --no-init-file, --no-site-file and --vanilla between R and CMD: these affect any R processes run by the tools. (Here --vanilla is equivalent to --no-environ --no-site-file --no-init-file.) However, note that R CMD does not of itself use any R startup files (in particular, neither user nor site Renviron files), and all of the R processes run by these tools (except BATCH) use --no-restore. Most use --vanilla and so invoke no R startup files: the current exceptions are INSTALL, REMOVE, Sweave and SHLIB (which uses --no-site-file --no-init-file).

    R CMD cmd args

    for any other executable cmd on the path or given by an absolute filepath: this is useful to have the same environment as R or the specific commands run under, for example to run ldd or pdflatex. Under Windows cmd can be an executable or a batch file, or if it has extension .sh or .pl the appropriate interpreter (if available) is called to run it.

    B.2 Invoking R under Windows

    There are two ways to run R under Windows. Within a terminal window (e.g. cmd.exe or a more

    --debug   Enable the "Break to debugger" menu item in Rgui, and trigger a break to the debugger during command line processing.

    Under Windows with R CMD you may also specify your own .bat, .exe, .sh or .pl file. It will be run under the appropriate interpreter (Perl for .pl) with several environment variables set appropriately, including R_HOME, R_OSTYPE, PATH, BSTINPUTS and TEXINPUTS. For example, if you already have latex.exe on your path, then

    R CMD latex.exe mydoc

    will run LaTeX on mydoc.tex, with the path to R's share/texmf macros appended to TEXINPUTS. (Unfortunately, this does not help with the MiKTeX build of LaTeX, but R CMD texify mydoc will work in that case.)

    B.3 Invoking R under Mac OS X

    There are two ways to run R under Mac OS X. Within a Terminal.app window by invoking R, the methods described in the first subsection apply. There is also a console-based GUI (R.app) that by default is installed in the Applications folder on your system. It is a standard double-clickable Mac OS X application.

    The startup procedure under Mac OS X is very similar to that under UNIX. The home directory is the one inside the R.framework, but the startup and current working directory are set as the user's home directory unless a different startup directory is given in the Preferences window accessible from within the GUI.

    #! /usr/bin/env Rscript


    ...

    At least in Bourne and bash shells, the #! mechanism does not allow extra arguments like #!/usr/bin/env Rscript --vanilla.

    One thing to consider is what stdin() refers to. It is commonplace to write R scripts with segments like

    chem

    Appendix C The command-line editor

    C.1 Preliminaries

    When the GNU readline library is available at the time R is configured for compilation under UNIX, an inbuilt command line editor allowing recall, editing and re-submission of prior commands is used. Note that other versions of readline exist and may be used by the inbuilt command line editor: this used to happen on Mac OS X.

    It can be disabled (useful for usage with ESS) using the startup option --no-readline.

    Windows versions of R have somewhat simpler command-line editing: see Console under the Help menu of the GUI, and the file README.Rterm for command-line editing under Rterm.exe.

    When using R with readline capabilities, the functions described below are available, as well as others (probably) documented in man readline or info readline on your system.

    Many of these use either Control or Meta characters. Control characters, such as Control-m, are obtained by holding the CTRL down while you press the M key, and are written as C-m below. Meta characters, such as Meta-b, are typed by holding down META and pressing B, and written as M-b in the following. If your terminal does not have a META key enabled, you can still type Meta characters using two-character sequences starting with ESC. Thus, to enter M-b, you could type ESC b.


    Horizontal motion o the cursor


    C-a Go to the beginning of the command.

    C-e Go to the end of the line.

    M-b Go back one word.

    M-f Go forward one word.

    C-b Go back one character.

    C-f Go forward one character.

    On most terminals, you can also use the left and right arrow keys instead of C-b and C-f, respectively.

    Editing and re-submission

    text      Insert text at the cursor.

    C-f text  Append text after the cursor.

    DEL Delete the previous character (left of the cu