Introduction on R

Upload: russelle-abrantes-arrienda

Post on 04-Apr-2018

TRANSCRIPT

  • 7/31/2019 Introduction on R

    1/95

    8/29/12

    R is a free software environment for statistical computing and graphics.

    R is an integrated suite of software facilities for data manipulation, simulation, calculation and graphical display. It handles and analyzes data very effectively, and it contains a suite of operators for calculations on arrays and matrices.

    Note:

    1. R is case sensitive.

    2. All alphanumeric symbols are allowed in names, plus . and _

    3. A name must start with . or a letter, and if it starts with . the second character must not be a digit.

    Example: Names starting with a digit are not accepted. You can instead use .

    4. Commands are separated either by a semicolon (;) or by a newline.

    5. Comments can be put almost anywhere: starting with a hash mark (#), everything to the end of the line is a comment.

    6. Do not use names of variables in a data frame as names of objects. If you do so, the object will shadow the variable with the same name in
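The naming, separator and comment rules listed in the notes can be tried out directly. The following is a minimal sketch; the object names (`x`, `X`, `my.total_1`, `.scale`) are hypothetical and chosen only to illustrate the rules.

```r
# Hypothetical object names, used only to illustrate the rules.
x <- 5; X <- 10          # two commands on one line; x and X are different objects
my.total_1 <- x + X      # letters, digits, . and _ are allowed in a name
.scale <- 2              # a leading . is fine when it is followed by a letter
my.total_1 * .scale      # everything after the hash mark is ignored
```

A name such as `2x` would be rejected by the parser because it starts with a digit.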

    A Simple Example: the c() Function

    For example, suppose the following represents eight tosses of a fair die:

    2 5 1 6 5 5 4 1

    COMMAND:

    > dieroll <- c(2, 5, 1, 6, 5, 5, 4, 1)

    > dieroll

    [1] 2 5 1 6 5 5 4 1

    When entering commands in R, you can save yourself a lot of typing when you learn to use the arrow keys effectively. Each command you submit is stored in the History, and the up arrow will navigate backwards along this history and the down arrow forwards. The left and right arrow keys move backwards and forwards along the command line.

    The Workspace

    All variables or objects created in R are stored in what's called the workspace. To see what variables are in the workspace, you can use the function ls() to list them (this function doesn't need any argument between the parentheses).

    Currently, we only have:

    > ls()

    [1] "dieroll"

    If we define a new variable, a simple function of the variable dieroll, it will be added to the workspace:

    > newdieroll <- dieroll/2

    > newdieroll

    [1] 1.0 2.5 0.5 3.0 2.5 2.5 2.0 0.5

    > ls()

    [1] "dieroll" "newdieroll"

    To remove objects from the workspace (you'll want to do this occasionally when your workspace gets too cluttered), use the rm() function:

    > rm(newdieroll) # this was a silly variable anyway

    > ls()

    [1] "dieroll"

    Get in the habit of saving your work; it will probably help you in the future.

    Getting Help

    Use the function help():

    > help(log)

    Call help for matrix.
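As a small sketch of the help system, the calls below open a help page and search for function names by pattern. The `interactive()` guard is an addition here so that the snippet also runs cleanly as a script.

```r
if (interactive()) {
  help("matrix")            # opens the same page as ?matrix
}
found <- apropos("test")    # names of visible objects that contain "test"
head(found)
```

`apropos()` matches object names only; `help.search()` searches the help pages themselves.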

    > A

         [,1] [,2] [,3] [,4]
    [1,]    1    3    5    7
    [2,]    2    4    6    8

    > B

         [,1] [,2] [,3] [,4]
    [1,]    2    6   10   14

    > A

         [,1] [,2]
    [1,]    1    6
    [2,]    2    7
    [3,]    3    8
    [4,]    4    9
    [5,]    5   10

    > B

         [,1] [,2]
    [1,]    1    2
    [2,]    3    4
    [3,]    5    6
    [4,]    7    8
    [5,]    9   10

    > C

         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    2    3    4    5
    [2,]    6    7    8    9   10
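The assignments that created these matrices did not survive in the transcript. One way to obtain matrices with the printed layouts, assuming the underlying vectors are `1:8` and `1:10`, is sketched below; the object names `A5`, `B5` and `C` are hypothetical labels for the three 10-element matrices.

```r
a <- 1:8
A <- matrix(a, nrow = 2)                   # filled column by column: 1 3 5 7 / 2 4 6 8
b <- 2 * a
B <- matrix(b, nrow = 2)                   # first row: 2 6 10 14

a10 <- 1:10
A5 <- matrix(a10, ncol = 2)                # columns 1..5 and 6..10
B5 <- matrix(a10, ncol = 2, byrow = TRUE)  # filled row by row: 1 2 / 3 4 / ...
C  <- matrix(a10, nrow = 2, byrow = TRUE)  # rows 1..5 and 6..10
```

By default `matrix()` fills column by column; `byrow = TRUE` switches to row-wise filling.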

    Exercises #1

    1. Use the help system to find information on the R functions mean and median.

    2. Get a list of all the functions in R that contain the string test.

    3. Create the vector myself containing your age, height (in inches/cm), and phone number.

    4. Create the vector myfamily containing the names of your father, mother, brothers and sisters.

    1 0 0

    0 1 0

    0 0 1

    Data Management

    > myclassmates

    [1] "Stephen" "Christopher"

    Sequences

    Sometimes we will need to create a string of numerical values that have a regular pattern. Instead of typing the sequence out, we can define the pattern using some special operators and functions.

    1. Colon operator

    2. Sequence function seq()

    The colon operator creates a vector of numbers (between two specified numbers) that are one unit apart:

    > 1:9

    [1] 1 2 3 4 5 6 7 8 9

    > c(1.5:10,10)

     [1]  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.0

    The sequence function can create a string of values with any increment you wish. You can either specify the incremental value or the desired length of the sequence:

    > seq(1,5) # same as 1:5

    [1] 1 2 3 4 5

    > seq(1,5,by=.5) # increment by 0.5

    [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

    > seq(1,6,by=.5)

     [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
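The slides show the `by` form only. As a short sketch of the other option mentioned, specifying the desired length, `seq()` accepts a `length.out` argument:

```r
s <- seq(1, 5, length.out = 9)   # nine equally spaced values from 1 to 5
s
```

Here the increment (0.5) is worked out by R from the end points and the requested length.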

    The replicate function rep() can repeat a value or a sequence of values a specified number of times.

    How to repeat the value 10 ten times?

    > rep(10,10)

     [1] 10 10 10 10 10 10 10 10 10 10

    How to repeat the string A, B, C, D twice?

    > rep(c("A","B","C","D"),2)

    [1] "A" "B" "C" "D" "A" "B" "C" "D"

    How to make a 4x4 matrix of zeroes?

    > matrix(rep(0,16),nrow=4)

         [,1] [,2] [,3] [,4]
    [1,]    0    0    0    0
    [2,]    0    0    0    0
    [3,]    0    0    0    0
    [4,]    0    0    0    0
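Two variations on the examples above, offered as a sketch: `rep()` can repeat each element in turn through its `each` argument, and `matrix()` recycles a single value, so the explicit `rep(0, 16)` is optional. The object names are hypothetical.

```r
whole  <- rep(c("A", "B"), times = 2)   # repeats the whole vector
paired <- rep(c("A", "B"), each = 2)    # repeats each element in turn
Z <- matrix(0, nrow = 4, ncol = 4)      # the single 0 is recycled into all 16 cells
```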

    Reading in Data: Single Vectors

    Since using c() can sometimes be tiresome, data can also be entered directly by using the scan() function.

    Suppose that we count the number of passengers (not including the driver) in the next 10 automobiles at an intersection:

    2 4 0 1 1 2 3 1 0 4

    > passengers <- scan()

    How to print out the values of passengers?

    > passengers

     [1] 2 4 0 1 1 2 3 1 0 4
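Called with no arguments, `scan()` waits for values typed at the console, which does not work inside a script. A minimal non-interactive sketch of the same idea uses the `text` argument of `scan()` with the ten counts given above:

```r
passengers <- scan(text = "2 4 0 1 1 2 3 1 0 4")   # values separated by white space
passengers
length(passengers)
```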

    How to Create Data Frames?

    Individual variables are designated as columns of the data frame and have unique names. However, all of the columns in a data frame must be of the same length.

    Suppose that in the last experiment we also recorded the seatbelt use of the driver: Y = seatbelt worn, N = seatbelt not worn. Data:

    Y, N, Y, Y, Y, Y, Y, Y, Y

    Note: Since these data are text based, we need to put quotes around each data value.

    > seatbelt

    [1] "Y" "N" "Y" "Y" "Y" "Y"

    How to combine the variables passengers and seatbelt into a single data frame?

    > car.dat <- data.frame(passengers, seatbelt)

    > car.dat

      passengers seatbelt
    1          2        Y
    2          4        N
    3          0        Y
    4          1        Y
    5          1        Y
    6          2        Y
    7          3        Y
    8          1        Y

    NOTE: All of the columns in a data frame must be of the same length.
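The following sketch builds the data frame from scratch and shows what happens when the column lengths do not match. Only nine seatbelt values survive in the transcript, so the tenth value used here ("N") is an assumption made purely so that both vectors have length 10.

```r
passengers <- c(2, 4, 0, 1, 1, 2, 3, 1, 0, 4)
# The last value ("N") is an assumption; the transcript shows only nine values.
seatbelt <- c("Y", "N", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "N")
car.dat <- data.frame(passengers, seatbelt)
str(car.dat)

# Columns of length 10 and 9 cannot be combined; data.frame() stops with an error.
bad <- try(data.frame(passengers, seatbelt = seatbelt[1:9]), silent = TRUE)
inherits(bad, "try-error")
```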

    ANOTHER WAY OF ENCODING DATA

    How to use the spreadsheet?

    You can access the editor by using either the edit() or fix() command.

    > new.data

    Encode the educators data. Use the following variables only:

    job sex car salary

    > educators

    I want to edit my data. What should I do?

    > new.data

    > educators
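The exact commands on these slides are not recoverable from the transcript. As a hedged sketch of how `edit()` and `fix()` are typically used, the code below works on a small stand-in data frame; its values are hypothetical, and the editor calls are wrapped in `interactive()` because they open a window.

```r
# Hypothetical stand-in for the educators data; the real values are not in the slides.
educators <- data.frame(job = c(1, 2), sex = c(0, 1),
                        car = c(1, 0), salary = c(50000, 60000))
if (interactive()) {
  new.data <- edit(educators)   # opens the editor and returns the edited copy
  fix(educators)                # opens the editor and changes educators itself
}
```

The difference is that `edit()` leaves the original object untouched unless its result is assigned, while `fix()` overwrites the object.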

    How to save file?

    Click the save icon.

    How to load object in the next session?

    Click on File, then Load Workspace.

    Click on the file you want to open.

    Then, type

    anyname
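The same save and load steps can also be done from the command line. This is a minimal sketch; the temporary file path is a hypothetical location, and `dieroll` is recreated here so that the snippet is self-contained.

```r
dieroll <- c(2, 5, 1, 6, 5, 5, 4, 1)
f <- tempfile(fileext = ".RData")   # hypothetical file location
save(dieroll, file = f)             # save.image(f) would save the whole workspace
rm(dieroll)
load(f)                             # dieroll is back in the workspace
dieroll
```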

    SUMMARIZING DATA

    Numerical Summaries

    Name            Operation
    mean()          arithmetic mean
    median()        sample median
    fivenum()       five-number summary
    summary()       generic summary function for data and model fits
    min(), max()    smallest/largest values
    quantile()      calculate sample quantiles (percentiles)
    var(), sd()     sample variance, sample standard deviation
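To see the functions in the table at work, here is a short sketch applied to the `Height` column of the built-in `trees` dataset (used here only as convenient example data).

```r
h <- trees$Height              # heights of 31 black cherry trees
mean(h); median(h)
fivenum(h)                     # minimum, lower hinge, median, upper hinge, maximum
summary(h)
min(h); max(h)
quantile(h, c(0.25, 0.75))     # first and third quartiles
var(h); sd(h)
```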

    When using the data set, use

    > attach(object)

    # adds the data in object to the search path

    Numerical Summaries

    > mean(salary)

    [1] 62350.79

    > table(sex)

    sex
    0 1
    8 6

    If there are missing values, use:

    mean(x, na.rm=TRUE)
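A small sketch of the effect of `na.rm`, using a hypothetical vector with one missing value:

```r
x <- c(4, 8, NA, 6)               # hypothetical data with one missing value
with.na <- mean(x)                # the result is NA because one value is missing
without.na <- mean(x, na.rm = TRUE)   # missing values are dropped first
c(with.na, without.na)
```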

    GRAPHS

    Type

    > sex.freq <- table(sex)

    > sex.freq

    sex
    0 1
    8 6

    > cbind(sex.freq)

      sex.freq
    0        8
    1        6

    > barplot(sex.freq)

    > pie(sex.freq)

    > attach(educators)

    > sexfreq <- table(sex)

    > barplot(sexfreq)
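The educators data are not included in the transcript, so the sketch below uses a stand-in vector with the same counts (eight 0s and six 1s) to show the frequency table feeding the two plots. The `pdf(NULL)` call is an addition that discards the graphics when the code runs as a script; omit it and `dev.off()` to see the plots on screen.

```r
sex <- rep(c(0, 1), times = c(8, 6))   # stand-in vector matching the counts 8 and 6
sex.freq <- table(sex)
pdf(NULL)                              # throw the plots away in a non-interactive run
barplot(sex.freq, ylab = "Frequency")
pie(sex.freq)
dev.off()
```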

    > hist(car, right=FALSE)

    > boxplot(salary) # vertical is the default orientation

    > boxplot(salary, horizontal=TRUE)

    Statistical Inference

    How to access built-in datasets?

    Type

    > data()

    Is there a dataset with the name trees?

    Use the dataset trees.

    Type:

    > data(trees)

    > trees

      Girth Height Volume
    1   8.3     70   10.3
    2   8.6     65   10.3
    3   8.8     63   10.2
    4  10.5     72   16.4
    5  10.7     81   18.8

    One sample t-test

    Using the trees dataset, test the hypothesis that the mean black cherry tree height is 70 ft. versus a two-sided alternative.

    > data(trees)

    > t.test(trees$Height, mu = 70)

    One Sample t-test

    data: Height

    t = 5.2429, df = 30, p-value = 1.173e-05

    alternative hypothesis: true mean is not equal to 70

    95 percent confidence interval:

    73.6628 78.3372

    sample estimates:
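The statistic reported by `t.test()` can be reproduced by hand from the usual one-sample formula, t = (mean - mu0) / (s / sqrt(n)). The sketch below does this for the tree heights; the object names are hypothetical.

```r
h <- trees$Height
n <- length(h)
t.stat <- (mean(h) - 70) / (sd(h) / sqrt(n))   # one-sample t statistic
p.value <- 2 * pt(-abs(t.stat), df = n - 1)    # two-sided p-value
c(t = t.stat, df = n - 1, p = p.value)
```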

    Two-sample t-test

    The recovery time (in days) is measured for 10 patients taking a new drug and for 10 different patients taking a placebo. We wish to test the hypothesis that the mean recovery time for patients taking the drug is less than for those taking a placebo (under an assumption of normality and equal population variances). The data are:

    With drug: 15, 10, 13, 7, 9, 8, 21, 9, 14, 8

    > drug <- c(15, 10, 13, 7, 9, 8, 21, 9, 14, 8)

    > t.test(drug, plac, alternative = "less", var.equal = T)

    Two Sample t-test

    data: drug and plac

    t = -0.5331, df = 18, p-value = 0.3002

    alternative hypothesis: true difference in means is less than 0

    95 percent confidence interval:

    -Inf 2.027436

    sample estimates:

    mean of x mean of y
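The placebo values do not appear in the transcript. The sketch below therefore uses assumed placebo data, chosen because they reproduce the printed statistic (t = -0.5331 with 18 degrees of freedom); treat them as an illustration rather than the original data.

```r
drug <- c(15, 10, 13, 7, 9, 8, 21, 9, 14, 8)
# Assumed placebo values: not in the transcript, picked to reproduce t = -0.5331.
plac <- c(15, 14, 12, 8, 14, 7, 16, 10, 15, 12)
res <- t.test(drug, plac, alternative = "less", var.equal = TRUE)
res$statistic; res$parameter; res$p.value
```

With `var.equal = TRUE` the test pools the two sample variances, which is why the degrees of freedom are 10 + 10 - 2 = 18.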

    Paired t-test

    An experiment was performed to determine if a new gasoline additive can increase the gas mileage of cars. In the experiment, six cars are selected and driven with and without the additive. The gas mileages (in miles per gallon, mpg) are given below.

    Car 1 2 3 4 5 6

    > t.test(add, noadd, paired=T, alt="greater")

    Paired t-test

    data: add and noadd

    t = 3.9994, df = 5, p-value = 0.005165

    alternative hypothesis: true difference in means is greater than 0

    95 percent confidence interval:

    0.3721225 Inf

    sample estimates:

    mean of the differences
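The mileage table was lost from the transcript. The values below are assumed: they are not from the slides and were chosen because they reproduce the printed result (t = 3.9994, df = 5). The sketch also shows that a paired test is the same as a one-sample test on the differences.

```r
# Assumed mileages: not in the transcript, chosen to reproduce t = 3.9994 with df = 5.
add   <- c(24.6, 18.9, 27.3, 25.2, 22.0, 30.9)   # with additive
noadd <- c(23.8, 17.7, 26.6, 25.1, 21.6, 29.6)   # without additive
res <- t.test(add, noadd, paired = TRUE, alternative = "greater")
one <- t.test(add - noadd, mu = 0, alternative = "greater")  # same test on the differences
res$statistic; one$statistic
```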

    ANOVA

    > aov(x ~ a) # one-way ANOVA model

    > aov(x ~ a + b) # two-way ANOVA with no interaction

    > aov(x ~ a + b + a:b) # two-way ANOVA with interaction

    > aov(x ~ a*b) # exactly the same as the previous model

    The strength of three different rubber compounds: four specimens of each type were tested for their tensile strength (measured in pounds per square inch):

    > type

    [1] A A A A B B B B C C C C

    To calculate the sample means of the subgroups, type

    > tapply(str,type,mean)

          A       B       C
    3213.75 3330.00 3552.50

    To calculate the variances:

    > tapply(str,type,var)

           A        B        C
    6172.917 6733.333 2541.667

    > anova.fit <- aov(str ~ type)

    To extract the ANOVA table, use the R function summary():

    > summary(anova.fit)

                Df Sum Sq Mean Sq F value   Pr(>F)
    type         2 237029  118515   23.02 0.000289 ***
    Residuals    9  46344    5149
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
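The tensile strength values themselves are missing from the transcript. The sketch below uses assumed measurements, chosen because they reproduce the printed group means and variances, to run the whole analysis end to end.

```r
# Assumed strengths: not in the transcript, chosen to match the printed means and variances.
str <- c(3225, 3320, 3165, 3145,    # compound A
         3220, 3410, 3320, 3370,    # compound B
         3545, 3600, 3580, 3485)    # compound C
type <- factor(rep(c("A", "B", "C"), each = 4))
group.means <- tapply(str, type, mean)
anova.fit <- aov(str ~ type)
anova.table <- summary(anova.fit)[[1]]   # the ANOVA table as a data frame
anova.table
TukeyHSD(anova.fit)                      # pairwise comparisons of the three compounds
```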

    Multiple comparison test

    > TukeyHSD(anova.fit)

    Tukey multiple comparisons of means

    95% family-wise confidence level

    Fit: aov(formula = str ~ type)

    $type

          diff       lwr      upr     p adj
    B-A 116.25 -25.41926 257.9193 0.1085202

    LINEAR REGRESSION

    > lm(y ~ x) # simple linear regression (SLR) model

    > lm(y ~ x1 + x2) # a regression plane

    > lm(y ~ x1 + x2 + x3) # linear model with three regressors

    > lm(y ~ x - 1) # SLR w/ an intercept of zero

    > lm(y ~ x + I(x^2)) # quadratic

    Consider the cars dataset. The data give the speed (speed) of cars and the distances (dist) taken to come to a complete stop. Fit a linear regression model using speed as the independent variable and dist as the dependent variable.

    > names(cars)

    > fit <- lm(dist ~ speed)

    > fit

    Call:
    lm(formula = dist ~ speed)

    Coefficients:
    (Intercept)        speed
        -17.579        3.932

    > summary(fit)

    Call:
    lm(formula = dist ~ speed)

    Residuals:
        Min      1Q  Median      3Q     Max
    -29.069  -9.525  -2.272   9.215  43.201

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -17.5791     6.7584  -2.601   0.0123 *
    speed         3.9324     0.4155   9.464 1.49e-12 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
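As a self-contained sketch of the same fit, the model can be given the data frame through the `data` argument, which avoids having to attach `cars` first. The prediction at a speed of 15 is an added illustration, not part of the slides.

```r
fit <- lm(dist ~ speed, data = cars)   # data = cars replaces attach(cars)
coef(fit)                              # intercept and slope
predict(fit, newdata = data.frame(speed = 15))   # fitted stopping distance at speed 15
```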

    > anova(fit)

    How to retrieve EMPLOYEEALL data from txt file?

    > emp

    How to rename a variable?

    > names(emp)
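The commands for reading the file and renaming a variable are cut off in the transcript. A common way to do both is sketched below with `read.table()` and an assignment to `names()`. The file contents and the column names (`id`, `salary`, `pay`) are hypothetical, since the layout of the EMPLOYEEALL file is not shown.

```r
# Hypothetical file and column names; the real EMPLOYEEALL layout is not in the slides.
f <- tempfile(fileext = ".txt")
writeLines(c("id salary", "1 50000", "2 62000"), f)
emp <- read.table(f, header = TRUE)             # first line holds the variable names
names(emp)
names(emp)[names(emp) == "salary"] <- "pay"     # rename one variable
names(emp)
```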