r by samyojit

Upload: soumyajit-das

Post on 03-Apr-2018


  • 7/28/2019 R by samyojit

    1/41

    Introduction To R

    by SAM YO JIT

    Try R Chapter 1

    R

    In this first chapter, we'll cover basic R expressions. We'll start simple, with numbers,

    strings, and true/false values. Then we'll show you how to store those values in variables, and how to pass them to functions. We'll show you how to get help on functions when

    you're stuck. Finally we'll load an R script in from a file.

    Let's get started!


    1. Expressions 1.1

    Type anything at the prompt, and R will evaluate it and print the answer.

    Let's try some simple math. Type the below command.


    1 + 1

    > 1 + 1
    [1] 2


    There's your result, 2. It's printed on the console right after your entry.

    2. Type the string "Arr, matey!". (Don't forget the quotes!)

    "Arr, matey!"

    > "Arr, matey!"
    [1] "Arr, matey!"

    3. Now try multiplying 6 times 7 (* is the multiplication operator).

    > 6 * 7
    [1] 42

    4. Logical Values 1.2

    Some expressions return a "logical value": TRUE or FALSE. (Many programming languages refer to these as "boolean" values.) Let's try typing an expression that gives us a logical value:

    3 < 4

    > 3 < 4
    [1] TRUE

    5. And another logical value (note that you need a double-equals sign to check whether two values are equal; a single-equals sign won't work):

    2 + 2 == 5

    > 2 + 2 == 5
    [1] FALSE


    6. T and F are shorthand for TRUE and FALSE. Try this:

    T == TRUE

    > T == TRUE
    [1] TRUE

    7. Variables 1.3

    As in other programming languages, you can store a value in a variable to access it later.

    Type x


    You can print the value of a variable at any time just by typing its name in the console.

    Try printing the current value of x.

    x

    > x
    [1] "Arr, matey!"
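The assignment step can be sketched as follows. This is a minimal illustration, assuming R's usual <- assignment arrow; the variable name treasure and the value 42 are hypothetical and chosen only for the example.

```r
# Assign a value with the <- arrow (treasure and 42 are illustrative choices).
treasure <- 42

# A variable can be used anywhere a plain value can.
half <- treasure / 2
print(half)

# Assigning again replaces the old value, even with a value of another type.
treasure <- "Arr, matey!"
print(treasure)
```

Typing a variable's name on its own at the console prints it, which is what print() does explicitly here.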

    11. Now try assigning the TRUE logical value to x.

    > x <- TRUE

    > sqrt(16)
    [1] 4

    15. Help 1.5

    help(functionname) brings up help for the given function. Try displaying help for the sum function:

    help(sum)

    > help(sum)
    sum package:base R Documentation

    Sum of Vector Elements

    Description:

    'sum' returns the sum of all the values present in its arguments.

    Usage:

    sum(..., na.rm = FALSE)

    ...

    (Don't worry about that optional na.rm argument; we'll cover that later.)

    16. example(functionname) brings up examples of usage for the given function. Try displaying examples for the min function:

    example(min)

    > example(min)

    min> require(stats); require(graphics)

    min> min(5:1, pi) #-> one number
    [1] 1

    min> pmin(5:1, pi) #-> 5 numbers
    [1] 3.141593 3.141593 3.000000 2.000000 1.000000


    ...

    17. Now try bringing up help for the rep function:

    > help(rep)
    rep package:base R Documentation

    Replicate Elements of Vectors and Lists

    Description:

    'rep' replicates the values in 'x'. It is a generic function, and the (internal) default method is described here.

    ...

    18. Files 1.6

    Typing commands each time you need them only works for short scripts, of course. R commands can also be written in plain text files (with a ".R" extension, by convention) for executing later. You can run them directly from the command line, or from within a running R instance.

    We've stored a couple of sample scripts for you. You can list the files in the current directory from within R by calling the list.files function. Try it now:

    list.files()

    > list.files()
    [1] "bottle1.R" "bottle2.R"

    19. To run a script, pass a string with its name to the source function. Try running the "bottle1.R" script:

    source("bottle1.R")

    > source("bottle1.R")
    [1] "This be a message in a bottle1.R!"


    20. Now try running "bottle2.R":

    > source("bottle2.R")
    [1] "Will ye be me pen pal?"
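Outside the tutorial's sandbox, the same workflow might look like the sketch below. The script contents and the variable name bottle_message are hypothetical; the file is written to a temporary location so the example does not depend on any existing files.

```r
# Write a one-line script to a temporary file (hypothetical contents).
script_path <- tempfile(fileext = ".R")
writeLines('bottle_message <- "This be a test message"', script_path)

# List the .R files in that directory, as list.files() did above.
found <- list.files(dirname(script_path), pattern = "\\.R$")
print(found)

# Run the script; the variable it defines is available afterwards.
source(script_path)
print(bottle_message)
```

Because source() runs the file's commands as if they had been typed at the prompt, anything the script assigns remains available once it finishes.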

    Try R Chapter 2

    Vectors

    The name may sound intimidating, but a vector is simply a list of values. R relies on vectors for many of its operations. This includes basic plots - we'll have you drawing graphs by the end of this chapter (and it's a lot easier than you might think)!


    2. Vectors 2.1

    A vector's values can be numbers, strings, logical values, or any other type, as long as they're all the same type. Try creating a vector of numbers, like this:

    c(4, 7, 9)

    > c(4, 7, 9)
    [1] 4 7 9

    The c function (c is short for Combine) creates a new vector by combining a list of values.

    3. Now try creating a vector with strings:

    c('a', 'b', 'c')

    > c('a', 'b', 'c')
    [1] "a" "b" "c"

    4. Vectors cannot hold values with different modes (types). Try mixing modes and see what happens:

    c(1, TRUE, "three")

    > c(1, TRUE, "three")
    [1] "1" "TRUE" "three"

    All the values were converted to a single mode (characters) so that the vector can hold

    them all.
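The conversion can be explored a little further with the sketch below. It relies only on base R's c and class functions; the variable name mixed is an illustrative choice.

```r
# Logical and numeric values together: TRUE is promoted to the number 1.
print(c(1, TRUE))

# As soon as a string is involved, every value becomes a string.
mixed <- c(1, TRUE, "three")
print(mixed)

# class() reports the single type each vector ended up with.
print(class(c(1, TRUE)))
print(class(mixed))
```

The pattern is that R picks the most general type present, so no value has to be dropped to keep the vector uniform.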

    5. Sequence Vectors 2.2

    If you need a vector with a sequence of numbers you can create it with start:end notation. Let's make a vector with values from 5 through 9:

    5:9

    > 5:9
    [1] 5 6 7 8 9

    6. A more versatile way to make sequences is to call the seq function. Let's do the same thing with seq:

    seq(5, 9)

    > seq(5, 9)
    [1] 5 6 7 8 9

    7. seq also allows you to use increments other than 1. Try it with steps of 0.5:

    seq(5, 9, 0.5)

    > seq(5, 9, 0.5)
    [1] 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0

    8. Now try making a vector with integers from 9 down to 5:

    9:5

    > 9:5
    [1] 9 8 7 6 5
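The sequence forms above can be compared side by side in a short sketch. The by and length.out arguments are standard named arguments of base R's seq; the variable names are illustrative.

```r
# Naming the step makes the call easier to read than seq(5, 9, 0.5).
halves <- seq(5, 9, by = 0.5)
print(halves)

# A negative step counts down, giving the same values as 9:5.
countdown <- seq(9, 5, by = -1)
print(countdown)

# length.out asks for a number of evenly spaced values instead of a step size.
quarters <- seq(0, 1, length.out = 5)
print(quarters)
```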

    9. Vector Access 2.3

    We're going to create a vector with some strings in it for you, and store it in the sentence variable.

    You can retrieve an individual value within a vector by providing its numeric index in square brackets. Try getting the third value:

    sentence[3]

    > sentence <- c('walk', 'the', 'plank')
    > sentence[3]
    [1] "plank"

    10. Many languages start array indices at 0, but R's vector indices start at 1. Get the first value by typing:

    sentence[1]

    > sentence[1]
    [1] "walk"

    11. You can assign new values within an existing vector. Try changing the third word to "dog":

    sentence[3] <- "dog"

    14. An index can itself be a vector, which means you can retrieve ranges of values. Get the second through fourth words:

    sentence[2:4]

    > sentence[2:4]
    [1] "the" "dog" "to"

    15. You can also set ranges of values; just provide the values in a vector. Add words 5 through 7:

    sentence[5:7] <- …

    > sentence[6]
    [1] "poop"
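Reading and writing several positions at once can be sketched as follows. The crew vector and its contents are hypothetical and are not the tutorial's sentence vector.

```r
# A hypothetical four-item character vector.
crew <- c("captain", "mate", "cook", "gunner")

# A vector of indices picks out several values at once.
picked <- crew[c(1, 3)]
print(picked)

# A range such as 2:4 is just a vector of consecutive indices.
middle <- crew[2:4]
print(middle)

# Assigning to positions past the end extends the vector.
crew[5:6] <- c("lookout", "bosun")
print(crew)
```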

    17. Vector Names 2.4

    For this challenge, we'll make a 3-item vector for you, and store it in the ranks variable.

    You can assign names to a vector's elements by passing a second vector filled with names to the names assignment function, like this:

    names(ranks) <- c("first", "second", "third")

    18. Names assigned to a vector's elements act as useful labels for the data. Below, you can see what our vector looks like now.

    You can also use the names to access the vector's values. Try getting the value for the "first" rank:

    ranks["first"]

    > ranks
     first second  third
         1      2      3
    > ranks["first"]
    first
        1
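The naming steps can be put together in one sketch. The starting value 1:3 for ranks is an assumption inferred from the printed output above.

```r
# Assumed starting point: a 3-item vector (inferred from the output shown).
ranks <- 1:3

# Attach a name to each element.
names(ranks) <- c("first", "second", "third")

# names() with no assignment reads the labels back.
print(names(ranks))

# A name in square brackets works like a numeric index.
print(ranks["second"])
```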

    19. Now see if you can set the value for the "third" rank to something other than 3 using the name rather than the position.

    > ranks["third"] <- …
    > vesselsSunk <- …
    > barplot(vesselsSunk)

    21. If you assign names to the vector's values, R will use those names as labels on the bar plot. Let's add names:

    names(vesselsSunk) <- …

    > a <- c(1, 2, 3)
    > a + 1
    [1] 2 3 4

    25. The same is true of division, multiplication, or any other basic arithmetic. Try dividing our vector by 2:

    a / 2

    > a / 2
    [1] 0.5 1.0 1.5

    26. Now try multiplying our vector by 2:

    > a * 2
    [1] 2 4 6

    27. If you add two vectors, R will add the values at matching positions. We'll make a second vector for you to experiment with, and store it in the b variable.

    Try adding it to the a vector:

    a + b

    > b <- c(4, 5, 6)
    > a + b
    [1] 5 7 9

    28. Now try subtracting b from a:

    > a - b
    [1] -3 -3 -3
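The arithmetic rules from this section can be collected in one sketch. The values of a and b are assumptions inferred from the outputs shown above.

```r
# Assumed values, inferred from the printed results in this section.
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# A single number is applied to every element.
print(a + 1)   # 2 3 4
print(a / 2)   # 0.5 1.0 1.5

# Two vectors of equal length are combined position by position.
print(a + b)   # 5 7 9
print(a * b)   # 4 10 18
```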

    29. You can also take two vectors and compare each item. See which values in the a vector are equal to those in a second vector:

    a == c(1, 99, 3)

    > a == c(1, 99, 3)
    [1] TRUE FALSE TRUE

    Notice that R didn't test whether the whole vectors were equal; it checked each value in the a vector against the value at the same index in our new vector.
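When a single yes-or-no answer is wanted instead, base R's all, any, and identical functions can help. They are not part of the tutorial text; the sketch below assumes a holds 1, 2, 3 as in the earlier outputs.

```r
a <- c(1, 2, 3)   # assumed value, matching the outputs above

# == compares position by position and returns one logical per element.
matches <- a == c(1, 99, 3)
print(matches)

# all() and any() reduce those logicals to a single value.
print(all(matches))   # FALSE, because the second pair differs
print(any(matches))   # TRUE, because at least one pair matches

# identical() tests whether two objects are exactly the same.
print(identical(a, c(1, 2, 3)))
```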

    30. Check if each value in the a vector is less than the corresponding value in another vector:

    > a < c(1, 99, 3)
    [1] FALSE TRUE FALSE

    31. Functions that normally work with scalars can operate on each element of a vector, too. Try getting the sine of each value in our vector:

    sin(a)

    > sin(a)
    [1] 0.8414710 0.9092974 0.1411200

    32. Now try getting the square roots with sqrt:

    > sqrt(a)
    [1] 1.000000 1.414214 1.732051

    33. Scatter Plots 2.7

    The plot function takes two vectors, one for X values and one for Y values, and draws a graph of them.

    Let's draw a graph showing the relationship of numbers and their sines.

    First, we'll need some sample data. We'll create a vector for you with some fractional

    values between 0 and 20, and store it in the x variable.


    Now, try creating a second vector with the sines of those values:

    y <- sin(x)

    > x <- …
    > y <- sin(x)
    > plot(x, y)

    Great job! Notice on the graph that values from the first argument (x) are used for the

    horizontal axis, and values from the second (y) for the vertical.
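A self-contained version of the sine plot might look like the sketch below. The spacing of 0.1 between x values is an assumption, since the tutorial only says the values are fractional and lie between 0 and 20. The pdf(NULL) and dev.off() calls are there only so the example also runs without a display; in an interactive session they can be omitted.

```r
# Assumed sample data: values from 0 to 20 in steps of 0.1.
x <- seq(0, 20, by = 0.1)
y <- sin(x)

# Open a graphics device that discards its output (not needed interactively).
pdf(NULL)

# First argument goes on the horizontal axis, second on the vertical.
plot(x, y, xlab = "x", ylab = "sin(x)")

# Close the device again.
invisible(dev.off())
```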

    35. Your turn. We'll create a vector with some negative and positive values for you, and store

    it in the values variable.

    We'll also create a second vector with the absolute values of the first, and store it in the

    absolutes variable.

    Try plotting the vectors, with values on the horizontal axis, and absolutes on the vertical axis.

    > values <- …
    > absolutes <- abs(values)
    > plot(values, absolutes)

    36. NA Values 2.8

    Sometimes, when working with sample data, a given value isn't available. But it's not a

    good idea to just throw those values out. R has a value that explicitly indicates a sample

    was not available: NA. Many functions that work with vectors treat this value specially.


    We'll create a vector for you with a missing sample, and store it in the a variable.

    Try to get the sum of its values, and see what the result is:

    sum(a)

    > a <- …
    > sum(a)
    [1] NA

    The sum is considered "not available" by default because one of the vector's values was

    NA. This is the responsible thing to do; R won't just blithely add up the numbers without

    warning you about the incomplete data. We can explicitly tell sum (and many other

    functions) to remove NA values before they do their calculations, however.

    37. Remember that command to bring up help for a function? Bring up documentation for the sum function:

    > help(sum)
    sum package:base R Documentation

    Sum of Vector Elements

    Description:

    'sum' returns the sum of all the values present in its arguments.

    Usage:

    sum(..., na.rm = FALSE)

    ...

    As you see in the documentation, sum can take an optional named argument, na.rm. It's set to FALSE by default, but if you set it to TRUE, all NA values will be removed from the vector before the calculation is performed.

    38. Try calling sum again, with na.rm set to TRUE:

    sum(a, na.rm = TRUE)

    > sum(a, na.rm = TRUE)
    [1] 20
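The effect of na.rm can be reproduced with the sketch below. The samples vector is hypothetical, with values chosen only so that the non-missing ones add up to 20; is.na and mean are base R functions not covered in the text above.

```r
# Hypothetical data with one missing sample.
samples <- c(1, 3, NA, 7, 9)

# By default the missing value makes the whole sum missing.
print(sum(samples))                  # NA

# na.rm = TRUE drops NA values before adding.
print(sum(samples, na.rm = TRUE))    # 20

# is.na() shows where the gaps are; mean() accepts na.rm as well.
print(is.na(samples))
print(mean(samples, na.rm = TRUE))   # 5
```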

    Try R Chapter 3

    Matrices

    So far we've only worked with vectors, which are simple lists of values. What if you need

    data in rows and columns? Matrices are here to help.

    A matrix is just a fancy term for a 2-dimensional array. In this chapter, we'll show you all

    the basics of working with matrices, from creating them, to accessing them, to plotting

    them.


    2. Matrices 3.1

    Let's make a matrix 3 rows high by 4 columns wide, with all its fields set to 0.

    matrix(0, 3, 4)

    > matrix(0, 3, 4)
         [,1] [,2] [,3] [,4]
    [1,]    0    0    0    0
    [2,]    0    0    0    0
    [3,]    0    0    0    0

    3. You can also use a vector to initialize a matrix's values. To fill a 3x4 matrix, you'll need a 12-item vector. We'll make that for you now:

    a <- 1:12

    > a <- 1:12
    > print(a)
    [1] 1 2 3 4 5 6 7 8 9 10 11 12

    5. Now call matrix with the vector, the number of rows, and the number of columns:

    matrix(a, 3, 4)

    > matrix(a, 3, 4)
         [,1] [,2] [,3] [,4]
    [1,]    1    4    7   10
    [2,]    2    5    8   11
    [3,]    3    6    9   12
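As the output shows, matrix fills the columns first. The sketch below contrasts that default with the byrow argument of base R's matrix function, which the text above does not cover; the variable names are illustrative.

```r
a <- 1:12

# Default: values run down each column in turn.
by_column <- matrix(a, 3, 4)
print(by_column)

# byrow = TRUE fills across each row instead.
by_row <- matrix(a, 3, 4, byrow = TRUE)
print(by_row)

# The same position therefore holds different values.
print(by_column[2, 3])   # 8
print(by_row[2, 3])      # 7
```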

    6. The vector's values are copied into the new matrix, one by one. You can also re-shape the vector itself into a matrix. We'll create a new 8-item vector for you:

    plank <- 1:8

    dim(plank) <- c(2, 4)

    > print(plank)
         [,1] [,2] [,3] [,4]
    [1,]    1    3    5    7
    [2,]    2    4    6    8
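The reshaping step can be checked with a short sketch. The variable name deck is hypothetical; is.matrix, nrow, and ncol are base R helpers not used in the tutorial text.

```r
# A plain 8-item vector has no dimensions yet.
deck <- 1:8
print(is.matrix(deck))   # FALSE

# Assigning dimensions turns it into a 2-row, 4-column matrix in place.
dim(deck) <- c(2, 4)
print(is.matrix(deck))   # TRUE

# The values are unchanged; only the shape is new.
print(nrow(deck))
print(ncol(deck))
print(deck)
```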

    11. Try getting the value from the second row in the third column of plank:

    plank[2, 3]

    > plank[2, 3]
    [1] 6

    12. Now, try getting the value from the first row of the fourth column:

    > plank[1, 4]
    [1] 7

    13. As with vectors, to set a single value, just assign to it. Set the previous value to 0:

    plank[1, 4] <- 0

    > plank[1, 4] <- 0
    > plank[2,]
    [1] 2 4 0 8

    15. To get an entire column, omit the row index. Retrieve the fourth column:

    plank[, 4]

    > plank[, 4]
    [1] 7 8

    16. You can read multiple rows or columns by providing a vector or sequence with their indices. Try retrieving columns 2 through 4:

    plank[, 2:4]

    > plank[, 2:4]
         [,1] [,2] [,3]
    [1,]    3    5    7
    [2,]    4    0    8
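The access patterns from this section can be tried together on a fresh matrix. The variable name hold and its contents are hypothetical, so the values will not match the plank session above exactly.

```r
# A hypothetical 2x4 matrix holding 1 through 8, filled column by column.
hold <- matrix(1:8, 2, 4)

# Omit the column index to get a whole row.
second_row <- hold[2, ]
print(second_row)    # 2 4 6 8

# Omit the row index to get a whole column.
fourth_col <- hold[, 4]
print(fourth_col)    # 7 8

# A range of columns returns a smaller matrix.
block <- hold[, 2:4]
print(dim(block))    # 2 3

# Assigning to a single cell changes just that value.
hold[1, 4] <- 0L
print(hold)
```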

    17. Matrix Plotting 3.3

    Text output is only useful when matrices are small. When working with more complex data, you'll need something better. Fortunately, R includes powerful visualizations for matrix data.

    We'll start simple, with an elevation map of a sandy beach.

    It's pretty flat - everything is 1 meter above sea level. We'll create a 10 by 10 matrix with

    all its values initialized to 1 for you:

    elevation <- matrix(1, 10, 10)

    Oh, wait, we forgot the spot where we dug down to sea level to retrieve a treasure chest.

    At the fourth row, sixth column, set the elevation to 0:

    elevation[4, 6] <- 0

    > elevation[4, 6] <- 0
    > contour(elevation)

    20. Or you can create a 3D perspective plot with the persp function:

    persp(elevation)

    > persp(elevation)

    21. The perspective plot looks a little odd, though. This is because persp automatically expands the view so that your highest value (the beach surface) is at the very top.

    We can fix that by specifying our own value for the expand parameter.

    persp(elevation, expand=0.2)

    > persp(elevation, expand=0.2)

    22. Okay, those examples are a little simplistic. Thankfully, R includes some sample data sets to play around with. One of these is volcano, a 3D map of a dormant New Zealand volcano.

    It's simply an 87x61 matrix with elevation values, but it shows the power of R's matrix visualizations.

    Try creating a contour map of the volcano matrix:

    contour(volcano)

    > contour(volcano)

    23. Try a perspective plot (limit the vertical expansion to one-fifth again):

    persp(volcano, expand=0.2)

    > persp(volcano, expand=0.2)

    24. The image function will create a heat map:

    image(volcano)

    > image(volcano)

    Try R Chapter 4

    Summary Statistics

    Simply throwing a bunch of numbers at your audience will only confuse them. Part of a

    statistician's job is to explain their data. In this chapter, we'll show you some of the tools

    R offers to let you do so, with minimum fuss.


    2.Mean 4.1+75 points

    Determining the health of the crew is an important part of any inventory of the ship.

    Here's a vector containing the number of limbs each member has left, along with their names.

    limbs barplot(limbs)

    4. If we draw a line on the plot representing the mean, we can easily compare the various

    values to the average. The abline function can take an h parameter with a value at which

    to draw a horizontal line, or a v parameter for a vertical line. When it's called, it updates

    the previous plot.

    Draw a horizontal line across the plot at the mean:

    abline(h = mean(limbs))


    > abline(h = mean(limbs))
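Here is a small self-contained sketch of the same idea. The crew names and limb counts below are made up for illustration and are not the tutorial's data; crew_mean is our own variable name.

```r
# Hypothetical crew data, invented for this example.
limbs <- c(4, 3, 4, 3, 2)
names(limbs) <- c("Anne", "Bart", "Calico", "Drake", "Edward")

crew_mean <- mean(limbs)   # arithmetic mean of the limb counts

# Assumption: null device so the sketch also runs non-interactively.
pdf(NULL)
barplot(limbs)             # one bar per crew member, labeled by name
abline(h = crew_mean)      # horizontal line at the mean, drawn on the existing plot
invisible(dev.off())
```

Because abline adds to whatever plot is currently open, barplot has to be called first.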

    5. Median 4.2

    Let's say we gain a crew member that completely skews the mean.

    > limbs names(limbs) mean(limbs)[1] 4.75

    Let's see how this new mean shows up on our same graph.

    abline(h = mean(limbs))


    > barplot(limbs)
    > abline(h = mean(limbs))

    It may be factually accurate to say that our crew has an average of 4.75 limbs, but it's

    probably also misleading.

    6. For situations like this, it's probably more useful to talk about the "median" value. The
    median is calculated by sorting the values and choosing the middle one - the third value,

    in this case. (For sets with an even number of values, the middle two values are

    averaged.)

    Call the median function on the vector:

    median(limbs)

    > median(limbs)
    [1] 4

    7. That's more like it. Let's show the median on the plot. Draw a horizontal line across the

    plot at the median.

    abline(h = median(limbs))


    > abline(h = median(limbs))
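To see why the median holds up better than the mean, here is a minimal sketch using invented numbers (not the tutorial's crew). It also shows the even-count case, where the two middle values are averaged.

```r
# Hypothetical limb counts, invented for this example.
crew <- c(4, 3, 4, 3, 2)
with_outlier <- c(crew, 14)   # add one member with far more limbs than anyone else

mean_before <- mean(crew)               # 3.2
mean_after <- mean(with_outlier)        # pulled up sharply by the outlier
median_before <- median(crew)           # middle of 2 3 3 4 4
median_after <- median(with_outlier)    # average of the two middle values, 3 and 4
```

The outlier moves the mean from 3.2 to 5, while the median only moves from 3 to 3.5.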

    8. Standard Deviation 4.3

    Some of the plunder from our recent raids has been worth less than what we're used to.
    Here's a vector with the values of our latest hauls:

    > pounds barplot(pounds)> meanValue


    If that sounds like a lot of work, don't worry. You're using R, and all you have to do is

    pass a vector to the sd function. Try calling sd on the pounds vector now, and assign the

    result to the deviation variable:

    deviation <- sd(pounds)

    > deviation <- sd(pounds)
    > deviation
    [1] 14500.62

    10. We'll add a line on the plot to show one standard deviation above the mean (the top of the

    normal range)...

    abline(h = meanValue + deviation)


    > abline(h = meanValue + deviation)

    Hail to the sailor that brought us that 50,000-pound payday!

    11. Now try adding a line on the plot to show one standard deviation below the mean (the

    bottom of the normal range):


    > abline(h = meanValue - deviation)

    We're risking being hanged by the Spanish for this? Sorry, Smitty, you're shark bait.
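For the curious, here is a rough sketch of what sd works out for you. The haul values are invented for illustration; hauls, manual_sd and the other names are ours. R's sd uses the sample formula, dividing by n - 1 rather than n.

```r
# Hypothetical haul values, invented for this example.
hauls <- c(10, 12, 9, 11, 30, 2)

mean_value <- mean(hauls)

# Sample standard deviation by hand: square the distances from the mean,
# sum them, divide by (n - 1), then take the square root.
manual_sd <- sqrt(sum((hauls - mean_value)^2) / (length(hauls) - 1))
builtin_sd <- sd(hauls)    # should agree with the hand calculation

# The "normal range" is one standard deviation either side of the mean.
normal_range <- c(mean_value - builtin_sd, mean_value + builtin_sd)
outside <- hauls[hauls < normal_range[1] | hauls > normal_range[2]]
```

Here outside picks out the hauls that fall above or below the two lines you would draw with abline.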


    1. Try R Chapter 5

    Factors

    Often your data needs to be grouped by category: blood pressure by age range, accidents

    by auto manufacturer, and so forth. R has a special collection type called a factor to track
    these categorized values.


    2. Creating Factors 5.1

    It's time to take inventory of the ship's hold. We'll make a vector for you with the type of

    booty in each chest.

    To categorize the values, simply pass the vector to the factor function:

    types <- factor(chests)

    > chests <- c("gold", "silver", "gems", "gold", "gems")
    > types <- factor(chests)
    > print(chests)
    [1] "gold"   "silver" "gems"   "gold"   "gems"

    4. You see the raw list of strings, repeated values and all. Now print the types factor:

    print(types)

    > print(types)
    [1] gold   silver gems   gold   gems
    Levels: gems gold silver

    Printed at the bottom, you'll see the factor's "levels" - groups of unique values. Notice
    also that there are no quotes around the values. That's because they're not strings; they're

    actually integer references to one of the factor's levels.

    5. Let's take a look at the underlying integers. Pass the factor to the as.integer function:

    as.integer(types)

    > as.integer(types)
    [1] 2 3 1 2 1

    6. You can get only the factor levels with the levels function:

    levels(types)

    > levels(types)
    [1] "gems"   "gold"   "silver"
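The pieces above fit together like this. The sketch below uses the chest contents printed earlier; codes, rebuilt and counts are our own names, and the call to table is an extra we've added to show one thing a factor is handy for.

```r
chests <- c("gold", "silver", "gems", "gold", "gems")
types <- factor(chests)

codes <- as.integer(types)        # integer references into the levels
rebuilt <- levels(types)[codes]   # looking the codes up gives the strings back

# Levels are sorted alphabetically by default: "gems" "gold" "silver",
# which is why "gold" is stored as 2 and "gems" as 1.
counts <- table(types)            # how many chests fall into each level
```

So a factor is really two things: a short lookup table of levels, and a vector of integers pointing into it.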

    7. Plots With Factors 5.2

    You can use a factor to separate plots into categories. Let's graph our five chests by

    weight and value, and show their type as well. We'll create two vectors for you; weights

    will contain the weight of each chest, and prices will track how much the chests are

    worth.

    Now, try calling plot to graph the chests by weight and value.

    plot(weights, prices)

    > weights <- c(300, 200, 100, 250, 150)
    > prices <- c(9000, 5000, 12000, 7500, 18000)
    > plot(weights, prices)

    8. We can't tell which chest is which, though. Fortunately, we can use different plot

    characters for each type by converting the factor to integers, and passing it to the pch

    argument of plot.

    plot(weights, prices, pch=as.integer(types))

    > plot(weights, prices, pch=as.integer(types))

    "Circle", "Triangle", and "Plus Sign" still aren't great descriptions for treasure, though.
    Let's add a legend to show what the symbols mean.

    9. The legend function takes a location to draw in, a vector with label names, and a vector
    with numeric plot character IDs.

    legend("topright", c("gems", "gold", "silver"), pch=1:3)


    > legend("topright", c("gems", "gold", "silver"), pch=1:3)

    Next time the boat's taking on water, it would be wise to dump the silver and keep the gems!

    10. If you hard-code the labels and plot characters, you'll have to update them every time you

    change the plot factor. Instead, it's better to derive them by using the levels function on
    your factor:

    legend("topright", levels(types), pch=1:length(levels(types)))


    > legend("topright", levels(types), pch=1:length(levels(types)))
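Putting the plot and the derived legend together, a minimal sketch might look like this. It reuses the chest data shown in this chapter; symbols and level_ids are our own names, and we've swapped 1:length(...) for seq_along(...), which does the same job here.

```r
weights <- c(300, 200, 100, 250, 150)
prices <- c(9000, 5000, 12000, 7500, 18000)
types <- factor(c("gold", "silver", "gems", "gold", "gems"))

symbols <- as.integer(types)            # one plot character ID per chest
level_ids <- seq_along(levels(types))   # 1, 2, 3: one ID per factor level

# Assumption: null device so the sketch also runs non-interactively.
pdf(NULL)
plot(weights, prices, pch = symbols)
legend("topright", legend = levels(types), pch = level_ids)
invisible(dev.off())
```

The reason for seq_along is a small safety margin: if the factor ever had no levels, 1:length(levels(types)) would give 1 0, while seq_along gives an empty vector.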


    1. Try R Chapter 6

    Data Frames

    The weights, prices, and types data structures are all deeply tied together, if you think
    about it. If you add a new weight sample, you need to remember to add a new price and
    type, or risk everything falling out of sync. To avoid trouble, it would be nice if we could

    tie all these variables together in a single data structure.

    Fortunately, R has a structure for just this purpose: the data frame. You can think of a

    data frame as something akin to a database table or an Excel spreadsheet. It has a specific

    number of columns, each of which is expected to contain values of a particular type. It
    also has an indeterminate number of rows - sets of related values for each column.


    2. Data Frames 6.1

    Our vectors with treasure chest data are perfect candidates for conversion to a data frame.

    And it's easy to do. Call the data.frame function, and pass weights, prices, and types

    as the arguments. Assign the result to the treasure variable:

    treasure <- data.frame(weights, prices, types)

    > treasure <- data.frame(weights, prices, types)
    > print(treasure)
      weights prices  types
    1     300   9000   gold
    2     200   5000 silver
    3     100  12000   gems
    4     250   7500   gold
    5     150  18000   gems

    There's your new data frame, neatly organized into rows, with column names (derived
    from the variable names) across the top.

    4. Data Frame Access 6.2

    Just like matrices, it's easy to access individual portions of a data frame.

    You can get individual columns by providing their index number in double-brackets. Try
    getting the second column (prices) of treasure:

    treasure[[2]]

    > treasure[[2]]
    [1]  9000  5000 12000  7500 18000

    5. You could instead provide a column name as a string in double-brackets. (This is often

    more readable.) Retrieve the "weights" column:

    treasure[["weights"]]

    > treasure[["weights"]]
    [1] 300 200 100 250 150

    6. Typing all those brackets can get tedious, so there's also a shorthand notation: the data

    frame name, a dollar sign, and the column name (without quotes). Try using it to get the

    "prices" column:

    treasure$prices

    > treasure$prices
    [1]  9000  5000 12000  7500 18000

    7. Now try getting the "types" column:

    > treasure[["types"]]
    [1] gold   silver gems   gold   gems
    Levels: gems gold silver
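The three ways of reaching a column all return the same vector, as this minimal sketch shows. It rebuilds the treasure data frame from the values printed above; the variable names on the left are ours, and the last line (picking out rows by a condition) is an extra that goes a little beyond what this chapter covers.

```r
weights <- c(300, 200, 100, 250, 150)
prices <- c(9000, 5000, 12000, 7500, 18000)
types <- factor(c("gold", "silver", "gems", "gold", "gems"))
treasure <- data.frame(weights, prices, types)

by_index <- treasure[[2]]          # second column, by position
by_name <- treasure[["prices"]]    # same column, by name
by_dollar <- treasure$prices       # same column, shorthand notation

# Extra: rows can be selected with a logical condition before the comma.
gems_only <- treasure[treasure$types == "gems", ]
```

Access by name is usually the safer habit, since it keeps working if the columns are later reordered.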

    8. Loading Data Frames 6.3

    Typing in all your data by hand only works up to a point, obviously, which is why R was
    given the capability to easily load data in from external files.

    We've created a couple data files for you to experiment with:

    > list.files()
    [1] "targets.csv"  "infantry.txt"

    Our "targets.csv" file is in the CSV (Comma Separated Values) format exported by many

    popular spreadsheet programs. Here's what its content looks like:

    "Port","Population","Worth"
    "Cartagena",35000,10000
    "Porto Bello",49000,15000
    "Havana",140000,50000
    "Panama City",105000,35000

    You can load a CSV file's content into a data frame by passing the file name to the

    read.csv function. Try it with the "targets.csv" file:

    read.csv("targets.csv")

    > read.csv("targets.csv")
             Port Population Worth
    1   Cartagena      35000 10000
    2 Porto Bello      49000 15000
    3      Havana     140000 50000
    4 Panama City     105000 35000

    9. The "infantry.txt" file has a similar format, but its fields are separated by tab
    characters rather than commas. Its content looks like this:

    Port           Infantry
    Porto Bello    700
    Cartagena      500
    Panama City    1500
    Havana         2000

    For files that use separator strings other than commas, you can use the read.table

    function. The sep argument defines the separator character, and you can specify a tab

    character with "\t".

    Call read.table on "infantry.txt", using tab separators:

    read.table("infantry.txt", sep="\t")

    > read.table("infantry.txt", sep="\t")
               V1       V2
    1        Port Infantry
    2 Porto Bello      700
    3   Cartagena      500
    4 Panama City     1500
    5      Havana     2000

    10. Notice the "V1" and "V2" column headers? The first line is not automatically treated as

    column headers with read.table. This behavior is controlled by the header argument.

    Call read.table again, setting header to TRUE:

    read.table("infantry.txt", sep="\t", header=TRUE)

    > read.table("infantry.txt", sep="\t", header=TRUE)
             Port Infantry
    1 Porto Bello      700
    2   Cartagena      500
    3 Panama City     1500
    4      Havana     2000
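If you don't have the course files to hand, you can try the header argument on a file of your own. This sketch writes a two-row, tab-separated file to a temporary path (the path and the shortened content are ours, not the course's "infantry.txt") and reads it back both ways.

```r
# Assumption: a throwaway file standing in for the course's "infantry.txt".
path <- tempfile(fileext = ".txt")
writeLines(c("Port\tInfantry",
             "Porto Bello\t700",
             "Cartagena\t500"), path)

no_header <- read.table(path, sep = "\t")                   # header row read as data
with_header <- read.table(path, sep = "\t", header = TRUE)  # header row names the columns

unlink(path)   # clean up the temporary file
```

Without header=TRUE the first line becomes an ordinary row, so no_header has three rows and columns named V1 and V2. For tab-separated files, base R also offers read.delim, whose defaults are a tab separator and header=TRUE.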

    11. Merging Data Frames 6.4

    We want to loot the city with the most treasure and the least guards. Right now, though,

    we have to look at both files and match up the rows. It would be nice if all the data for a

    port were in one place...

    R's merge function can accomplish precisely that. It joins two data frames together, using

    the contents of one or more columns. First, we're going to store those file contents in two
    data frames for you, targets and infantry.

    The merge function takes arguments with an x frame (targets) and a y frame

    (infantry). By default, it joins the frames on columns with the same name (the two Port
    columns). See if you can merge the two frames:

    merge(x = targets, y = infantry)

    > targets <- read.csv("targets.csv")
    > infantry <- read.table("infantry.txt", sep="\t", header=TRUE)
    > merge(x = targets, y = infantry)
             Port Population Worth Infantry
    1   Cartagena      35000 10000      500
    2      Havana     140000 50000     2000
    3 Panama City     105000 35000     1500
    4 Porto Bello      49000 15000      700
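Here is the same merge as a self-contained sketch, with the two data frames typed in by hand from the file contents shown above instead of loaded from disk. Naming the join column with by = "Port" is optional here, but it makes the intent explicit. The WorthPerGuard column is our own invention, one possible way of weighing treasure against guards.

```r
targets <- data.frame(
  Port = c("Cartagena", "Porto Bello", "Havana", "Panama City"),
  Population = c(35000, 49000, 140000, 105000),
  Worth = c(10000, 15000, 50000, 35000)
)
infantry <- data.frame(
  Port = c("Porto Bello", "Cartagena", "Panama City", "Havana"),
  Infantry = c(700, 500, 1500, 2000)
)

# Rows are matched on Port even though the two frames list the ports in different orders.
ports <- merge(x = targets, y = infantry, by = "Port")

# Hypothetical scoring rule: treasure per defending soldier.
ports$WorthPerGuard <- ports$Worth / ports$Infantry
best <- as.character(ports$Port[which.max(ports$WorthPerGuard)])
```

Under that made-up rule, best comes out as "Havana", at 25 per guard.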

    1. Try R Chapter 7

    Real-World Data

    So far, we've been working purely in the abstract. It's time to take a look at some real

    data, and see if we can make any observations about it.


    2. Some Real World Data 7.1

    Modern pirates plunder software, not silver. We have a file with the software piracy rate,

    sorted by country. Here's a sample of its format:

    Country,Piracy
    Australia,23
    Bangladesh,90
    Brunei,67
    China,77

    ...

    We'll load that into the piracy data frame for you:

    > piracy gdp plot(countries$GDP, countries$Piracy)

    3. It looks like there's a negative correlation between wealth and piracy - generally, the
    higher a nation's GDP, the lower the percentage of software installed that's pirated. But

    do we have enough data to support this connection? Is there really a connection at all?

    R can test for correlation between two vectors with the cor.test function. Try calling it on

    the GDP and Piracy columns of the countries data frame:

    cor.test(countries$GDP, countries$Piracy)


    > cor.test(countries$GDP, countries$Piracy)

    Pearson's product-moment correlation

    data:  countries$GDP and countries$Piracy
    t = -14.8371, df = 107, p-value < 2.2e-16
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval:
     -0.8736179 -0.7475690
    sample estimates:
           cor
    -0.8203183

    The key result we're interested in is the "p-value". Conventionally, any correlation with a
    p-value less than 0.05 is considered statistically significant, and this sample data's p-value

    is definitely below that threshold. In other words, yes, these data do show a statistically

    significant negative correlation between GDP and software piracy.
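You don't have to read the p-value off the printed report. As a minimal sketch, the result of cor.test can be stored and its parts pulled out by name. The data here are synthetic, generated with a made-up downward trend plus random noise, and are not the course's countries data.

```r
set.seed(1)   # make the random noise reproducible

# Synthetic stand-ins for GDP and piracy rate (invented for this example).
gdp <- seq(1000, 50000, length.out = 30)
piracy <- 90 - 0.0012 * gdp + rnorm(30, sd = 5)

result <- cor.test(gdp, piracy)

r <- unname(result$estimate)   # the correlation coefficient ("cor" in the report)
p <- result$p.value            # the p-value
ci <- result$conf.int          # the 95 percent confidence interval

significant <- p < 0.05        # the conventional threshold mentioned above
```

Keeping the pieces in variables makes it easy to use them in later calculations or in a report.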

    4. We have more countries represented in our GDP data than we do our piracy rate data. If

    we know a country's GDP, can we use that to estimate its piracy rate?

    We can, if we calculate the linear model that best represents all our data points (with a

    certain degree of error). The lm function takes a model formula, which is represented by a

    response variable (piracy rate), a tilde character (~), and a predictor variable (GDP).
    (Note that the response variable comes first.)

    Try calculating the linear model for piracy rate by GDP, and assign it to the line variable:

    line <- lm(countries$Piracy ~ countries$GDP)

    > line <- lm(countries$Piracy ~ countries$GDP)
    > abline(line)

    Now, if we know a country's GDP, we should be able to make a reasonable prediction of
    how common piracy is there!
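To actually get a number out of the model, one option is base R's predict function. The sketch below uses a tiny invented data frame (not the course's countries data) and writes the formula with bare column names plus a data argument, a slightly different style from the countries$Piracy form, chosen because predict can then match a new GDP value to the right column.

```r
# Hypothetical data, invented for this example.
countries <- data.frame(
  GDP = c(1000, 5000, 10000, 20000, 40000),
  Piracy = c(90, 80, 70, 50, 25)
)

# Response (Piracy) first, then a tilde, then the predictor (GDP).
model <- lm(Piracy ~ GDP, data = countries)

slope <- coef(model)[["GDP"]]   # change in piracy rate per unit of GDP

# Estimate the piracy rate for a country whose GDP is 15000.
estimate <- predict(model, newdata = data.frame(GDP = 15000))
```

With these made-up numbers the slope is negative and the estimate lands at roughly 63 percent, between the observed rates for the neighbouring GDP values.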

    6. ggplot2 7.2

    The functionality we've shown you so far is all included with R by default. (And it's
    pretty powerful, isn't it?) But in case the default installation doesn't include the function

    you need, there are still more libraries available on the servers of the Comprehensive R

    Archive Network, or CRAN. They can add anything from new statistical functions to
    better graphics capabilities. Better yet, installing any of them is just a command away.

    Let's install the popular ggplot2 graphics package. Call the install.packages function

    with the package name in a string:

    install.packages("ggplot2")

    > install.packages("ggplot2")
    --- Please select a CRAN mirror for use in this session ---
    Loading Tcl/Tk interface ... done
    trying URL 'http://rweb.quant.ku.edu/cran/src/contrib/ggplot2_0.9.2.1.tar.gz'
    Content type 'application/x-gzip' length 2310996 bytes (2.2 Mb)
    opened URL
    ==================================================
    downloaded 2.2 Mb

    * installing *source* package 'ggplot2' ...
    ** package 'ggplot2' successfully unpacked and MD5 sums checked
    ** R
    ** data
    ** moving datasets to lazyload DB
    ** inst
    ** preparing package for lazy loading
    ** help
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded

    * DONE (ggplot2)

    7. You can get help for a package by calling the help function and passing the package
    name in the package argument. Try displaying help for the "ggplot2" package:

    help(package = "ggplot2")

    > help(package = "ggplot2")
    Information on package 'ggplot2'

    Description:

    Package: ggplot2
    Type: Package
    Title: An implementation of the Grammar of Graphics
    Version: 0.9.1

    ...

    8. Here's a quick demo of the power you've just added to R. To use it, let's revisit some data

    from a previous chapter.

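    > weights <- c(300, 200, 100, 250, 150)
    > prices <- c(9000, 5000, 12000, 7500, 18000)
    > chests <- c("gold", "silver", "gems", "gold", "gems")
    > types <- factor(chests)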


    ggplot2 is just the first of many powerful packages awaiting discovery
    on CRAN. And of course, there's much, much more functionality in the
    standard R libraries. This course has only scratched the surface!