Basic Data Analysis using R C. Tobin Magle, PhD 02-08-2017 10:00-11:00 a.m. Morgan Library Computer Classroom 175 Based on http://www.datacarpentry.org/R-ecology-lesson/

Upload: c-tobin-magle

Post on 12-Apr-2017

TRANSCRIPT

Page 1: Basic data analysis using R

Basic Data Analysis using R

C. Tobin Magle, PhD

02-08-2017

10:00-11:00 a.m.

Morgan Library Computer Classroom 175

Based on http://www.datacarpentry.org/R-ecology-lesson/

Page 2: Basic data analysis using R

Outline

• Intro to R and RStudio

• Operators and functions

• Data Frames

• Factors

Page 3: Basic data analysis using R

What is R? RStudio?

• R – a programming language + software that interprets it

• RStudio – popular software to write R scripts and interact with the R software

• Need both

Page 4: Basic data analysis using R

Why learn R

• Research Reproducibility

• Widely used, 10000+ “packages”

• Works on many data types

• Produces high-quality graphics

• Free, open source, cross platform

Page 5: Basic data analysis using R

RStudio Interface

Page 6: Basic data analysis using R

Set up a working directory

• Start RStudio

• File > New project > New directory > Empty project

• Enter a name for this new folder and choose a convenient location for it (working directory)

• Click on “Create project”

• Create a data folder in your working directory

• Create a new R script (File > New File > R script) and save it in your working directory

Page 7: Basic data analysis using R

Organize your working directory

Page 8: Basic data analysis using R

Script vs console

• Both accept commands

• Console: runs the commands
  • Doesn’t save*

• Script: commands you want to save for later
  • These commands need to be sent to the console to be run
  • Ctrl+Enter to send from script to console

Page 9: Basic data analysis using R

The assignment operator

• The command 5 + 5 yields the answer 10
  • Prints 10 to the console
  • But does not save the number 10 anywhere

• The assignment operator saves values into objects
  • <object> <- <value>
  • weight_kg <- 55

• Keyboard shortcut for the assignment operator in RStudio: Alt + - (Option + - on a Mac)

Page 10: Basic data analysis using R

You can do math on variables

• Example: conversion from kg to lb

• 2.2* weight_kg

• weight_lb <- 2.2*weight_kg
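The assignment and conversion steps above can be tried as a short script. This is a minimal sketch that reuses the slide's own object names and values. The last two lines are an added illustration that an object keeps its value until it is reassigned.

```r
# Assign a weight in kilograms to an object; assignment prints nothing
weight_kg <- 55

# Use the object in arithmetic and store the result in a new object
weight_lb <- 2.2 * weight_kg
print(weight_lb)

# Changing weight_kg later does not update weight_lb automatically
weight_kg <- 100
print(weight_lb)  # still based on the original 55 kg
```

Note that R is case-sensitive, so weight_kg and Weight_kg would be two different objects.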

Page 11: Basic data analysis using R

Functions and arguments

• Functions are canned scripts
  • Predefined, from packages, or “home-made”

• Accept arguments (input)

• Return a value (output)

• Examples: sqrt, round
  • args(round)
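A short sketch of calling the two example functions. The input values (16 and 3.14159) are arbitrary choices for illustration and are not taken from the slides.

```r
# sqrt() takes one argument and returns its square root
root <- sqrt(16)
print(root)

# args() lists the arguments a function accepts
args(round)

# round() has an optional 'digits' argument; by default it rounds to 0 decimal places
rounded_default <- round(3.14159)
rounded_two <- round(3.14159, digits = 2)
print(rounded_default)
print(rounded_two)
```

Naming the argument (digits = 2) is optional here, but it makes the call easier to read.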

Page 12: Basic data analysis using R

Working with data

• Store tables in a type of object called a “data frame”
  • Rows = observations
  • Columns = variables

• Can download using download.file
  • download.file("https://ndownloader.figshare.com/files/2292169", "data/portal_data_joined.csv")

• Read data using the read.csv function
  • surveys <- read.csv("data/portal_data_joined.csv")
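To try read.csv without a network connection, the sketch below writes a tiny made-up table to a temporary file and reads it back. The file name, column names, and values are all hypothetical stand-ins, not the real survey data.

```r
# Write a small made-up CSV to a temporary file (hypothetical data)
toy_path <- file.path(tempdir(), "toy_surveys.csv")
writeLines(
  c("record_id,species_id,sex,weight",
    "1,NL,M,40",
    "2,DM,F,48",
    "3,DM,M,29"),
  toy_path
)

# read.csv() returns a data frame: one row per observation, one column per variable
toy <- read.csv(toy_path)
print(toy)
```

With the real file, download.file writes into the data folder, so that folder needs to exist in the working directory before the download is run.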

Page 13: Basic data analysis using R

Inspecting data frames

• head(surveys) = look at first 6 rows (all columns)

• str(surveys) = structure: number of rows and columns, data types

• nrow(surveys) = number of rows

• ncol(surveys) = number of columns

• names(surveys) = column names

• summary(surveys) = summary statistics for each column
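The inspection functions can be tried on any data frame. This sketch uses a small made-up data frame named toy (hypothetical columns and values) in place of surveys.

```r
# A small made-up data frame standing in for 'surveys'
toy <- data.frame(
  species_id = c("NL", "DM", "DM", "PF"),
  sex = c("M", "F", "M", "F"),
  weight = c(40, 48, 29, 8),
  stringsAsFactors = FALSE
)

head(toy, n = 2)   # first 2 rows; head() shows 6 rows when n is not given
str(toy)           # dimensions plus the type of each column
n_rows <- nrow(toy)
n_cols <- ncol(toy)
col_names <- names(toy)
summary(toy)       # per-column summary statistics
```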

Page 14: Basic data analysis using R

Subsetting (Using brackets)

• Row, column format: surveys[row, column]
  • surveys[1, 2] = first row, second column

• Leave it blank to select everything: surveys[, column]
  • surveys[1, ] = first row, all columns
  • surveys[, 1] = first column, all rows

• Ranges = surveys[range, column]
  • surveys[1:3, 7] = rows 1-3, 7th column
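The bracket forms above can be sketched on a small made-up data frame. The data frame toy and its three columns are hypothetical, so the column index 3 is used where the slide uses 7.

```r
# A small made-up data frame (hypothetical values)
toy <- data.frame(
  id = 1:4,
  species_id = c("NL", "DM", "DM", "PF"),
  weight = c(40, 48, 29, 8),
  stringsAsFactors = FALSE
)

one_cell <- toy[1, 2]       # first row, second column: a single value
first_row <- toy[1, ]       # first row, all columns: a one-row data frame
first_col <- toy[, 1]       # first column, all rows: a vector
range_rows <- toy[1:3, 3]   # rows 1-3 of the third column: a vector
```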

Page 15: Basic data analysis using R

By column name

• surveys["species_id"] # Result is a data.frame

• surveys[, "species_id"] # Result is a vector

• surveys[["species_id"]] # Result is a vector

• surveys$species_id # Result is a vector
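The difference between the four forms can be checked with is.data.frame and identical. This is a sketch on a made-up data frame named toy (hypothetical values). The drop = FALSE line is an addition not shown on the slide.

```r
# A small made-up data frame (hypothetical values)
toy <- data.frame(
  species_id = c("NL", "DM"),
  weight = c(40, 48),
  stringsAsFactors = FALSE
)

as_df   <- toy["species_id"]      # single brackets, no comma: data frame
as_vec1 <- toy[, "species_id"]    # row/column form: vector
as_vec2 <- toy[["species_id"]]    # double brackets: vector
as_vec3 <- toy$species_id         # dollar sign: vector

# drop = FALSE keeps the data frame structure in the row/column form
kept_df <- toy[, "species_id", drop = FALSE]

print(class(as_df))
print(class(as_vec1))
```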

Page 16: Basic data analysis using R

Factors

• Represent categorical data

• Can be ordered or unordered

• Critical for stats and plotting

• Stored as integers with text labels – be careful!

• Levels are sorted alphabetically by their text labels by default

• Functions
  • sex <- factor(c("male", "female", "female", "male"))
  • levels(sex)
  • nlevels(sex)
  • sex <- factor(sex, levels = c("male", "female"))
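Running the slide's factor commands in order shows how the integer codes follow the level order. The as.integer calls are added here only to expose the underlying codes.

```r
sex <- factor(c("male", "female", "female", "male"))

levels(sex)                       # "female" "male" (alphabetical by default)
nlevels(sex)                      # 2
codes_default <- as.integer(sex)  # 2 1 1 2

# Re-create the factor with an explicit level order
sex <- factor(sex, levels = c("male", "female"))

levels(sex)                       # "male" "female"
codes_reordered <- as.integer(sex)  # 1 2 2 1
```

The text labels are unchanged by the reordering; only the integer codes behind them differ.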

Page 17: Basic data analysis using R

Converting factors

• With characters
  • as.character(sex)

• With numbers
  • f <- factor(c(1990, 1983, 1977, 1998, 1990))
  • as.numeric(f) # wrong! and there is no warning...
  • as.numeric(as.character(f)) # works...
  • as.numeric(levels(f))[f] # The recommended way.
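The sketch below runs the three conversions from the slide side by side, which makes it visible why as.numeric(f) is wrong: it returns the position of each value among the sorted levels, not the value itself.

```r
f <- factor(c(1990, 1983, 1977, 1998, 1990))

levels(f)                                 # "1977" "1983" "1990" "1998"

codes   <- as.numeric(f)                  # 3 2 1 4 3: level positions, not years
years_a <- as.numeric(as.character(f))    # 1990 1983 1977 1998 1990
years_b <- as.numeric(levels(f))[f]       # same result; converts each level only once

print(codes)
print(years_a)
print(years_b)
```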

Page 18: Basic data analysis using R

Example: plotting factors

• plot(surveys$sex)

• What’s with the unlabeled bar?

Page 19: Basic data analysis using R

Renaming levels

• Label missing values
  • sex <- surveys$sex # subset the column
  • head(sex) # look at first 6 records
  • levels(sex) # look at the factor levels
  • levels(sex)[1] <- "missing" # change the first label to “missing”
  • levels(sex) # look at factor levels again
  • head(sex) # see where missing values were
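The renaming step can be sketched on a small made-up factor. It assumes that missing values are stored as empty strings, which would account for a bar with no label, and that the column is a factor. The values below are hypothetical.

```r
# Made-up values; the empty strings stand in for missing records (assumption)
sex <- factor(c("", "M", "F", "", "M"))

levels(sex)                   # "" "F" "M": the empty label sorts first
levels(sex)[1] <- "missing"   # rename the first level
levels(sex)                   # "missing" "F" "M"

# Every record that had the empty label now shows "missing"
relabeled <- as.character(sex)
print(table(sex))
```

Renaming by position relies on the empty label being the first level, so it is worth checking levels(sex) before assigning.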

Page 20: Basic data analysis using R

What if you don’t want to use factors?

• Argument: stringsAsFactors=FALSE

## Compare the difference between when the data are being read as
## `factor`, and when they are being read as `character`.

surveys <- read.csv("data/portal_data_joined.csv", stringsAsFactors = TRUE)
str(surveys)

surveys <- read.csv("data/portal_data_joined.csv", stringsAsFactors = FALSE)
str(surveys)

## Convert the column "plot_type" into a factor
surveys$plot_type <- factor(surveys$plot_type)
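The same comparison can be run without the downloaded file by reading a small made-up table from a string. The column values below are hypothetical. Setting stringsAsFactors explicitly also matters because the default of read.csv differs between R versions (FALSE since R 4.0.0).

```r
# A tiny made-up CSV held in a string (hypothetical values)
csv_text <- "plot_type,sex\nControl,M\nRodent Exclosure,F\nControl,F"

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_char   <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class_when_true  <- class(as_factor$plot_type)  # "factor"
class_when_false <- class(as_char$plot_type)    # "character"

# Convert a single column to a factor after reading
as_char$plot_type <- factor(as_char$plot_type)
str(as_char)
```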

Page 21: Basic data analysis using R

Need help?

• Email: [email protected]

• Data Management Services website: http://lib.colostate.edu/services/data-management

• Data Carpentry: http://www.datacarpentry.org/

• R Ecology Lesson: http://www.datacarpentry.org/R-ecology-lesson/