SIPI data days 2019 · N. Thompson, PhD · 2019/07/12

Welcome to data days

N. Thompson, PhD

SIPI data days 2019

My story: I’m Nicole...

❏ Cuban-American

❏ Grew up all over USA

❏ Loves:

  ❏ fantasy, sci-fi

  ❏ human language and culture

  ❏ physics and chemistry

❏ Wanted to tie it all together

I’m now a behavioral ecologist

After 3 degrees and lots of exploring...

My job is to research animal behavior and physiology.

❏ Analyze data almost every day.

❏ Greatest tool: A computer programming language called “R”.

My goals for you

❏ Apply the principles of tidy data and data visualization

❏ Use curiosity and creativity to generate and answer research questions on socially relevant topics

After our two day-long sessions, you will be able to:

❏ Manipulate & explore a data set using the R programming language

❏ Visualize patterns in data with R

… and have fun!

What is R? A language to talk to your computer

Diagram courtesy of Garrett Grolemund

Who is R for? Everyone.

What is data science?

Wickham & Grolemund, r4ds

Exploratory data analysis is one important part

Wickham & Grolemund, r4ds

Your capstone team projects

❏ Choose data sets

❏ Become familiar with them and form research questions

❏ Use functions in R to answer questions

  ❏ Data transformations and summaries (package dplyr)

  ❏ Data visualizations (package ggplot2)

❏ Present questions and findings to class in 15 min

The date of the 15-minute group presentations is TBD.

Our schedule

Day 1: 7/12/19

Literacy: Choose and describe a data set, create research questions

Transformations: Exploring data sets with R by subsets, transformations, and summaries

Day 2: 7/19/19

Graphics best practices: Evaluate and interpret visualizations

Visualizations: Exploring data sets with R by graphical plotting

*Exploring = answering questions*

Let’s meet R in RStudio

Troubleshooting

Run “?function_name” for help

Google “R” plus the error message, function name, or task

Ask a friend.

Ask me!

Know you can do it.
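
A minimal sketch of the first troubleshooting step, using functions that ship with R. The choice of mean() and sd() as the functions to look up is only for illustration.

```r
# Open the help page for a function (two equivalent ways).
?mean
help("sd")

# Show a function's arguments without opening the full help page.
sd_args <- args(sd)
print(sd_args)

# Run the examples from the bottom of a help page.
example(mean)
```

Help pages end with an "Examples" section, which is often the fastest way to see how a function is used.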

Introduction to Tidy Data

N. Thompson, PhD

SIPI data days 2019

Importance of data literacy

https://en.wikipedia.org/wiki/Data

https://www.digitaltveurope.com/2019/05/31/data-to-drive-40-of-tv-ad-spend-by-2020/

https://stackoverflow.com/questions/40182253/complex-headers-in-angular2-data-table

What is tidy data?

❏ Data are in a table

❏ Each variable gets a column

❏ Each observation gets a row

❏ Each cell is a single value

❏ Each type of observation gets its own table

Fig 12.1, Wickham & Grolemund “R for Data Science”
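
The rules above can be sketched with a small example in base R. The table and its numbers are made up for illustration and are not from any of the project data sets.

```r
# Untidy: the variable "year" is hidden in the column names,
# so each row holds two observations.
untidy <- data.frame(
  state      = c("AZ", "NM"),
  cases_2016 = c(10, 20),
  cases_2017 = c(12, 18)
)

# Tidy: each variable (state, year, cases) gets a column,
# and each observation (one state in one year) gets a row.
tidy <- data.frame(
  state = rep(untidy$state, times = 2),
  year  = rep(c(2016, 2017), each = nrow(untidy)),
  cases = c(untidy$cases_2016, untidy$cases_2017)
)

print(tidy)
```

In the tidy version, a question like "how many cases in NM in 2017?" becomes a simple row filter instead of a hunt through column names.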

Tidy data sets have data dictionaries

Data dictionary: a description of each variable in a data set, including its data type and units.

Soon, you will write your own data dictionaries in teams.
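
One possible way to draft a data dictionary in R is as a small table of its own. This is only a sketch: the data frame `patients`, its values, and the descriptions are hypothetical, and a dictionary written on a worksheet works just as well.

```r
# Hypothetical toy data set (values invented for illustration).
patients <- data.frame(
  age      = c(25, 41, 33),
  bmi      = c(22.5, 31.0, 27.8),
  diabetic = c(FALSE, TRUE, FALSE)
)

# One row per variable: name, data type, units, and a description.
dictionary <- data.frame(
  variable    = names(patients),
  type        = sapply(patients, class),   # R's data type for each column
  units       = c("years", "kg/m^2", "none"),
  description = c("Age at exam", "Body mass index", "Diagnosed with diabetes?"),
  row.names   = NULL
)

print(dictionary)
```

Letting R fill in the `type` column keeps the dictionary in step with how the data were actually read in.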

Example tidy data set: Diabetes risk factors in Pima women from AZ

From: https://www.kaggle.com/uciml/pima-indians-diabetes-database

Data dictionary: define the variables

Diabetes risk factors in Pima women from AZ

[Figure: the data set table with each column labeled by data type: continuous, categorical, or logical]

Is it tidy?

Diabetes risk factors in Pima women from AZ

❏ Data are in a table

❏ Each variable gets a column

❏ Each observation gets a row - a woman >21 yrs old

❏ Each cell is a single value

❏ Each type of observation gets its own table - diagnosis and measurements per woman

Your turn…

❏ Break into teams of 3 - lead detective, scribe, & reporter

❏ Choose data sets - view in RStudio

❏ Learning goals for 1st group activity:

  ❏ create a data dictionary for chosen data set

  ❏ formulate research questions and diagnose limitations of data set

Team roles

Reporter: communicates the team’s findings, process, and questions to the class as a whole.

Lead detective: drives the team toward its goal, takes charge of plans of action, watches the clock.

Scribe: writes down the team’s initial answers on worksheets and writes initial code.

Project data sets:

1. Cancer rates by US state in 2017

2. Human trafficking in the USA in 2016 (some untidiness!)

3. Crime rates in major metropolitan areas

4. Gun crime in the USA 2012-2014

5. Diabetes risk factors among Pima women in AZ

Exploratory Data Analysis (EDA) in R

N. Thompson, PhD

SIPI data days 2019

Moving on from tidy data… time to start exploring

Wickham & Grolemund, r4ds

Key functions you will learn (see handouts)

dplyr functions:

%>%

select()

filter()

mutate()

summarise()

group_by()
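
The dplyr functions listed above can be chained into one pipeline. This is a minimal sketch that assumes the dplyr package is installed; the data frame `clinic` and its values are invented for illustration and are not one of the project data sets.

```r
library(dplyr)

# Hypothetical toy data (values invented for illustration).
clinic <- data.frame(
  town    = c("A", "A", "B", "B", "B"),
  age     = c(25, 41, 33, 58, 47),
  glucose = c(90, 150, 110, 170, 130)
)

town_summary <- clinic %>%                    # %>% passes the result to the next step
  select(town, age, glucose) %>%              # keep only these columns
  filter(age >= 30) %>%                       # keep only rows that meet a condition
  mutate(high_glucose = glucose > 125) %>%    # add a new column
  group_by(town) %>%                          # do the next step separately per town
  summarise(
    n_women      = n(),                       # rows per group
    mean_glucose = mean(glucose),
    n_high       = sum(high_glucose)          # TRUE counts as 1
  )

print(town_summary)
```

Reading the pipe as "and then" helps: take the data, and then select columns, and then filter rows, and so on.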

Base R arithmetic & notation:

<- “assignment”

==, != “equal to”, “not equal to”

>, <, >=, <= inequalities

&, | “and”, “or” (intersection, union)

str(), View(), c()

mean(), sd(), sum()
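
A short sketch of the base R notation above, using a made-up vector of ages. View() is left as a comment because it opens a spreadsheet-style viewer and is meant for interactive use in RStudio.

```r
# c() combines values into a vector; <- assigns it to a name.
ages <- c(25, 41, 33, 58, 47)

# Comparisons return TRUE or FALSE for each value.
is_41     <- ages == 41
not_41    <- ages != 41
over_30   <- ages >= 30

# & means "and" (both must be TRUE); | means "or" (either can be TRUE).
thirty_to_fifty <- ages >= 30 & ages < 50
young_or_old    <- ages < 30 | ages > 50

# Summaries of a numeric vector.
total   <- sum(ages)
average <- mean(ages)
spread  <- sd(ages)

# str() prints a compact description of any object.
str(ages)

# View(ages)  # interactive only: opens the object in a viewer tab
```

Because TRUE counts as 1, `sum(thirty_to_fifty)` counts how many ages fall in that range.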

Key functions you will learn cont’d (see handouts)

Functions for data types:

class()

is.na()

as.numeric() - continuous

as.character() - categorical

as.factor() - categorical
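
A minimal sketch of the data type functions above. The values are hypothetical; the word "unknown" stands in for the kind of entry that often makes a numeric column load as text.

```r
# A column that should be numeric but was read in as text.
glucose_text <- c("90", "150", "unknown")
text_class <- class(glucose_text)

# as.numeric() converts what it can; "unknown" becomes NA (missing).
# R normally warns about this, so the warning is silenced here on purpose.
glucose <- suppressWarnings(as.numeric(glucose_text))
num_class <- class(glucose)

# is.na() flags the missing values.
missing <- is.na(glucose)

# as.factor() stores categories as a fixed set of levels.
town <- as.factor(c("A", "B", "A"))
town_levels <- levels(town)

# as.character() turns the factor back into plain text.
town_text <- as.character(town)

print(text_class)
print(num_class)
print(missing)
print(town_levels)
```

Checking class() right after loading a data set is a quick way to catch columns that did not import as the type you expected.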

Base subsetting:

Data[a, b] - a = row index, b = column index

Data$name - select a column
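
Base subsetting can be sketched on a small made-up data frame. The name `clinic` and its values are hypothetical.

```r
# Hypothetical toy data (values invented for illustration).
clinic <- data.frame(
  town    = c("A", "A", "B"),
  age     = c(25, 41, 33),
  glucose = c(90, 150, 110)
)

one_cell   <- clinic[2, 3]         # row 2, column 3
first_rows <- clinic[1:2, ]        # rows 1 to 2, all columns (blank = "all")
age_by_idx <- clinic[, "age"]      # all rows, the column named "age"
age_by_dlr <- clinic$age           # the same column, using $

# A logical condition in the row position keeps only matching rows.
over_30 <- clinic[clinic$age > 30, ]

print(one_cell)
print(over_30)
```

The comma matters: leaving the row or column position blank means "keep everything" in that direction.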

Learning to code...

1. Observe live coding

2. Copy sections of live code

3. Fill in blanks and perform exercises solo

4. Share progress with teammates

To our consoles!

Benefits of tidy data

❏ Consistent and predictable structure

❏ Prevents errors in your own analyses

❏ Increases clarity for others to follow your analyses
