Lecture 2: Software Introduction Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University [email protected]


Post on 13-Mar-2018


Page 1: Regression III: Advanced Methods - Michigan State …polisci.msu.edu/jacoby/icpsr/regress3/lectures/week1/2.Software... · Regression III: Advanced Methods William G. Jacoby Department

Lecture 2: Software Introduction

Regression III: Advanced Methods

William G. Jacoby
Department of Political Science

Michigan State University

[email protected]


Getting Started with R

• What is R?
• A tiny R session
• Resources
• Setup under Windows
• Getting data into R
• Graphs and Statistical Models


What is R?

• A free, open-source implementation of the S language for data analysis and graphics

• Available for various operating systems (including Linux, Mac and Windows)

• A complete programming language
• Supported by a comprehensive help system and a large international community of users
• Increasingly used in advanced social-science research, as well as in many other disciplines
• In constant flux
• Not guaranteed by anyone to be fit for any purpose!


R Resources

• The main source for R and everything connected with it is the Comprehensive R Archive Network (CRAN):
– http://cran.r-project.org

• There one can find:
– Binary versions of R to download and install
– The source code
– Extensive documentation and contributed guides
– Information about many add-on packages


Add-on Packages

• Add-on packages are easily installed from the menus within R (we will use a number of packages in this course)

• Once installed, a package must be loaded into the current interactive R session before it can be used

• Many packages contain datasets. These must also be loaded (and, if desired, attached) before they can be used

• The number symbol “#” is used to insert comments: R ignores everything from the “#” to the end of that line, so a comment that runs over several lines needs a “#” on each line
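The install, load, and data steps above can be sketched as follows. This is a minimal illustration, not part of the original slides: it uses only packages that ship with R (splines and datasets) so that it runs without an internet connection, and the commented-out install.packages() line simply uses car as an example of a CRAN package.

```r
# One-time installation from CRAN (needs an internet connection), e.g.:
# install.packages("car")

# Load an installed package into the current session.
# 'splines' ships with R but is not attached by default.
library(splines)

# Load a dataset that comes with a package ('mtcars' from 'datasets').
data(mtcars, package = "datasets")

# Columns can be reached with $ ...
mean.mpg <- mean(mtcars$mpg)

# ... or by attaching the data frame to the search path.
attach(mtcars)
mean.wt <- mean(wt)   # 'wt' is found because mtcars is attached
detach(mtcars)        # detach again to keep the search path tidy

print(c(mean.mpg, mean.wt))
```

Attaching is a convenience only; referring to columns as mtcars$wt avoids name clashes when several datasets are in use.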


Documentation for R

• Installed as part of the R help system are the following documents:

– An Introduction to R (about 100 pages). Gives an introduction to the language and how to use R for doing statistical analysis and graphics

– R Data Import/Export (about 35 pages). Describes the import and export facilities available in R itself or via the foreign package

– Writing R Extensions (about 75 pages). Covers how to create your own packages, write R help files, etc.

• There are also various ‘unofficial’ guides on CRAN under ‘contributed’

• Finally, Fox (2002) provides a great introduction, focusing on the use of R for regression analysis.


Getting Help in R

• A number of different types of help are available from the Help menu:
– Documentation on all installed packages is available in a web browser by clicking Help > Html help
– The ‘official’ manuals can be loaded in PDF format by clicking Help > Manuals

• Help about individual functions and objects can also be obtained within R by typing
– help(data) or ?data, for help on something whose name is known
– help.search("ordinal"), to search all the installed help files for occurrence of a particular text string
– apropos("stem"), to look for ‘stem’ in the names of objects available in the current R session

• If all else fails, use the R-help email list:
http://www.r-project.org/mail
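As a small sketch of the help commands listed above: help() and help.search() are meant for interactive use, so they appear here only as comments, while apropos() returns an ordinary character vector and can be run anywhere. The object names matches and exact are illustrative only.

```r
# Interactive forms (each opens a help page or a list of matching topics):
#   help(data)                # same as ?data
#   help.search("ordinal")    # same as ??ordinal

# apropos() returns a character vector of matching object names,
# so it also works inside a script.
matches <- apropos("stem")
print(matches)

# The search string is a regular expression, so anchors can be used
# to look for an exact name.
exact <- apropos("^stem$")
print(exact)
```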


Getting set-up

• Download the installer from CRAN
• A good way of working with R is:

1. For each project, make a directory containing data and other materials.

2. Make a copy of the Rgui.exe shortcut and rename the copy. Right-click the new shortcut, choose “Properties”, and replace the entry in the “Start In” field with the new directory.

3. To use R for the project, double-click the newly created shortcut. R then starts in the project directory, so data files can be loaded easily and projects stay separate from each other.

4. It is also useful to have a good text editor. Notepad will do, but there are much better alternatives:
– The R plug-in for WinEdt is called RWinEdt
– An alternative is the Emacs Speaks Statistics (ESS) package
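The “Start In” field determines R's working directory, the folder in which R looks for files by default. The sketch below shows how to inspect it from within R and why a per-project directory is convenient. It is an illustration under stated assumptions: the folder name myproject and the file example.txt are hypothetical, and a temporary directory is used so that no real files are touched.

```r
# Where is R currently working? (This is what "Start In" controls.)
start.dir <- getwd()
print(start.dir)

# Hypothetical project folder, created inside R's temporary directory.
project.dir <- file.path(tempdir(), "myproject")
dir.create(project.dir, showWarnings = FALSE)

# Inside the working directory, files can be read and written by name alone.
old.dir <- setwd(project.dir)
writeLines("x y", "example.txt")
project.files <- list.files()
setwd(old.dir)   # restore the original working directory

print(project.files)
```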


Using a Text Editor with R

• It is useful to have a good text editor when working in R. It makes it easier to prepare R commands and to save sessions as R scripts.

• Windows Notepad provides minimal editing capabilities, but there are much better alternatives.

• The WinEdt shareware program (also frequently used with LaTeX) can be configured to work well with R.

• The RWinEdt plug-in sets everything up, and it is loaded as an R package.

• An alternative is the Emacs Speaks Statistics (ESS) package

• A text editor can be loaded into R automatically at the beginning of each session, by adding a statement to the Rprofile.


Getting data in (1): Entering data directly

• The concatenate function, c, combines individual values into a vector

• The cbind (column bind) and rbind (row bind) functions combine vectors into a matrix

• The data.frame function converts the matrix into a data frame object
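The three functions above can be sketched as follows. The values and the object names (x, y, by.col, by.row, mydata) are made up purely for illustration.

```r
# c() combines individual values into a vector.
x <- c(1, 2, 3, 4)
y <- c(2.1, 3.9, 6.2, 7.8)

# cbind() binds the vectors together as columns of a matrix ...
by.col <- cbind(x, y)

# ... while rbind() binds them together as rows.
by.row <- rbind(x, y)

# data.frame() turns the matrix into a data frame with named variables.
mydata <- data.frame(by.col)

print(dim(by.col))
print(dim(by.row))
print(mydata)
```

A data frame can also be built directly from the vectors, as in data.frame(x, y), which skips the matrix step.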


Getting data in (2): External datasets

• For rectangular data in a text file, use read.table():
Mydata <- read.table("dataname.txt", header=TRUE)

– header=TRUE signifies that the first row contains variable names

• The foreign package imports data files from other formats; SPSS files, for example, are read with its read.spss() function

• use.value.labels=TRUE converts SPSS value labels to categories (factors). If you specify FALSE, all variables will be treated as quantitative

• All SPSS variable names will be imported in upper case letters
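A minimal sketch of both import routes follows. Because no data file accompanies this transcript, the example first writes a small, made-up text file to a temporary location and then reads it back with read.table(). The SPSS call is shown only as a comment; the file name dataname.sav is hypothetical, and to.data.frame=TRUE is an additional read.spss() argument, not mentioned on the slide, that returns a data frame rather than a list.

```r
# Create a small text file with variable names in the first row.
# (Made-up values, written to a temporary file for illustration.)
tmp <- tempfile(fileext = ".txt")
writeLines(c("id income educ",
             "1 32000 12",
             "2 45000 16",
             "3 28000 10"), tmp)

# header = TRUE tells read.table() that row 1 holds the variable names.
Mydata <- read.table(tmp, header = TRUE)
str(Mydata)

# Importing an SPSS file with the foreign package (not run here):
# library(foreign)
# Spssdata <- read.spss("dataname.sav",
#                       use.value.labels = TRUE,  # value labels become factors
#                       to.data.frame = TRUE)     # return a data frame
```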


Re-specifying variables after importing to R

• To make a numerically coded variable into an unordered factor (categorical variable), use factor()

• To make a numerical variable into an ordered factor, use ordered()
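The code originally shown on this slide did not survive in the transcript. As a sketch of the two conversions, using base R's factor() and ordered() functions with made-up variables (region, educ) and labels:

```r
# A numerically coded nominal variable (codes are hypothetical).
region <- c(1, 3, 2, 1, 3)

# Unordered factor: the codes become category labels with no ordering.
region.f <- factor(region, levels = c(1, 2, 3),
                   labels = c("North", "South", "West"))

# A numerically coded ordinal variable.
educ <- c(2, 1, 3, 2)

# Ordered factor: the levels keep their order (low < medium < high).
educ.o <- ordered(educ, levels = c(1, 2, 3),
                  labels = c("low", "medium", "high"))

print(region.f)
print(educ.o)
```

The distinction matters in modeling: R uses different default contrasts for unordered and ordered factors.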


Recoding Variables using the recode function in the car package

• Recoding into a quantitative variable:

• Recoding into an unordered factor:
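The recode examples from the slide are missing from the transcript, so the following is only a sketch of typical usage, with a made-up variable x. It assumes the car package is installed and is skipped otherwise. The factor result is produced by wrapping the call in base R's factor(), a choice made here so that the example does not depend on version-specific arguments of recode().

```r
# A made-up five-point scale.
x <- c(1, 2, 3, 4, 5, 1)

if (requireNamespace("car", quietly = TRUE)) {
  # Recoding into a quantitative variable: 1-2 become 0, 3-5 become 1.
  x.num <- car::recode(x, "1:2 = 0; 3:5 = 1")

  # Recoding into an unordered factor: quoted values become category labels.
  x.fac <- factor(car::recode(x, "1:2 = 'low'; 3:5 = 'high'"))

  print(x.num)
  print(x.fac)
} else {
  message("Package 'car' is not installed; install it from CRAN to run this.")
}
```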


S Modeling Language

• The S modeling language is convenient in that it has a similar notation for most types of models
• Model specification generally takes the following form:

Response ~ Independent Variables

• Where the tilde sign (~) is interpreted as “regressed on”
• For the general linear model, terms represent additive components as in the regression equation itself
• Some examples of formulas are:
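The formula examples from the slide are not preserved in the transcript. The sketch below shows some common formula patterns fitted with lm() on the built-in mtcars data. The dataset and the variable choices are assumptions made for illustration, not the course's own examples.

```r
data(mtcars)

# Simple regression: mpg regressed on wt.
m1 <- lm(mpg ~ wt, data = mtcars)

# Multiple regression: terms joined by + are additive.
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# wt * hp expands to both main effects plus their interaction (wt:hp).
m3 <- lm(mpg ~ wt * hp, data = mtcars)

# I() protects arithmetic, here adding a squared term.
m4 <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# A factor on the right-hand side is expanded into dummy regressors.
m5 <- lm(mpg ~ wt + factor(cyl), data = mtcars)

print(coef(m3))
```

The same formula notation carries over to other model-fitting functions, such as glm(), which is the convenience the slide refers to.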


Graphs in R

• All graphs are drawn on the chosen device until a new device is started or the device is closed with dev.off()

• Some commonly used graphics devices are

postscript("mygraphs.ps") – Necessary for LaTeX

pdf("mygraph.pdf") – Necessary for PDF LaTeX

trellis.device() – Used for the “Lattice” graphics system

windows() – The default graphics device under Windows

• Graphs in R are very flexible.
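The open, draw, close cycle for a file-based device can be sketched as follows. The plot uses the built-in mtcars data as a stand-in, and the file is written to R's temporary directory; both are choices made for this illustration.

```r
# Open a PDF device; everything drawn from now on goes into this file.
out.file <- file.path(tempdir(), "mygraph.pdf")
pdf(out.file)

# Draw a scatterplot on the open device.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight", ylab = "Miles per gallon",
     main = "A small example")

# Close the device; the file is only complete after dev.off().
invisible(dev.off())

print(file.exists(out.file))
```

Forgetting dev.off() is a common reason for an empty or unreadable output file.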


A small graph example (1)


A small graph example (2)

[Figure: scatterplot titled “Florida votes by county”, with BUCHANAN on the horizontal axis (0 to 3500) and BUSH on the vertical axis (0 to 250000); the points for DADE and PALM.BEACH are labeled.]


Arc Software (A Brief Look)

• A user-friendly software package for regression analysis, focusing on regression graphics

• Available for various operating systems (including Linux, Mac and Windows)

• Written in the Xlisp-Stat language

• Software can be downloaded from: http://www.stat.umn.edu/arc/software.html

• Documentation for Arc is contained in the Cook and Weisberg (1999) text

• Arc reads data from text files with internal formatting commands. Easiest way to prepare data is to copy from examples distributed with the software.

• Arc is a simple, but extremely powerful, tool for dynamic statistical graphics.


Preparing Data for Arc

• Arc can read data in several formats. The easiest way is to “load” an ASCII text file. Arc will prompt you (via dialog boxes) for additional information about the dataset name, variable names, and so on.

• As with the “read.table” function in R, you can include variable names in the first row of a text data file. Variable names will be converted to upper case unless they are enclosed in double quotes.

• Arc data files are stored with the extension “.lsp”. These data files are also ASCII text files, but they include internal formatting commands along with the data. The structure is very simple, and I recommend preparing Arc data files in a text editor before reading them in.

• Extensions to Arc (available on the web site) can read MS Excel spreadsheets and SAS datasets.


Output from Arc

• Arc will respond to some Xlisp-Stat commands, but most interaction with the software occurs through menu selections and dialog boxes.

• Two different methods for saving printed output:

– Material in the Arc window can be cut and pasted as usual, using the items from the Edit menu.

– Lines can be saved automatically to a specified file by selecting “Dribble” from the Arc menu.

• Graphical output from Arc is usually saved by cutting and pasting.

• Instructions for saving Arc graphs in LaTeX format are available on the website for the software.


Next topics

• Examining data
• Transformations