Research in Applied Econometrics Chapter 1. R

Post on 02-Sep-2018

  • Research in Applied Econometrics Chapter 1. R

    Pr. Philippe Polomé, Université Lumière Lyon 2

    M1 APE Analyse des Politiques Économiques

    M1 RISE Gouvernance des Risques Environnementaux

    2017-2018

  • Outline

    SWIRL

    Data Management

    R graphics

    Linear Regressions

    Discussing Regressors and Model Building

    Document Edition Functionalities

  • SWIRL

    - Do Course 1: R programming, Lessons 1-9 + 14 by yourself
      - To quit a lesson: esc
      - Answer no to any proposition to register
      - When "..." appears, press Enter
      - Sometimes there is much text to read: that is a good exercise
    - Follow the commands in RAE2017.R
      - They follow the slides
    - We do just Lesson 1
      - To make sure you can start the other lessons by yourself

  • SWIRL: R programming overview

     1: Basic Building Blocks       2: Workspace and Files
     3: Sequences of Numbers        4: Vectors
     5: Missing Values              6: Subsetting Vectors
     7: Matrices and Data Frames    8: Logic
     9: Functions                  10: lapply and sapply
    11: vapply and tapply          12: Looking at Data
    13: Simulation                 14: Dates and Times
    15: Base Graphics

  • SWIRL: A few commands outside of SWIRL

    - In R-Studio, create a new project (upper right button)
      - Call it RAE for example
      - Store it where you can find it again
    - Execute the commands in RAE2017.R to see the output
    - Usual math functions: log, exp, sign, sqrt, abs, min, max
      - log(exp(sin(pi/4)^2)*exp(cos(pi/4)^2)) (type in Console)
    - Special vectors
      - ones
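As a quick check of the Console expression above: since sin² + cos² = 1, the result should be 1. The sketch also shows one possible way to build "special" vectors; the names `ones`, `zeros`, `odd` and the lengths are assumptions for illustration, not the definitions used in RAE2017.R.

```r
# log(exp(a) * exp(b)) = a + b, and sin^2 + cos^2 = 1
x <- log(exp(sin(pi/4)^2) * exp(cos(pi/4)^2))
print(x)

# Hypothetical special vectors (lengths chosen arbitrarily)
ones  <- rep(1, 5)          # five ones
zeros <- numeric(5)         # five zeros
odd   <- seq(1, 9, by = 2)  # 1 3 5 7 9
```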

  • SWIRL: Mtx Operations

    - A

  • SWIRL: Mtx Operations

    - det(A1): determinant
    - eigen(A1): eigenvalues (and eigenvectors)
    - chol(A1): Cholesky decomposition (type ?chol in Console)
    - solve(A1): inverse
    - A %*% B: mtx product
    - A*A: element-by-element product
    - kronecker(A, B): Kronecker product (type ?kronecker)
    - crossprod(A, B): efficient calculation of A'B, i.e. t(A) %*% B
    - diag(A1): extract the diagonal
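The operations listed above can be tried on a small example. The matrices `A1` and `B` below are hypothetical (the slides' own definitions did not survive); `A1` is chosen symmetric positive definite so that chol() applies.

```r
# Hypothetical 2 x 2 matrices, not the ones from the slides
A1 <- matrix(c(2, 1, 1, 3), nrow = 2)   # symmetric positive definite
B  <- matrix(1:4, nrow = 2)

d      <- det(A1)             # determinant: 2*3 - 1*1
ev     <- eigen(A1)$values    # eigenvalues (eigen() also returns $vectors)
U      <- chol(A1)            # upper-triangular U with t(U) %*% U equal to A1
A1_inv <- solve(A1)           # inverse
P      <- A1 %*% B            # matrix product
E      <- A1 * A1             # element-by-element product
K      <- kronecker(A1, B)    # Kronecker product, here 4 x 4
XtB    <- crossprod(A1, B)    # same result as t(A1) %*% B
dg     <- diag(A1)            # diagonal elements
```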

  • SWIRL: Mtx Operations

    - cbind(1, A1): combine one C (column) of ones and A1
    - rbind(A1, diag(4, 2)): stack A1 & a diag mtx of size 2 w/ 4 on the diag
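A minimal sketch of the two binding commands, again with a hypothetical `A1` (an assumption, since the slides' matrix is not recoverable):

```r
A1 <- matrix(c(2, 1, 1, 3), nrow = 2)  # hypothetical 2 x 2 matrix

C1 <- cbind(1, A1)            # 2 x 3: the scalar 1 is recycled into a column of ones
R1 <- rbind(A1, diag(4, 2))   # 4 x 2: A1 on top of a 2 x 2 diagonal matrix of 4s
print(C1)
print(R1)
```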

  • Data Management: Dataframe

    - Frame = context
    - In R, a Dataframe is a data mtx
      - a collection of vectors of the same length
      - stacked together horizontally
    - Each vector = 1 C (column) = variable
      - Possibly of different natures
        - quantitative (numeric), but also qualitative, characters, dates...
      - it may further contain meta-data
        - e.g. variable type or category names
    - Each R (row) = 1 obs in the sample
    - An array is, in R, a more general object as it may have more than 2 dimensions

  • Data Management: Dataframe Creation

    - Several ways
      - keyboard (cf. Swirl programming lesson 7)
      - read R file
      - import
    - keyboard example
      - alternative 1
        - mydata
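One way the keyboard route might look is sketched below. The column names one, two, three match the file layout shown on the export slide, but the values and the number of rows are assumptions, as is the second object `mydata2`.

```r
# Alternative 1 (assumed values): name the columns directly in data.frame()
mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)

# Alternative 2: bind the columns into a matrix first, then convert
mydata2 <- as.data.frame(cbind(one = 1:10, two = 11:20, three = 21:30))

str(mydata)    # 10 obs. of 3 variables
head(mydata, 3)
```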

  • Data Management: attach

    - A dataframe is attached w/ command attach
      - then variable names in the dataframe may be used directly in commands
    - For example
      - mean(two) produces an error message
      - attach(mydata) and then mean(two) produces the average of variable two
      - detach(mydata) is self-explanatory
    - Why detach? e.g. to avoid confusion
    - Attach for a single operation
      - with(mydata, mean(two))
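The sequence above can be reproduced as follows. The contents of `mydata` are assumed, and the error check presumes no object called `two` already exists in the workspace.

```r
mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)  # assumed values

# Without attach, `two` is not found at top level
res <- tryCatch(mean(two), error = function(e) "error")

attach(mydata)
m_attached <- mean(two)            # now found inside the attached dataframe
detach(mydata)                     # remove it from the search path again

m_with <- with(mydata, mean(two))  # same result, nothing left attached
```

Using with() keeps the search path clean, which is why it is often preferred for single operations.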

  • Data Management: Subset Selection

    - As seen in swirl, a subset of a Dataframe can be accessed by [ or $
      - $ extracts a single variable
    - The command subset sometimes works better (e.g. conditional selection)
      - e.g. mydata.sub
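A sketch comparing the three access routes. The data, the condition `two > 15` and the selected columns are illustrative assumptions, not the slide's original `mydata.sub` definition.

```r
mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)  # assumed values

v  <- mydata$two                                         # one variable, as a vector
s1 <- mydata[mydata$two > 15, c("one", "three")]         # [rows, columns]
s2 <- subset(mydata, two > 15, select = c(one, three))   # same selection with subset()
```

With subset(), variable names are evaluated inside the dataframe, so `mydata$` does not have to be repeated in the condition.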

  • Data Management: Export (write) a dataframe

    - write.table(mydata, file="mydata.txt", col.names=TRUE)
      - creates a txt file mydata.txt in the working directory
        - normally where your project is
      - Meta-data are not passed
    - The text file format is

        one two three
        1 1 11 21
        2 2 12 22
        ...

    - The first field of each data line is the row name, so it looks like the C (column) headers are shifted left
      - Take that into account w/ the software you use to open the file
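The header shift can be seen by counting fields per line. This sketch writes to a temporary file rather than the working directory (an assumption made to keep it side-effect free) and uses a three-row `mydata` with assumed values. Note that write.table() quotes names by default, so the real file shows "one" rather than one.

```r
mydata <- data.frame(one = 1:3, two = 11:13, three = 21:23)  # assumed values
f <- tempfile(fileext = ".txt")          # temporary file, not mydata.txt

write.table(mydata, file = f, col.names = TRUE)
txt <- readLines(f)
print(txt)

# Header has 3 fields; data lines have 4 (row name + 3 values)
n_fields <- lengths(strsplit(txt, " "))

# Dropping the row names gives an aligned file
f2 <- tempfile(fileext = ".txt")
write.table(mydata, file = f2, row.names = FALSE)
n_fields2 <- lengths(strsplit(readLines(f2), " "))
```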

  • Data Management: Import (read) a dataframe

    - From a text file (.txt or .csv)
      - newdata
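A round trip through a text file and a csv file might look like this. File locations and data values are assumptions; the slide's original `newdata` command is not recoverable.

```r
# Write a small assumed dataframe to a temporary text file
f <- tempfile(fileext = ".txt")
write.table(data.frame(one = 1:3, two = 11:13, three = 21:23), file = f)

# The header has one field fewer than the data lines,
# so read.table() uses the first column as row names
newdata <- read.table(f, header = TRUE)

# Same data through csv
csvf <- tempfile(fileext = ".csv")
write.csv(newdata, csvf, row.names = FALSE)
newcsv <- read.csv(csvf)
```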

  • Data Management: Import a dataframe

    - scan is used for data that are not in mtx form
      - ?scan
    - Import from another software: excel, stata, sas...
      - Easiest: if you have access to the software, export the data file in txt or csv
        - loss of meta-data
      - R-Studio proposes several formats
        - It often fails, as these programs change their file formats frequently
      - Use Google
        - e.g. "R import Stata 17 data"
      - Also www.statmethods.net/input/importingdata.html
        - for a few formats

  • R graphics: Plot

    - First SWIRL
      - course R-programming, lesson 15 Base graphics
    - A few additional graphic elements using plot( ) from the base graphics package
      - Packages lattice and ggplot2 are better
        - http://varianceexplained.org/RData/code/code_lesson2/#segment1
    - R has many publication-quality graphics
      - But they are not very intuitive
    - plot( ) is the default graphic command for many objects:
      - dataframes, time series, fitted linear models
      - it is also an old, crude command

  • R graphics: Examples with data("CPS1988")

    - Data file is CPS1988, preloaded in the AER package
      - Pop. survey March 1988, US Census Bureau
      - 28 155 obs., cross-section
      - Men, 18-70 y-o
      - Income > US$ 50 in 1988
      - Not self-employed, not working w/o salary
    - summary(CPS1988)
    - Quantitative data
      - wage $/week
      - education & experience (=age-education-6) in years

  • R graphics: Scatterplots (dispersion XY)

    - Probably the most common plots in statistics (with histograms)
    - We use CPS1988: a census data file on wage and its determinants
      - From the AER package
      - attach(CPS1988)
    - plot(education, log(wage))
      - First arg goes on the x-axis, 2nd on the y-axis
    - rug(education)
    - rug(log(wage), side=2)
      - rug ("tapis" in French) is a 1-D plot
    - detach(CPS1988)
    - plot(log(subs)~log(citeprice), data=Journals)
      - alternative to avoid attaching the dataframe
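For readers without the AER package at hand, the same commands can be tried on simulated data. The dataframe `sim` and its coefficients below are made up for illustration and are not the CPS1988 data; with AER installed, library(AER); data("CPS1988") would give the real dataset.

```r
set.seed(1)
# Simulated stand-in for CPS1988 (made-up values)
sim <- data.frame(education = sample(0:18, 200, replace = TRUE))
sim$wage <- exp(4.5 + 0.08 * sim$education + rnorm(200, sd = 0.5))

# Formula interface: y ~ x, no attach() needed
plot(log(wage) ~ education, data = sim)
rug(sim$education)              # 1-D marks along the x-axis
rug(log(sim$wage), side = 2)    # 1-D marks along the y-axis
```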

  • R graphics: R Graphic Parameters

    - A plot's result may be modified in many ways
      - E.g. argument type controls if the plot is made of points (type = "p"), lines (type = "l"), both (type = "b"), steps (type = "s") or others
    - Several dozen parameters may be modified
      - See ?par
      - They may be set w/ command par( ), which affects subsequent plots
      - Or they can be supplied in the plot( ) command, e.g.
        plot(log(wage)~education, data=CPS1988, pch=20, col="blue", ylim=c(4,10), xlim=c(0,20), main="Wage by education years")
    - Next slide: list of par

  • Argument           Description

    axes               should axes be drawn?
    bg                 background color
    cex                size of a point or symbol
    col                color
    las                orientation of axis labels
    lty, lwd           line type and line width
    main, sub          main title and subtitle
    mar                size of margins
    mfcol, mfrow       array defining layout for several graphs on one plot
    pch                plotting symbol
    type               types
    xlab, ylab         axis labels
    xlim, ylim         axis ranges
    xlog, ylog, log    logarithmic scales
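A short sketch using several of the parameters in the table. The data (`x` and its square root) are arbitrary; saving and restoring the previous settings through the value returned by par() is a common idiom rather than something the slides prescribe.

```r
# par() returns the previous values of the parameters it changes
old <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

x <- seq(0, 10, by = 0.5)
plot(x, sqrt(x), type = "l", lty = 2, lwd = 2, main = "lines")
plot(x, sqrt(x), type = "b", pch = 20, col = "blue", las = 1,
     xlab = "x", ylab = "sqrt(x)", main = "both")

par(old)   # restore the saved layout and margins
```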

  • R graphics: R Graphic Parameters

    - Add layer(s) to a plot: lines( ), points( ), text( ), legend( )
    - Add a straight line: abline(a, b)
      - a intercept, b slope
    - 1 plot over another
      - x
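The layering commands can be combined as in the sketch below. The data are simulated and the choice of what to overlay (an OLS line, the line used to simulate, the point of means) is an assumption for illustration, not the slides' own example.

```r
set.seed(2)
x <- 1:20
y <- 3 + 0.5 * x + rnorm(20)    # simulated data
fit <- lm(y ~ x)

plot(x, y, pch = 20)                                      # base layer
abline(a = coef(fit)[1], b = coef(fit)[2], col = "red")   # fitted line: intercept, slope
lines(x, 3 + 0.5 * x, lty = 2)                            # line used in the simulation
points(mean(x), mean(y), pch = 4, cex = 2)                # point of means
text(1, max(y), "simulated data", adj = 0)
legend("bottomright", legend = c("OLS fit", "simulation line"),
       col = c("red", "black"), lty = c(1, 2))
```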

  • R graphics: Export graphics

    - To use R graphics in other software
      - Export = send the graph to a device
      - Really: just a file, e.g. with a .pdf or .jpg extension
    - All devices work similarly in R, see ?Devices
      1. The device is opened by a command that bears its name, e.g. pdf( )
      2. Then, the plot is executed
      3. Finally, the device is closed: dev.off( )
    - Example
      - pdf("myfile.pdf", height=5, width=6)
      - plot(1:20, pch=1:20, col=1:20, cex=2)
      - dev.off()
    - Search myfile.pdf on your laptop
    - Simplest: Export button in Plots window
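The three steps can be checked end to end. Writing into tempdir() instead of the working directory is an assumption made so the sketch leaves no file behind in the project.

```r
out <- file.path(tempdir(), "myfile.pdf")     # temporary location (assumption)

pdf(out, height = 5, width = 6)               # 1. open the device
plot(1:20, pch = 1:20, col = 1:20, cex = 2)   # 2. draw the plot
dev.off()                                     # 3. close: the file is complete only now

file.exists(out)
```

Forgetting dev.off() is a frequent cause of empty or unreadable output files.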

  • R graphics: Math Formulas in a Plot

    - R may place a formula in a plot using a LaTeX-like syntax
      - see ?plotmath
    - Example
      - plot of the std normal density w/ its math definition
      - curve(dnorm, from=-5, to=5, col="slategray", lwd=3, main="Density of the Standard Normal Distribution")
      - text(-5, 0.3, expression(f(x) == frac(1, sigma ~~ sqrt(2*pi)) ~~ e^{-frac((x - mu)^2, 2*sigma^2)}), adj=0)
    - Unfo