STATA Training Session 1

Upload: rajesh-kumar

Post on 14-Apr-2018


  • 7/29/2019 STATA Training Session 1

    Sun Li, Centre for Academic Computing

    [email protected]

    STATA Training Session 1: Introduction to STATA

    Outline

    Computing Resources

    Getting Started with STATA

    Running STATA

    Datasets in STATA

    Data Management with STATA

    Exercise 1

    Data Descriptions & Simple Graphs

    Exercise 2

    Download training slides, data, and syntax:

    http://research2.smu.edu.sg/CAC/StatisticalComputing/Wiki/Training%20Slides%20and%20Syntax.aspx

    Computing Resources

    STATA is a statistical package for managing, analyzing, and graphing data. It:

    has both command-driven and menu-driven interfaces
    has cross-platform compatibility: Windows, Unix, and Mac
    has three flavors:
        the standard Intercooled STATA (2047 variables)
        the more limited Small STATA (99 variables)
        the extended STATA/SE (32766 variables)

    Computing Resources

    CAC Computing Resources for STATA users

    Windows:
    STATA/SE version 10.0
    10-user network perpetual license
    Installation guide (http://research2.smu.edu.sg/CAC/StatisticalComputing/Wiki/STATA-Software%20Questions.aspx)

    Linux (CAC Beowulf Cluster):
    STATA/SE version 10.0
    Unlimited users
    About CAC Beowulf Cluster (http://research2.smu.edu.sg/CAC/HPC/Wiki/MAIN.aspx)

    New features in STATA 10.0 (http://www.stata.com/stata10)

    Getting Started

    Getting Started

    [Screenshot of the STATA main window, with these areas labeled: Review box, Variable window, Command line, Results window]

    Getting Started

    Getting help in STATA

    Help menu:
    contents: for a list of command categories & language syntax
    help: for a STATA command with examples
    search: to search help by keywords

    From command line:
    help list
    search logistic models
    net search multilevel model

    User-written programs (SJ, STB, STATAlist, and others):
    help net_mnu

    Getting Started

    Website resources:
    The STATA website: http://www.stata.com
    The STATA Journal (reviewed papers, regular columns, user-written software): http://www.stata-journal.com/
    STATA FAQ: http://www.stata.com/support/faqs
    STATA User Support: http://www.stata.com/support
    Books: http://www.stata.com/bookstore/statabooks.html

    CAC STATA support:
    Website: http://research2.smu.edu.sg/CAC/StatisticalComputing/Wiki/STATA.aspx
    Contact:
    For statistical consultation: Sun Li: [email protected]
    For software installation: TAN SuhWen: [email protected]

    Running STATA

    Files in STATA

    Commands and Output

    STATA Variable Definitions

    Missing Values in STATA

    Expressions and Functions

    Memory Consideration

    Running STATA

    Files in STATA

    .dta: STATA dataset
    STATA can also read and write ASCII (plain text) files, such as tab-delimited text files exported from Excel.

    .do: STATA do-file (command file)
    Do-files can be edited and displayed in a text editor, such as Notepad.

    .log, .smcl: STATA log file (output file)
    Log files record the commands and analysis results displayed in the Results window, including error messages. Plain-text log files (.log) can be edited and displayed in a text editor.

    .gph: STATA graph file

    Running STATA

    Commands and Output

    STATA is command driven, in one of two modes:
    Batch mode: do-file
    Interactive mode: command line

    E.g.: verinst -- verifies the version and installation of STATA

    Running STATA

    To save results: log files

    File -> Log -> Begin..., View..., or Close.
    Log files use the .smcl or .log extension.
    They record everything in the Results window, including commands, results, error messages, etc.
    If the file already exists, another dialog opens so that you can choose whether to overwrite the file with new output or to append new output to the existing file.

    From command:

    cd // display the current working directory
    cd D:\lsun // change the working directory to D:\lsun
    dir // list files in the current working directory
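    The logging steps above can also be run from the command line or a do-file. The following is a minimal sketch that is not taken from the slides: the file name session1.log is a hypothetical example, and the text option requests a plain-text log instead of SMCL.

    ```stata
    capture log close                      // close any log that is already open
    log using session1.log, text replace   // session1.log is a hypothetical file name
    sysuse auto, clear
    summarize price mpg
    log close

    log using session1.log, text append    // reopen the same file and add to it
    describe
    log close
    ```

    The replace and append options correspond to the overwrite and append choices offered by the dialog.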

    Running STATA

    STATA Variable Definitions

    Variable names
    1-32 characters; 8 or fewer are recommended
    Valid characters: letters a-z and A-Z, digits 0-9, and underscore _
    Names must start with a letter (or an underscore, but this is discouraged because STATA-generated variables start with an underscore)
    Case-sensitive: lowercase and uppercase letters are distinct

    Variable types
    String (storage types: str1 to str80; up to str244 in SE)
    Numeric (categorical, continuous)

    Running STATA

    Format of numeric variables

    Numeric formats: %w.dg; %w.df; %w.de
    w: the total width, including period and decimals
    d: number of decimals

    Format       Formula  Example  sqrt(2)    1,000      10,000,000
    General      %w.dg    %9.0g    1.414214   1000       10000000
    Fixed        %w.df    %9.0f    1          1000       10000000
                          %9.2f    1.41       1000.00    1.00e+07
    Exponential  %w.de    %10.3e   1.414e+00  1.000e+03  1.000e+07
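    The formats in the table can be tried directly with display and format. This is a small sketch, not part of the original slides; it assumes the auto system dataset that the later slides also use.

    ```stata
    display %9.2f sqrt(2)      // fixed format with 2 decimals
    display %10.3e 1000        // exponential format with 3 decimals

    sysuse auto, clear
    format price %9.2f         // changes how price is displayed, not the stored values
    list price in 1/3
    ```

    A format only affects display; calculations still use the full stored precision.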

    Running STATA

    Missing Values in STATA

    Missing values are created on input or import when a numeric field is empty, or by an invalid calculation, e.g. division by zero.

    The system missing value is shown as . (a period). Extended missing values are shown as a period followed by a letter, such as .a, .b, etc.

    Missing values are treated as very large positive numbers, ordered as:

    . < .a < .b < etc.

    This can lead to mistakes in logical expressions.
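    A short sketch of the kind of mistake meant here, using a made-up four-observation variable x (not from the slides). Because a missing value counts as larger than any number, a "greater than" condition silently includes it unless it is excluded explicitly.

    ```stata
    clear
    input x
    1
    2
    .
    5
    end

    count if x > 3                  // counts 5 and the missing value
    count if x > 3 & !missing(x)    // counts 5 only
    ```

    Adding !missing(x), or equivalently x < ., to the condition is one common way to guard against this.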

    Running STATA

    Expressions and Functions

    Operators

    Arithmetic          Relational          Logical
    ^  power            >   greater than    !  not
    *  multiplication   <   less than       ~  not
    /  division         >=  > or equal      |  or
    +  addition

    Running STATA

    Memory Consideration

    When your dataset is very large, consider:
    Setting the amount of memory: set memory
    Setting the maximum number of variables: set maxvar
    Setting the maximum dimension of matrices: set matsize

    e.g. memory

    set memory 64m

    Parameter  Default  Min    Max
    memory     10M      500K
    maxvar     5,000    2,047  32,766
    matsize    400      10     11,000

    Getting started: Q & A

    Q1: Is there a way to stop the Results window from breaking output into pages, i.e. how do I get rid of the --more-- message and let the Results window scroll to the last line of output?

    Hint: use the command help set to learn about system parameters

    Q2: Why do I get the error message "no room to add more observations" even after I reset STATA memory to load my dataset?

    Hint: Two important considerations:

    1) Make sure that you allocate an amount of memory that is larger than the file that you are using. STATA needs the extra room to perform any commands or calculations.

    2) Make sure that you do not allocate too much memory, because your computer will not have enough memory (RAM) left to perform other tasks.
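    One possible answer to Q1 is the more setting documented under help set. The sketch below is an assumption about what the hint points to, not the slide author's stated answer; it uses the auto system dataset only to produce output longer than one screen.

    ```stata
    set more off          // stop pausing output with the --more-- message
    display c(more)       // show the current setting

    sysuse auto, clear
    list make price       // long output now scrolls to the end without pausing
    ```

    For Q2, note that set memory applies to older releases such as the STATA 10 described here; from Stata 12 onward memory is managed automatically.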

    Datasets in STATA

    Starting Point: A Rectangular Matrix

    Data Input and Output

    Edit Data Properties

    Variable Management

    Data Reorganization

    Date and Time Values in STATA

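    Datasets in STATA

    Starting Point: A Rectangular Matrix

    \[
    \begin{pmatrix}
    X_{11} & X_{12} & X_{13} & \cdots & X_{1K} \\
    X_{21} & X_{22} & X_{23} & \cdots & X_{2K} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    X_{N1} & X_{N2} & X_{N3} & \cdots & X_{NK}
    \end{pmatrix}
    \]

    N observations (rows), K variables (columns)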

    Datasets in STATA

    Data Input and Output

    Load STATA-format dataset:

    use [varlist] [if] [in] [using] [filename] [, clear]

    Save data in memory to file:

    save [filename] [, save_options]

    Clear dataset from memory:

    clear

    Note: STATA is case-sensitive.
    All STATA commands are lowercase.
    STATA allows only one dataset in memory at a time.

    Datasets in STATA

    varlist: a list of variables separated by blanks.

    var1              just one variable
    var1 var2 var3    three variables
    var*              variables starting with var
    *var              variables ending with var
    var1-var3         all variables from var1 through var3 in dataset order (here var1, var2 and var3)

    if: conditional expression

    if mpg>40
    if mpg>40 & income==70
    if mpg>40 | mpg
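    The varlist and if notation can be tried on the auto system dataset. This is an illustrative sketch rather than code from the slides; the thresholds are arbitrary.

    ```stata
    sysuse auto, clear

    summarize price-weight              // every variable from price through weight, in dataset order
    describe m*                         // variables whose names start with m (make and mpg)

    count if mpg > 30                   // one condition
    count if mpg > 30 & foreign == 1    // both conditions must hold
    list make mpg if mpg > 30 | mpg < 14   // either condition may hold
    ```

    Note that == tests equality inside an if condition, while a single = is used for assignment in generate and replace.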

    Datasets in STATA

    Import datasets in other formats

    STATA can import tab-delimited ASCII text files directly.

    Excel can write tab-delimited ASCII text files:
    choose File -> Save As -> Save as type: Text (tab delimited)

    Import the text file into STATA:
    choose File -> Import -> ASCII data created by a spreadsheet
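    The menu route above has a command-line counterpart in insheet. The sketch below is an illustration, not part of the slides: it first writes a tab-delimited file with outsheet so that the example is self-contained, and auto_tab.txt is a hypothetical file name. In Stata 13 and later, import delimited and export delimited supersede these two commands.

    ```stata
    sysuse auto, clear
    outsheet make price mpg using auto_tab.txt, replace   // tab-delimited by default, names in first row

    insheet using auto_tab.txt, tab clear                 // read the tab-delimited file back in
    describe
    ```

    The tab option tells insheet that the fields are tab-separated rather than comma-separated.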

    Datasets in STATA

    Example

    sysuse auto, clear // open the system dataset auto.dta, clearing any dataset in memory
    save auto, replace // save the data in memory to the working directory, replacing any existing file
    describe // describe the dataset
    browse // open data browser
    edit // open data editor

    Datasets in STATA

    Edit Data Properties

    generate x=price/mpg // create a new variable from an expression
    rename x priceunit // rename variable
    label variable priceunit "price per mpg" // label variable
    list priceunit in 1/10 // list first 10 obs of the variable priceunit

    Edit variable properties from Data Editor in edit mode

    Data Management with STATA

    Variable Management

    recode price (10000/max=5 "10000+") ///
        (6000/10000=4 "6000-9999") (5000/6000=3 "5000-5999") ///
        (4000/5000=2 "4000-4999") (min/4000=1 "-4000"), generate(pricegrp)
    label var pricegrp "price group"
    d pricegrp
    codebook pricegrp
    recode pricegrp (1/2=1 "-5000") ///
        (3/4=2 "5000-9999") (5=3 "10000+"), generate(pricegrp2)
    codebook pricegrp pricegrp2
    save auto, replace

    Data Management with STATA

    generate x="F" // generate a new string variable with value "F"
    replace x="M" in 20/l // replace the value of x with "M" from obs 20 to the last
    encode x, generate(xcode) // convert string variable x to numeric and save it as a new variable
    d x xcode
    browse x xcode
    drop x xcode

    Data Management with STATA

    Data Reorganization

    sort foreign // sort dataset by variable foreign
    by foreign: summarize price // descriptive statistics of price by foreign group
    bysort foreign: summarize price // alternative way
    keep price pricegrp foreign // keep the three variables and drop the rest
    keep in 1/50 // keep the first 50 obs
    drop if price < 4000 // drop obs if price < 4000
    save price, replace // save it as a new dataset

    Note: sort only sorts in ascending order. To sort in descending order:

    gsort price -mpg // ascending price, then descending mpg (mpg must still be in the dataset)

    Data Management with STATA

    dir
    use hsmale, clear
    codebook gender
    use hsfemale, clear
    codebook gender
    append using hsmale
    codebook gender
    save hsappend, replace

    append: combines the information from two files with the same variables but different obs.
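    For readers without the hsmale and hsfemale files, the same idea can be sketched with two tiny made-up datasets. Everything here (the variables id and female, their values, and the temporary file) is hypothetical and only illustrates that append stacks observations.

    ```stata
    clear
    input id female
    1 0
    2 0
    end
    tempfile males
    save `males'            // first file: 2 observations

    clear
    input id female
    3 1
    4 1
    5 1
    end
    append using `males'    // second file plus first file: 5 observations
    tabulate female
    ```

    The number of variables stays the same while the number of observations is the sum of the two files.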

    Data Management with STATA

    dir
    use hsdem, clear
    sort id
    save, replace
    use hstest, clear
    sort id
    merge id using hsdem
    list
    save hsmerge
    tab _merge

    Note: Both files must be sorted beforehand by the matching key (id in the example above), and the matching key must have the same name in both datasets.

    merge: combines the information from two files with different information about the same obs.
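    The merge id using syntax above is the STATA 10 form. As a hedged sketch for readers on Stata 11 or later, the newer merge 1:1 syntax does the same job and does not require sorting first. The two small datasets below are made up for illustration.

    ```stata
    clear
    input id score
    1 80
    2 90
    3 70
    end
    tempfile tests
    save `tests'

    clear
    input id female
    1 0
    2 1
    4 1
    end
    merge 1:1 id using `tests'   // 1:1 asserts that id is unique in both files
    tabulate _merge              // 1 = master only, 2 = using only, 3 = matched
    ```

    Here ids 1 and 2 appear in both files, id 4 only in the data in memory, and id 3 only in the saved file, so the merged dataset has four observations.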

    Data Management with STATA

    Date and Time Values in STATA

    sysuse sp500
    d
    list date in 1/10
    generate year1=year(date)
    generate month1=month(date)
    d date year1 month1

    How STATA records dates and times:

    Dates and times are called %t values. %t values are numeric and integer-valued. The integer records the number of time units that have passed since the start of 1960.
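    A few display calls make the "time units since 1960" idea concrete. This sketch is an addition, not from the slides; the date 15 February 2001 is an arbitrary example.

    ```stata
    display mdy(1, 1, 1960)               // 0: daily dates count days since 01jan1960
    display mdy(1, 2, 1960)               // 1
    display %td mdy(2, 15, 2001)          // the same kind of number shown as a date

    display mofd(mdy(2, 15, 2001))        // 493: months since 1960m1
    display %tm mofd(mdy(2, 15, 2001))    // shown as a monthly date
    display month(mdy(2, 15, 2001))       // 2: calendar month, 1 to 12
    ```

    The last three lines show the difference between mofd(), which returns a monthly %t value, and month(), which extracts the calendar month.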

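    Data Management with STATA

    generate year2=yofd(date) // yearly date: the calendar year, same as year1
    generate month2=mofd(date) // monthly date: months since 1960m1, not the calendar month
    d date year1 month1 year2 month2
    list date year1 year2 month1 month2 in 1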

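    Data Management: Q & A

    Q1: Why does my merge produce a dataset with too many observations?
    http://www.stata.com/support/faqs/data/merge.html

    Q2: How do I create dummy variables?
    http://www.stata.com/support/faqs/data/dummy.html

    Q3: How can I list, drop, and keep a consecutive set of variables without typing the names individually?
    http://www.stata.com/support/faqs/data/varlist.html

    Q4: Why does my do-file or ado-file produce different results every time I run it?
    http://www.stata.com/support/faqs/lang/sort.html

    Q5: How do I deal with multiple responses?
    http://www.stata.com/support/faqs/data/multresp.html

    http://www.stata.com/support/faqs/data/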
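    As a hedged sketch touching Q2 and Q3 (one possible approach, not necessarily the one the FAQ pages recommend), dummy variables can be built with a logical expression or with tabulate, generate(), and a run of consecutive variables can be referred to with a hyphen. The variable names highmpg and rep are made up for this example.

    ```stata
    sysuse auto, clear

    generate byte highmpg = mpg > 25 if !missing(mpg)   // 1 if mpg > 25, 0 otherwise
    tabulate rep78, generate(rep)                       // one dummy per category: rep1-rep5

    describe rep1-rep5      // list a consecutive set of variables
    drop rep1-rep5          // drop the same set without typing each name
    ```

    The if !missing(mpg) qualifier keeps missing values from being coded as 1.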

    Exercise 1

    Brief Introduction of Graphics

    [Figure 1: The anatomy of a graph. Labeled elements: title, subtitle, plot area, X-axis title, legend (first legend, second legend), and note (the outer region or background).]

    Brief Introduction of Graphics

    sysuse auto, clear
    twoway (scatter mpg weight if foreign==0, msymbol(diamond) mcolor(green)) ///
        (scatter mpg weight if foreign==1, msymbol(diamond) mcolor(red)), ///
        title("Title: Figure 1") subtitle("Subtitle: The anatomy of a graph") ///
        ytitle("Y-axis title") xtitle("X-axis title") ///
        note("Note: This is the outer region or background") ///
        legend(title("Legend") label(1 "first legend") label(2 "second legend")) ///
        text(35 3400 "Plot area")

    Data Description & Simple Graphs

    Describing Datasets

    use auto, clear

    describe

    Data Description & Simple Graphs

    Describing Variables

    codebook
    summarize price mpg weight length
    summarize price mpg, detail
    bysort foreign: summarize price mpg

    Command summarize: provides descriptive statistics, with an option for details.
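    Beyond the printed table, summarize also leaves its results in r(), where they can be reused in later commands. The sketch below is an addition to the slides and assumes the auto system dataset.

    ```stata
    sysuse auto, clear

    quietly summarize price, detail       // detail adds percentiles, skewness, and more
    display "mean = " r(mean) "   median = " r(p50)
    return list                           // show everything summarize stored
    ```

    The quietly prefix suppresses the table when only the stored results are needed.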

    Data Description & Simple Graphs

    hist weight, frequency normal
    hist weight, frequency normal start(750) width(250) xlabel(1000(500)5000)

    [Figures: two histograms of Weight (lbs.) with frequency on the y-axis, one for each command above; the second labels the x-axis from 1,000 to 5,000 in steps of 500.]

    Data Description & Simple Graphs

    graph box mpg
    graph box mpg, over(foreign)
    graph bar (mean) mpg trunk, over(pricegrp) ///
        legend(label(1 "mpg") label(2 "trunk")) ///
        blabel(bar, position(inside) format(%9.1f) color(white))

    [Figures: box plots of mpg for Domestic and Foreign cars; bar chart of mean mpg and trunk over the price groups -4000, 4000-4999, 5000-5999, 6000-9999, and 10000+, with each bar labeled by its mean.]

    Data Description & Simple Graphs

    Tabulating Data

    tab1 foreign pricegrp
    tab2 foreign pricegrp
    tab2 foreign pricegrp, row column

    Command tab1: provides one-way frequency tables.
    Command tab2: provides contingency (two-way) tables.
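    The same kinds of tables can also be requested with tabulate, which takes one variable for a one-way table or two variables for a two-way table. This is a small sketch using variables that ship with the auto system dataset, so it does not depend on pricegrp having been created.

    ```stata
    sysuse auto, clear

    tabulate foreign                    // one-way frequency table
    tabulate foreign rep78, row column  // two-way table with row and column percentages
    ```

    tab1 and tab2 are convenient when several variables are listed at once: tab1 produces a one-way table for each variable, and tab2 a two-way table for each pair.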

    Data Description & Simple Graphs

    tab1 foreign, summarize(price)
    tabstat price mpg, by(foreign)
    tabstat price mpg, stat(n mean sd p25 p50 p75) by(foreign)

    Command tab1 with the summarize() option: tabulates descriptive statistics of a continuous variable within each category.
    Command tabstat: displays a table of summary statistics.

    Data Description & Simple Graphs

    graph pie, over(pricegrp)
    graph pie, over(pricegrp) plabel(_all percent, color(white)) by(foreign)

    [Figures: pie chart of the price groups -4000, 4000-4999, 5000-5999, 6000-9999, and 10000+; the same pie chart drawn separately for Domestic and Foreign cars ("Graphs by Car type"), with each slice labeled by its percentage.]

    Graphics: Q & A

    http://www.stata.com/support/faqs/graphics/

    Exercise 2

    Next Session

    Statistical Analysis

    17 Oct Friday, 9.30am-12pm

    Training Room @ Library Level 5

    Data Description And Simple Inference

    Group Comparison And Correlation

    General Linear Regression

    Logistic Model

    Binary Logistic Model

    Ordinal Logistic Model

    Multinomial Logistic Model