R Basic Data Types


2. Basic Data Types

Contents

    Variable Types
    Tables

We look at some of the ways that R can store and organize data. This is a basic introduction to a small subset of the different data types recognized by R and is not comprehensive in any sense. The main goal is to demonstrate the different kinds of information R can handle. It is assumed that you know how to enter data or read data files, which is covered in the first chapter.

2.1. Variable Types

2.1.1. Numbers

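The way to work with real numbers has already been covered in the first chapter and is briefly discussed here. The most basic way to store a number is to make an assignment of a single number:

    > a <- 3

The assignment itself prints nothing. To see the value of a variable, type its name:

    > a
    [1] 3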

This allows you to do all sorts of basic operations and save the numbers:

    > b <- sqrt(a*a+3)
    > b
    [1] 3.464102

If you want to get a list of the variables that you have defined in a particular session, you can list them all using the ls command:

    > ls()
    [1] "a" "b"

You are not limited to just saving a single number. You can create a list of numbers (called a vector in R) using the c command:

    > a <- c(1,2,3,4,5)
    > a
    [1] 1 2 3 4 5
    > a+1
    [1] 2 3 4 5 6
    > mean(a)
    [1] 3
    > var(a)
    [1] 2.5
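To make the value reported by var less mysterious, the sample variance can be recomputed by hand. This is a minimal sketch; the name manual_var is illustrative and not part of the original tutorial.

```r
# Recompute the sample variance of the vector used above.
a <- c(1, 2, 3, 4, 5)

# Sum of squared deviations from the mean, divided by (n - 1).
manual_var <- sum((a - mean(a))^2) / (length(a) - 1)

print(manual_var)
print(var(a))
```

Both calls should print the same number, since var uses the n - 1 denominator.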

You can get access to particular entries in the vector in the following manner:

    > a <- c(1,2,3,4,5)
    > a[1]
    [1] 1
    > a[2]
    [1] 2
    > a[0]
    numeric(0)
    > a[5]
    [1] 5
    > a[6]
    [1] NA

Note that a[0] returns an empty vector whose printed form, numeric(0), indicates how the data is stored. The first entry in the vector is the first number, and if you try to get a number past the last number you get NA.
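The indexing rules just described can be checked programmatically. The following is a small sketch with a made-up vector v; the variable names are illustrative only.

```r
# Hypothetical example vector, not taken from the tutorial's data.
v <- c(10, 20, 30, 40, 50)

first    <- v[1]   # R counts entries starting from 1
empty    <- v[0]   # index 0 selects nothing and yields an empty vector
past_end <- v[6]   # one past the last entry yields NA

print(first)
print(length(empty))
print(is.na(past_end))
```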

Examples of the sort of operations you can do on vectors are given in a later chapter.

To initialize a list of numbers the numeric command can be used. For example, to create a list of 10 numbers, initialized to zero, use the following command:

    > a <- numeric(10)
    > a
     [1] 0 0 0 0 0 0 0 0 0 0

If you wish to determine the data type used for a variable, use the typeof command:

    > typeof(a)
    [1] "double"
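As a brief illustration of why typeof reports "double", here is a sketch comparing a plain number with an explicit integer. The L suffix is standard R syntax, but the variable names are assumptions made for this example.

```r
x <- 3            # a plain number is stored as a double
y <- 3L           # the L suffix asks R for an integer instead
z <- numeric(10)  # numeric() also creates doubles

print(typeof(x))
print(typeof(y))
print(typeof(z))
```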

2.1.2. Strings

You are not limited to just storing numbers. You can also store strings. A string is specified by using quotes. Both single and double quotes will work:

    > a <- "hello"
    > a
    [1] "hello"
    > b <- c("hello","there")
    > b
    [1] "hello" "there"
    > b[1]
    [1] "hello"

The name of the type given to strings is character:

    > typeof(a)
    [1] "character"
    > a = character(20)
    > a
     [1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
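The claim that both quote styles work can be verified directly. This is a minimal sketch; s1, s2, and blank are illustrative names.

```r
s1 <- 'hello'          # single quotes
s2 <- "hello"          # double quotes
same <- identical(s1, s2)

blank <- character(3)  # three empty strings, analogous to numeric(3)

print(same)
print(length(blank))
print(nchar(s2))       # number of characters in the string
```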

2.1.3. Factors

Another important way R can store data is as a factor. Often an experiment includes trials for different levels of some explanatory variable. For example, when looking at the impact of carbon dioxide on the growth rate of a tree you might try to observe how different trees grow when exposed to different preset concentrations of carbon dioxide. The different levels are also called factors.

Assuming you know how to read in a file, we will look at the data file given in the first chapter. Several of the variables in the file are factors:

    > summary(tree$CHBR)
     A1  A2  A3  A4  A5  A6  A7  B1  B2  B3  B4  B5  B6  B7  C1  C2  C3  C4  C5  C6
      3   1   1   3   1   3   1   1   3   3   3   3   3   3   1   3   1   3   1   1
     C7 CL6 CL7  D1  D2  D3  D4  D5  D6  D7
      1   1   1   1   1   3   1   1   1   1

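Because the set of options given in the data file corresponding to the CHBR column are not all numbers, R treats the column as a factor when the file is read. (This was the default in versions of R before 4.0.0; in later versions, pass stringsAsFactors=TRUE to read.csv to get the same behavior.) When you use summary on a factor it does not print out the five point summary; rather, it prints out the possible values and the frequency with which they occur.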

In this data set several of the columns are factors, but the researchers used numbers to indicate the different levels. For example, the first column, labeled C, is a factor. Each tree was grown in an environment with one of four different possible levels of carbon dioxide. The researchers quite sensibly labeled these four environments as 1, 2, 3, and 4. Unfortunately, R cannot determine that these are factors and must assume that they are regular numbers.

This is a common problem and there is a way to tell R to treat the C column as a set of factors. You specify that a variable is a factor using the factor command. In the following example we convert tree$C into a factor:

    > tree$C
     [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
    [39] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4
    > summary(tree$C)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1.000   2.000   2.000   2.519   3.000   4.000
    > tree$C <- factor(tree$C)
    > tree$C
     [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
    [39] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4
    Levels: 1 2 3 4
    > summary(tree$C)
     1  2  3  4
     8 23 10 13
    > levels(tree$C)
    [1] "1" "2" "3" "4"

Once a vector is converted into a set of factors, R treats it differently. A set of factors has a discrete set of possible values, and it does not make sense to try to find averages or other numerical descriptions. One thing that is important is the number of times that each factor appears, called its frequency, which is printed using the summary command.
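The same conversion can be tried without the tree data file. The sketch below uses a short made-up vector, co2, standing in for a numerically coded treatment column; the values are invented for illustration.

```r
# Made-up treatment codes; these are not the values from the tree data set.
co2 <- c(1, 1, 2, 3, 3, 3, 4)

# As plain numbers, summary() reports a numeric five point summary and mean.
print(summary(co2))

# As a factor, summary() reports how often each level occurs.
co2_f  <- factor(co2)
counts <- summary(co2_f)

print(levels(co2_f))
print(counts)
```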

2.1.4. Data Frames

Another way that information is stored is in data frames. This is a way to take many vectors of different types and store them in the same variable. The vectors can be of all different types. For example, a data frame may contain many lists, and each list might be a list of factors, strings, or numbers.

There are different ways to create and manipulate data frames. Most are beyond the scope of this introduction. They are only mentioned here to offer a more complete description. Please see the first chapter for more information on data frames.

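One example of how to create a data frame is given below:

    > a <- c(1,2,3,4)
    > b <- c(2,4,6,8)
    > levels <- factor(c("A","B","A","B"))
    > bubba <- data.frame(first=a,second=b,f=levels)
    > bubba
      first second f
    1     1      2 A
    2     2      4 B
    3     3      6 A
    4     4      8 B
    > summary(bubba)
         first          second    f
     Min.   :1.00   Min.   :2.0   A:2
     1st Qu.:1.75   1st Qu.:3.5   B:2
     Median :2.50   Median :5.0
     Mean   :2.50   Mean   :5.0
     3rd Qu.:3.25   3rd Qu.:6.5
     Max.   :4.00   Max.   :8.0
    > bubba$first
    [1] 1 2 3 4
    > bubba$second
    [1] 2 4 6 8
    > bubba$f
    [1] A B A B
    Levels: A B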
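To see that a data frame really does hold columns of different types, one can ask for the class of each column. This is a sketch with an invented data frame df; the column names are illustrative, and stringsAsFactors is set explicitly so the result does not depend on the R version.

```r
# Invented example data; each column has a different type.
df <- data.frame(id    = c(1, 2, 3),
                 label = c("x", "y", "z"),
                 group = factor(c("A", "B", "A")),
                 stringsAsFactors = FALSE)

# Apply class() to every column and collect the answers.
col_types <- sapply(df, class)

print(col_types)
print(nrow(df))
print(df$group)
```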

2.1.5. Logical

Another important data type is the logical type. There are two predefined variables, TRUE and FALSE:

    > a = TRUE
    > typeof(a)
    [1] "logical"
    > b = FALSE
    > typeof(b)
    [1] "logical"

The standard logical operators can be used:

    <           less than
    >           greater than
    <=          less than or equal
    >=          greater than or equal
    ==          equal to
    !=          not equal to
    |           entrywise or
    ||          or
    !           not
    &           entrywise and
    &&          and
    xor(a,b)    exclusive or

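Note that there is a difference between operators that act on entries within a vector and those that act on the whole vector:

    > a = c(TRUE,FALSE)
    > b = c(FALSE,FALSE)
    > a|b
    [1]  TRUE FALSE
    > a||b
    [1] TRUE
    > xor(a,b)
    [1]  TRUE FALSE

The a||b result shown here comes from older versions of R, where || examined only the first entry of each vector. Since R 4.3.0, calling || or && with a vector longer than one entry is an error, so write a[1] || b[1] instead.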
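The entrywise operators are most often applied to the results of comparisons. The following sketch uses a made-up numeric vector x to show how comparisons produce logical vectors that can then be combined; all names are illustrative.

```r
x <- c(1, 5, 10)                 # made-up values

big     <- x > 3                 # entrywise comparison gives a logical vector
extreme <- (x < 2) | (x > 8)     # entrywise or
both    <- big & extreme         # entrywise and
flag    <- TRUE && FALSE         # && works on single values only

print(big)
print(extreme)
print(both)
print(flag)
```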

There are a large number of functions that test to determine the type of a variable. For example, the is.numeric function can determine if a variable is numeric:

    > a = c(1,2,3)
    > is.numeric(a)
    [1] TRUE
    > is.factor(a)
    [1] FALSE
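A few more of these test functions, applied to the types covered in this chapter, are sketched below. The variables are invented for the example.

```r
num <- c(1, 2, 3)
txt <- "text"
fac <- factor(c("u", "v"))
lgl <- TRUE

print(is.numeric(num))    # numbers
print(is.character(txt))  # strings
print(is.factor(fac))     # factors
print(is.logical(lgl))    # logical values
print(is.numeric(fac))    # a factor is not treated as numeric
```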

2.2. Tables

Another common way to store information is in a table. Here we look at how to define both one way and two way tables. We only look at how to create and define tables; the functions used in the analysis of proportions are examined in another chapter.

2.2.1. One Way Tables

The first example is for a one way table. One way tables are not the most interesting example, but they are a good place to start. One way to create a table is using the table command. It takes a vector of factors as its argument and calculates the frequency with which each factor occurs. Here is an example of how to create a one way table:

    > a <- factor(c("A","A","B","A","B","B","C","A","C"))
    > results <- table(a)
    > results
    a
    A B C
    4 3 2
    > attributes(results)
    $dim
    [1] 3

    $dimnames
    $dimnames$a
    [1] "A" "B" "C"


    $class
    [1] "table"

    > summary(results)
    Number of cases in table: 9
    Number of factors: 1
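Once a one way table exists, individual counts can be looked up by label. This is a minimal sketch with a made-up vector of responses; answers and counts are illustrative names.

```r
# Made-up responses, not the vector used in the tutorial.
answers <- factor(c("A", "B", "A", "C", "A"))
counts  <- table(answers)

print(counts["A"])    # frequency of a single level, looked up by name
print(names(counts))  # the labels that were counted
print(sum(counts))    # total number of observations
```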

If you know the number of occurrences for each factor then it is possible to create the table directly, but the process is, unfortunately, a bit more convoluted. There is an easier way to define one way tables (a table with one row), but it does not extend easily to two way tables (tables with more than one row). You must first create a matrix of numbers. A matrix is like a vector in that it is a list of numbers, but it is different in that you can have both rows and columns of numbers. For example, in our example above the number of occurrences of A is 4, the number of occurrences of B is 3, and the number of occurrences of C is 2. We will create one row of numbers. The first column contains a 4, the second column contains a 3, and the third column contains a 2:

    > occur <- matrix(c(4,3,2),ncol=3,byrow=TRUE)
    > occur
         [,1] [,2] [,3]
    [1,]    4    3    2

At this point the variable occur is a matrix with one row and three columns of numbers. To dress it up and use it as a table we would like to give it labels for each column just like in the previous example. Once that is done we convert the matrix to a table using the as.table command:

    > colnames(occur) <- c("A","B","C")
    > occur
         A B C
    [1,] 4 3 2
    > occur <- as.table(occur)
    > occur
      A B C
    A 4 3 2
    > attributes(occur)
    $dim
    [1] 1 3

    $dimnames
    $dimnames[[1]]
    [1] "A"

    $dimnames[[2]]
    [1] "A" "B" "C"


    $class
    [1] "table"
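The labeling and conversion steps can also be folded into the matrix call through its dimnames argument. This is only a sketch of an alternative, not the tutorial's own method; the row label "count" and the names occur2 and tab are illustrative choices.

```r
# Build the one-row matrix with row and column labels in a single call.
occur2 <- matrix(c(4, 3, 2), ncol = 3, byrow = TRUE,
                 dimnames = list("count", c("A", "B", "C")))

# Convert the labeled matrix to a table.
tab <- as.table(occur2)

print(tab)
print(class(tab))
print(tab["count", "B"])  # look up one cell by its row and column labels
```

Supplying the row label explicitly avoids the automatically generated row name "A" seen in the output above.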

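2.2.2. Two Way Tables

If you want to add rows to your table just add another vector to the argument of the table command. In the example below we have two questions. In the first question the responses are labeled Never, Sometimes, or Always. In the second question the responses are labeled Yes, No, or Maybe. The vectors a and b contain the responses for each measurement. The third item in a is how the third person responded to the first question, and the third item in b is how the third person responded to the second question.

    > a <- c("Sometimes","Sometimes","Never","Always","Always","Sometimes","Sometimes","Never")
    > b <- c("Maybe","Maybe","Yes","Maybe","Maybe","No","Yes","No")
    > results <- table(a,b)
    > results
               b
    a           Maybe No Yes
      Always        2  0   0
      Never         0  1   1
      Sometimes     2  1   1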

The table command allows us to do a very quick calculation, and we can immediately see that two people who said Sometimes to the first question also said Maybe to the second question.
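Rather than reading a count off the printed table, a single cell can be extracted by its row and column labels. The sketch below uses shorter made-up response vectors q1 and q2, which are assumptions for this example and not the tutorial's data.

```r
# Made-up responses from four people to two questions.
q1 <- c("Never", "Always", "Never", "Sometimes")
q2 <- c("Yes",   "No",     "Yes",   "Maybe")

cross <- table(q1, q2)

print(cross)
print(cross["Never", "Yes"])  # people who answered Never, then Yes
print(dim(cross))             # number of rows and columns
```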

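Just as in the case with one way tables it is possible to manually enter two way tables. The procedure is exactly the same as above except that we now have more than one row. We give a brief example below to demonstrate how to enter a two way table that includes a breakdown of a group of people by both their gender and whether or not they smoke. You enter all of the data as one long list but tell R to break it up into some number of columns:

    > sexsmoke <- matrix(c(70,120,65,140),ncol=2,byrow=TRUE)
    > rownames(sexsmoke) <- c("male","female")
    > colnames(sexsmoke) <- c("smoke","nosmoke")
    > sexsmoke <- as.table(sexsmoke)
    > sexsmoke
           smoke nosmoke
    male      70     120
    female    65     140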

The matrix command creates a two by two matrix. The byrow=TRUE option indicates that the numbers are filled in across the rows first, and ncol=2 indicates that there are two columns.
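The effect of byrow is easiest to see by building the same matrix both ways. This is a small sketch reusing the four counts from the example; by_row and by_col are illustrative names.

```r
counts <- c(70, 120, 65, 140)

by_row <- matrix(counts, ncol = 2, byrow = TRUE)  # fill across each row first
by_col <- matrix(counts, ncol = 2)                # default: fill down each column first

print(by_row)
print(by_col)

# The two layouts are transposes of one another.
print(identical(by_row, t(by_col)))
```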

R Tutorial by Kelly Black is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Based on a work at http://www.cyclismo.org/tutorial/R/.
