r data types
Post on 29-Sep-2015
8 Views
Preview:
DESCRIPTION
TRANSCRIPT
-
2.BasicDataTypesContents
VariableTypesTables
WelookatsomeofthewaysthatRcanstoreandorganizedata.ThisisabasicintroductiontoasmallsubsetofthedifferentdatatypesrecognizedbyRandisnotcomprehensiveinanysense.ThemaingoalistodemonstratethedifferentkindsofinformationRcanhandle.Itisassumedthatyouknowhowtoenterdataorreaddatafileswhichiscoveredinthefirstchapter.
2.1.VariableTypes
2.1.1.Numbers
Thewaytoworkwithrealnumbershasalreadybeencoveredinthefirstchapterandisbrieflydiscussedhere.Themostbasicwaytostoreanumberistomakeanassignmentofasinglenumber:
>a
Thea[1]3
Thisallowsyoutodoallsortsofbasicoperationsandsavethenumbers:
>bb[1]3.464102
Ifyouwanttogetalistofthevariablesthatyouhavedefinedinaparticularsessionyoucanlistthemallusingthelscommand:
-
>ls()[1]"a""b"
Youarenotlimitedtojustsavingasinglenumber.Youcancreatealist(alsocalledavector)usingtheccommand:
>aa[1]12345>a+1[1]23456>mean(a)[1]3>var(a)[1]2.5
Youcangetaccesstoparticularentriesinthevectorinthefollowingmanner:
>aa[1][1]1>a[2][1]2>a[0]numeric(0)>a[5][1]5>a[6][1]NA
Notethatthezeroentryisusedtoindicatehowthedataisstored.Thefirstentryinthevectoristhefirstnumber,andifyoutrytogetanumberpastthelastnumberyougetNA.
Examplesofthesortofoperationsyoucandoonvectorsisgiveninanextchapter.
Toinitializealistofnumbersthenumericcommandcanbeused.Forexample,tocreatealistof10numbers,initializedtozero,usethefollowingcommand:
>aa[1]0000000000
Ifyouwishtodeterminethedatatypeusedforavariablethetypecommand:
>typeof(a)[1]"double"
2.1.2.Strings
-
Youarenotlimitedtojuststoringnumbers.Youcanalsostorestrings.Astringisspecifiedbyusingquotes.Bothsingleanddoublequoteswillwork:
>aa[1]"hello">bb[1]"hello""there">b[1][1]"hello"
Thenameofthetypegiventostringsischaracter,
>typeof(a)[1]"character">a=character(20)>a[1]""""""""""""""""""""""""""""""""""""""""
2.1.3.Factors
AnotherimportantwayRcanstoredataisasafactor.Oftentimesanexperimentincludestrialsfordifferentlevelsofsomeexplanatoryvariable.Forexample,whenlookingattheimpactofcarbondioxideonthegrowthrateofatreeyoumighttrytoobservehowdifferenttreesgrowwhenexposedtodifferentpresetconcentrationsofcarbondioxide.Thedifferentlevelsarealsocalledfactors.
Assumingyouknowhowtoreadinafile,wewilllookatthedatafilegiveninthefirstchapter.Severalofthevariablesinthefilearefactors:
>summary(tree$CHBR)A1A2A3A4A5A6A7B1B2B3B4B5B6B7C1C2C3C4C5C631131311333333131311C7CL6CL7D1D2D3D4D5D6D71111131111
BecausethesetofoptionsgiveninthedatafilecorrespondingtotheCHBRcolumnarenotallnumbersRautomaticallyassumesthatitisafactor.Whenyouusesummaryonafactoritdoesnotprintoutthefivepointsummary,ratheritprintsoutthepossiblevaluesandthefrequencythattheyoccur.
Inthisdatasetseveralofthecolumnsarefactors,buttheresearchersusednumberstoindicatethedifferentlevels.Forexample,thefirstcolumn,labeledC,isafactor.Eachtreeswasgrowninanenvironmentwithoneoffourdifferentpossiblelevelsofcarbondioxide.Theresearchersquitesensiblylabeledthesefourenvironmentsas1,2,3,and4.Unfortunately,R
-
cannotdeterminethatthesearefactorsandmustassumethattheyareregularnumbers.
ThisisacommonproblemandthereisawaytotellRtotreattheCcolumnasasetoffactors.Youspecifythatavariableisafactorusingthefactorcommand.Inthefollowingexampleweconverttree$Cintoafactor:
>tree$C[1]11111111222222222222222222222223333333[39]3334444444444444>summary(tree$C)Min.1stQu.MedianMean3rdQu.Max.1.0002.0002.0002.5193.0004.000>tree$Ctree$C[1]11111111222222222222222222222223333333[39]3334444444444444Levels:1234>summary(tree$C)12348231013>levels(tree$C)[1]"1""2""3""4"
OnceavectorisconvertedintoasetoffactorsthenRtreatsitdifferently.Asetoffactorshaveadiscretesetofpossiblevalues,anditdoesnotmakesensetotrytofindaveragesorothernumericaldescriptions.Onethingthatisimportantisthenumberoftimesthateachfactorappears,calledtheirfrequencies,whichisprintedusingthesummarycommand.
2.1.4.DataFrames
Anotherwaythatinformationisstoredisindataframes.Thisisawaytotakemanyvectorsofdifferenttypesandstoretheminthesamevariable.Thevectorscanbeofalldifferenttypes.Forexample,adataframemaycontainmanylists,andeachlistmightbealistoffactors,strings,ornumbers.
Therearedifferentwaystocreateandmanipulatedataframes.Mostarebeyondthescopeofthisintroduction.Theyareonlymentionedheretoofferamorecompletedescription.Pleaseseethefirstchapterformoreinformationondataframes.
Oneexampleofhowtocreateadataframeisgivenbelow:
>ablevelsbubbabubbafirstsecondf112A224B
-
336A448B>summary(bubba)firstsecondfMin.:1.00Min.:2.0A:21stQu.:1.751stQu.:3.5B:2Median:2.50Median:5.0Mean:2.50Mean:5.03rdQu.:3.253rdQu.:6.5Max.:4.00Max.:8.0>bubba$first[1]1234>bubba$second[1]2468>bubba$f[1]ABABLevels:AB
2.1.5.Logical
Anotherimportantdatatypeisthelogicaltype.Therearetwopredefinedvariables,TRUEandFALSE:
>a=TRUE>typeof(a)[1]"logical">b=FALSE>typeof(b)[1]"logical"
Thestandardlogicaloperatorscanbeused:
< lessthan> greatthan= greaterthanor
equal== equalto!= notequalto| entrywiseor|| or! not& entrywiseand&& andxor(a,b) exclusiveor
Notethatthereisadifferencebetweenoperatorsthatactonentrieswithinavectorandthewholevector:
>a=c(TRUE,FALSE)>b=c(FALSE,FALSE)>a|b
-
[1]TRUEFALSE>a||b[1]TRUE>xor(a,b)[1]TRUEFALSE
Therearealargenumberoffunctionsthattesttodeterminethetypeofavariable.Forexampletheis.numericfunctioncandetermineifavariableisnumeric:
>a=c(1,2,3)>is.numeric(a)[1]TRUE>is.factor(a)[1]FALSE
2.2.Tables
Anothercommonwaytostoreinformationisinatable.Herewelookathowtodefinebothonewayandtwowaytables.Weonlylookathowtocreateanddefinetablesthefunctionsusedintheanalysisofproportionsareexaminedinanotherchapter.
2.2.1.OneWayTables
Thefirstexampleisforaonewaytable.Onewaytablesarenotthemostinterestingexample,butitisagoodplacetostart.Onewaytocreateatableisusingthetablecommand.Theargumentsittakesisavectoroffactors,anditcalculatesthefrequencythateachfactoroccurs.Hereisanexampleofhowtocreateaonewaytable:
>aresultsresultsaABC432>attributes(results)$dim[1]3
$dimnames$dimnames$a[1]"A""B""C"
$class[1]"table"
>summary(results)Numberofcasesintable:9Numberoffactors:1
-
Ifyouknowthenumberofoccurrencesforeachfactorthenitispossibletocreatethetabledirectly,buttheprocessis,unfortunately,abitmoreconvoluted.Thereisaneasierwaytodefineonewaytables(atablewithonerow),butitdoesnotextendeasilytotwowaytables(tableswithmorethanonerow).Youmustfirstcreateamatrixofnumbers.Amatrixislikeavectorinthatitisalistofnumbers,butitisdifferentinthatyoucanhavebothrowsandcolumnsofnumbers.Forexample,inourexampleabovethenumberofoccurrencesofAis4,thenumberofoccurrencesofBis3,andthenumberofoccurrencesofCis2.Wewillcreateonerowofnumbers.Thefirstcolumncontainsa4,thesecondcolumncontainsa3,andthethirdcolumncontainsa2:
>occuroccur[,1][,2][,3][1,]432
Atthispointthevariableoccurisamatrixwithonerowandthreecolumnsofnumbers.Todressitupanduseitasatablewewouldliketogiveitlabelsforeachcolumnsjustlikeinthepreviousexample.Oncethatisdoneweconvertthematrixtoatableusingtheas.tablecommand:
>colnames(occur)occurABC[1,]432>occuroccurABCA432>attributes(occur)$dim[1]13
$dimnames$dimnames[[1]][1]"A"
$dimnames[[2]][1]"A""B""C"
$class[1]"table"
2.2.2.TwoWayTables
Ifyouwanttoaddrowstoyourtablejustaddanothervectortotheargumentofthetablecommand.Intheexamplebelowwehavetwoquestions.InthefirstquestiontheresponsesarelabeledNever,Sometimes,orAlways.InthesecondquestiontheresponsesarelabeledYes,No,orMaybe.Thesetofvectorsa,andb,containtheresponseforeach
-
measurement.Thethirditeminaishowthethirdpersonrespondedtothefirstquestion,andthethirditeminbishowthethirdpersonrespondedtothesecondquestion.
>abresultsresultsbaMaybeNoYesAlways200Never011Sometimes211
Thetablecommandallowsustodoaveryquickcalculation,andwecanimmediatelyseethattwopeoplewhosaidMaybetothefirstquestionalsosaidSometimestothesecondquestion.
Justasinthecasewithonewaytablesitispossibletomanuallyentertwowaytables.Theprocedureisexactlythesameasaboveexceptthatwenowhavemorethanonerow.Wegiveabriefexamplebelowtodemonstratehowtoenteratwowaytablethatincludesbreakdownofagroupofpeoplebyboththeirgenderandwhetherornottheysmoke.YouenterallofthedataasonelonglistbuttellRtobreakitupintosomenumberofcolumns:
>sexsmokerownames(sexsmoke)colnames(sexsmoke)sexsmokesexsmokesmokenosmokemale70120female65140
Thematrixcommandcreatesatwobytwomatrix.Thebyrow=TRUEoptionindicatesthatthenumbersarefilledinacrosstherowsfirst,andthencols=2indicatesthattherearetwocolumns.
RTutorialbyKellyBlackislicensedunderaCreativeCommonsAttributionNonCommercial4.0InternationalLicense.Basedonaworkathttp://www.cyclismo.org/tutorial/R/.
ThispagegeneratedusingSphinx.
top related