1. Input
Contents
Assignment
Reading a CSV file
Brief Note on Fixed Width Files
Here we explore how to define a data set in an R session. Only two commands are explored. The first is for simple assignment of data, and the second is for reading in a data file. There are many ways to read data into an R session, but we focus on just two to keep it simple.
1.1. Assignment
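The most straightforward way to store a list of numbers is through an assignment using the c command. (c stands for combine.) The idea is that a list of numbers is stored under a given name, and the name is used to refer to the data. A list is specified with the c command, and assignment is specified with the <- symbol. (In R such a list of numbers is called a vector.) For example, the following command creates a variable called bubba that contains the numbers 3, 5, 7, and 9:
> bubba <- c(3,5,7,9)
When you enter this command you should not see any output except a new command line. The command creates a list of numbers called bubba. To see which numbers are included in bubba, type bubba and press the enter key:
> bubba
[1] 3 5 7 9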
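As a quick check of what the c command built, the short sketch below repeats the assignment and inspects the result with the base R functions length, class and is.vector. It is only an illustration; nothing here is required for the rest of the chapter.

```r
# Re-create the example vector so this snippet runs on its own
bubba <- c(3, 5, 7, 9)

print(bubba)             # [1] 3 5 7 9
print(length(bubba))     # number of entries: 4
print(class(bubba))      # "numeric": c() built a numeric vector
print(is.vector(bubba))  # TRUE
```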
If you wish to work with one of the numbers, you can access it by giving the variable name followed by square brackets containing the position of the number:
> bubba[2]
[1] 5
> bubba[1]
[1] 3
> bubba[0]
numeric(0)
> bubba[3]
[1] 7
> bubba[4]
[1] 9
Notice that the first entry is referred to as the number 1 entry; R starts counting at 1, not 0. Asking for the zero entry returns an empty vector, here numeric(0), whose type shows how the computer will treat the data. You can store strings using both single and double quotes, and you can store real numbers.
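The sketch below illustrates the last two points with two hypothetical vectors, fruit and sizes (the names and values are made up for illustration). It also shows that the zero entry of a character vector is character(0), and that asking for an entry past the end returns NA rather than stopping with an error.

```r
# Strings may be quoted with single or double quotes; both give the same value
fruit <- c('apple', "banana", 'cherry')
# Real (non-integer) numbers are stored with the same c command
sizes <- c(2.5, 3.75, -1.2)

print(fruit[2])   # [1] "banana"
print(sizes[3])   # [1] -1.2
# The zero entry is an empty vector whose type reveals how R stores the data
print(fruit[0])   # character(0)
print(sizes[0])   # numeric(0)
# An index past the end gives NA instead of an error
print(sizes[4])   # [1] NA
```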
You now have a list of numbers and are ready to explore. In the chapters that follow we will examine the basic operations in R that will allow you to do some of the analyses required in class.
1.2. Reading a CSV file
Unfortunately, it is rare to have just a few data points that you do not mind typing in at the prompt. It is much more common to have a lot of data points with complicated relationships. Here we will examine how to read a data set from a file using the read.csv function, but first we discuss the format of a data file.
We assume that the data file is in the format called comma separated values (csv). That is, each line contains a row of values which can be numbers or letters, and each value is separated by a comma. We also assume that the very first row contains a list of labels. The idea is that the labels in the top row are used to refer to the different columns of values.
First we read a very short, somewhat silly, data file. The data file is called simple.csv and has three columns of data and six rows. The three columns are labeled trial, mass, and velocity. We can pretend that each row comes from an observation during one of two trials labeled A and B. A copy of the data file is shown below and is created in defiance of Werner Heisenberg:
simple.csv
trial,mass,velocity
A,10,12
A,11,14
B,5,8
B,6,10
A,10.5,13
B,7,11
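If you do not have a copy of simple.csv, one way to make it is to write the seven lines from R itself. This is a sketch: the variable name simple_path is hypothetical, and writing into tempdir() is an assumption made so that nothing in your working directory is overwritten. Use "simple.csv" as the path instead if you want the file in the current working directory, where the read.csv command expects it.

```r
# Location for the example file; tempdir() avoids touching the working directory
simple_path <- file.path(tempdir(), "simple.csv")

# Header row followed by the six observations, one comma-separated row per line
writeLines(c("trial,mass,velocity",
             "A,10,12",
             "A,11,14",
             "B,5,8",
             "B,6,10",
             "A,10.5,13",
             "B,7,11"),
           con = simple_path)

# Show the raw text exactly as read.csv will see it
cat(readLines(simple_path), sep = "\n")
```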
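The command to read the data file is read.csv. We have to give the command at least one argument, but we will give three different arguments to indicate how the command can be used in different situations. The first argument is the name of the file. The second argument indicates whether or not the first row is a set of labels. The third argument indicates that there is a comma between the values on each line. The following command will read in the data and assign it to a variable called heisenberg:
> heisenberg <- read.csv(file="simple.csv", header=TRUE, sep=",")
> heisenberg
  trial mass velocity
1     A 10.0       12
2     A 11.0       14
3     B  5.0        8
4     B  6.0       10
5     A 10.5       13
6     B  7.0       11
> summary(heisenberg)
 trial      mass          velocity
 A:3   Min.   : 5.00   Min.   : 8.00
 B:3   1st Qu.: 6.25   1st Qu.:10.25
       Median : 8.50   Median :11.50
       Mean   : 8.25   Mean   :11.33
       3rd Qu.:10.38   3rd Qu.:12.75
       Max.   :11.00   Max.   :14.00
(This output comes from a version of R older than 4.0.0. Starting with R 4.0.0, read.csv keeps text columns as character strings by default, so summary reports the length and class of trial instead of the counts of A and B; add stringsAsFactors=TRUE to the call to reproduce the output shown here.)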
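To see what the header argument actually changes, the sketch below reads the same six rows twice, once with header=TRUE and once with header=FALSE. It passes the rows through the text argument (which read.csv forwards to read.table) so that no file is needed; the variable names csv_text, with_header, no_header and as_factors are hypothetical. The last call asks for factors explicitly, which reproduces the "Levels: A B" behaviour regardless of the R version.

```r
# The same six rows as simple.csv, supplied as text so no file is required
csv_text <- "trial,mass,velocity
A,10,12
A,11,14
B,5,8
B,6,10
A,10.5,13
B,7,11"

# header=TRUE: the first row becomes the column names
with_header <- read.csv(text = csv_text, header = TRUE, sep = ",")
# header=FALSE: the label row is treated as data and R invents names V1, V2, V3
no_header <- read.csv(text = csv_text, header = FALSE, sep = ",")

print(dim(with_header))  # 6 rows, 3 columns
print(dim(no_header))    # 7 rows, 3 columns
print(names(no_header))  # "V1" "V2" "V3"

# Request factors explicitly to get the "Levels: A B" output shown above
as_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)
print(levels(as_factors$trial))  # "A" "B"
```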
(Note that if you are using a Microsoft system, the file naming convention is different from what we use here. If you want to use a backslash in a file name, it needs to be escaped, i.e. use two backslashes together: \\. You can also specify which folder to use by clicking on the File option in the main menu and choosing the option to specify your working directory.)
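The sketch below shows why the backslash has to be doubled. The path is hypothetical and does not need to exist; the point is that the two-character sequence \\ inside quotes stands for a single backslash, that forward slashes are also accepted by R on Windows, and that file.path builds a path without any escaping.

```r
# A hypothetical Windows location for simple.csv (it need not exist)
escaped <- "C:\\Users\\student\\simple.csv"  # each backslash typed twice
forward <- "C:/Users/student/simple.csv"     # forward slashes also work on Windows

# The doubled backslash is stored as one character
print(nchar("\\"))   # 1
cat(escaped, "\n")   # C:\Users\student\simple.csv

# file.path() joins the pieces with "/", so no escaping is needed
built <- file.path("C:", "Users", "student", "simple.csv")
print(identical(built, forward))  # TRUE
```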
To get more information on the different options available, you can use the help command:
> help(read.csv)
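One reason read.csv needs only the file name is that its other arguments have default values. The formals function lists a function's arguments together with their defaults, so the sketch below can confirm that header defaults to TRUE and sep defaults to a comma. In other words, read.csv("simple.csv") should behave the same as the three-argument call used earlier; the variable name defaults is only for illustration.

```r
# formals() returns a function's arguments along with their default values
defaults <- formals(read.csv)

print(names(defaults))  # includes "file", "header", "sep", ...
print(defaults$header)  # TRUE
print(defaults$sep)     # ","
```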
If R is not finding the file you are trying to read, then it may be looking in the wrong folder/directory. If you are using the graphical interface, you can change the working directory from the file menu. If you are not sure what files are in the current working directory, you can use the dir() command to list the files and the getwd() command to determine the current working directory:
> dir()
[1] "fixedWidth.dat" "simple.csv"     "trees91.csv"    "trees91.wk1"
[5] "w1.dat"
> getwd()
[1] "/home/black/write/class/stat/stat38313F/dat"
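Checking for the file before reading it gives a clearer message than a failed read. The sketch below wraps that check in a hypothetical helper, read_if_present, built only from file.exists, getwd, dir and read.csv. To avoid touching your real working directory, the demonstration switches to tempdir() with setwd and switches back afterwards; the three-row file it writes is a cut-down stand-in for simple.csv, assumed only for the example.

```r
# Hypothetical helper: stop with a readable message if the file is missing
# from the working directory, otherwise read it with read.csv
read_if_present <- function(filename) {
  if (!file.exists(filename)) {
    stop("'", filename, "' not found in ", getwd(),
         "; files here are: ", paste(dir(), collapse = ", "))
  }
  read.csv(filename)
}

# Demonstrate inside a temporary folder; setwd() returns the previous directory
old_wd <- setwd(tempdir())
writeLines(c("trial,mass,velocity", "A,10,12", "B,5,8"), "simple.csv")
print(list.files(pattern = "\\.csv$"))  # CSV files in this folder
demo <- read_if_present("simple.csv")
print(nrow(demo))                       # 2
setwd(old_wd)                           # restore the original working directory
```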
The variable heisenberg contains the three columns of data. Each column is assigned a name based on the header (the first line in the file). You can now access each individual column using a $ to separate the two names:
> heisenberg$trial
[1] A A B B A B
Levels: A B
> heisenberg$mass
[1] 10.0 11.0  5.0  6.0 10.5  7.0
> heisenberg$velocity
[1] 12 14  8 10 13 11
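The $ operator is one of several ways to pull out a column. The sketch below rebuilds heisenberg from the values in simple.csv (passed through the text argument, so no file is needed) and shows that heisenberg[["mass"]] gives the same vector as heisenberg$mass, that a column can be handed straight to a function such as mean, and that it can be indexed with square brackets just like bubba.

```r
# Rebuild heisenberg from the six rows of simple.csv without reading a file
heisenberg <- read.csv(text = paste("trial,mass,velocity", "A,10,12", "A,11,14",
                                    "B,5,8", "B,6,10", "A,10.5,13", "B,7,11",
                                    sep = "\n"))

by_dollar <- heisenberg$mass           # $ extracts one column as a vector
by_name <- heisenberg[["mass"]]        # [[ ]] does the same with the name as a string
print(identical(by_dollar, by_name))   # TRUE
print(mean(heisenberg$velocity))       # about 11.33
print(heisenberg$mass[3])              # third mass value: 5
```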
If you are not sure what columns are contained in the variable, you can use the names command:
> names(heisenberg)
[1] "trial"    "mass"     "velocity"
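Besides names, the functions nrow, ncol and str give a quick overview of a data frame. The sketch below builds heisenberg directly with data.frame from the values in simple.csv, an assumption made so the example runs without the file. It also shows a pitfall: a misspelled or nonexistent column name after $ (the hypothetical speed here) silently returns NULL rather than an error, so checking with %in% names(...) can help.

```r
# Build the example data frame directly from the values in simple.csv
heisenberg <- data.frame(trial = c("A", "A", "B", "B", "A", "B"),
                         mass = c(10, 11, 5, 6, 10.5, 7),
                         velocity = c(12, 14, 8, 10, 13, 11))

print(names(heisenberg))  # "trial" "mass" "velocity"
print(nrow(heisenberg))   # 6 rows
print(ncol(heisenberg))   # 3 columns
str(heisenberg)           # one line per column: name, type, first values

# A column that does not exist gives NULL, not an error
print("speed" %in% names(heisenberg))  # FALSE
print(is.null(heisenberg$speed))       # TRUE
```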
We will look at another example which is used throughout this tutorial: the data found in a spreadsheet located at http://cdiac.ornl.gov/ftp/ndp061a/trees91.wk1. A description of the data file is located at http://cdiac.ornl.gov/ftp/ndp061a/ndp061a.txt. The original data is given in an Excel spreadsheet. It has been converted into a csv file, trees91.csv, by deleting the top set of rows and saving it as a csv file, which is one of the save options within Excel. (You should save the file on your computer.) It is a good idea to open this file in a spreadsheet and look at it. This will help you make sense of how R stores the data.
The data is used to indicate an estimate of biomass of ponderosa pine in a study performed by Dale W. Johnson, J. Timothy Ball, and Roger F. Walker, who are associated with the Biological Sciences Center, Desert Research Institute, P.O. Box 60220, Reno, NV 89506 and the Environmental and Resource Sciences College of Agriculture, University of Nevada, Reno, NV 89512. The data consists of 54 lines, and each line represents an observation. Each observation includes 28 different measurements and markers for a given tree. For example, the first number in each row is either 1, 2, 3, or 4, which signifies a different level of exposure to carbon dioxide. The sixth number in every row is an estimate of the biomass of the stems of a tree. Note that the very first line in the file is a list of labels used for the different columns of data.
The data can be read into a variable called tree using the read.csv command:
> tree <- read.csv(file="trees91.csv", header=TRUE, sep=",")
> attributes(tree)
$names
 [1] "C"      "N"      "CHBR"   "REP"    "LFBM"   "STBM"   "RTBM"   "LFNCC"
 [9] "STNCC"  "RTNCC"  "LFBCC"  "STBCC"  "RTBCC"  "LFCACC" "STCACC" "RTCACC"
[17] "LFKCC"  "STKCC"  "RTKCC"  "LFMGCC" "STMGCC" "RTMGCC" "LFPCC"  "STPCC"
[25] "RTPCC"  "LFSCC"  "STSCC"  "RTSCC"

$class
[1] "data.frame"

$row.names
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30"
[31] "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42" "43" "44" "45"
[46] "46" "47" "48" "49" "50" "51" "52" "53" "54"
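The first thing that R stores is a list of names which refer to each column of the data. For example, the first column is called C and the second column is called N. The variable tree is of type data.frame. Finally, the rows are numbered consecutively from 1 to 54, so each column has 54 numbers in it. (Recent versions of R print these row names as the integers 1 to 54 rather than as strings.)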
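The same three attributes appear for any data frame. Because trees91.csv may not be at hand, the sketch below uses a hypothetical two-column data frame, toy, whose column names C and N mimic the first two columns of tree but whose values are made up. The order in which attributes lists its entries can vary, and rownames returns the row labels as strings however they are stored internally.

```r
# Hypothetical stand-in for tree: same first two column names, made-up values
toy <- data.frame(C = c(1, 1, 2), N = c(1, 2, 1))
a <- attributes(toy)

print(names(a))       # the stored attributes: names, class and row.names
print(a$names)        # "C" "N"
print(class(toy))     # "data.frame"
print(rownames(toy))  # "1" "2" "3"
```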
If you know that a variable is a data frame but are not sure what labels are used to refer to the different columns, you can use the names command:
> names(tree)
 [1] "C"      "N"      "CHBR"   "REP"    "LFBM"   "STBM"   "RTBM"   "LFNCC"
 [9] "STNCC"  "RTNCC"  "LFBCC"  "STBCC"  "RTBCC"  "LFCACC" "STCACC" "RTCACC"
[17] "LFKCC"  "STKCC"  "RTKCC"  "LFMGCC" "STMGCC" "RTMGCC" "LFPCC"  "STPCC"
[25] "RTPCC"  "LFSCC"  "STSCC"  "RTSCC"
If you want to work with the data in one of the columns, you give the name of the data frame, a $ sign, and the label assigned to the column. For example, the first column in tree can be accessed using tree$C:
> tree$C
 [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
[39] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4
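Since tree$C is a treatment code rather than a measurement, it is often more useful to count how many observations fall at each level than to list them. The sketch below shows the idea with table on a hypothetical six-row stand-in for the C column (the values are made up); with the real data, table(tree$C) should give one count per carbon dioxide level.

```r
# Hypothetical miniature of the C column: CO2 treatment codes for six trees
toy_c <- data.frame(C = c(1, 1, 2, 2, 2, 4))

counts <- table(toy_c$C)  # number of rows at each distinct value of C
print(counts)             # levels 1, 2, 4 with counts 2, 3, 1
print(sum(counts))        # total number of observations: 6
```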
1.3. Brief Note on Fixed Width Files
There are many ways to read data using R. We only give two examples, direct assignment and reading csv files. However, another way deserves a brief mention. It is common to come across data that is organized in flat files and delimited at preset locations on each line. This is often called a fixed width file.
The command to deal with these kinds of files is read.fwf. We do not explore this command in depth here, but a brief example is given. If you would like more information on how to use this command, enter the following command:
> help(read.fwf)
The read.fwf command requires at least two options. The first is the name of the file, and the second is a list of numbers that gives the length of each column in the data file. A negative number in the list indicates that the column should be skipped. Here we give the command to read the data file fixedWidth.dat. In this data file there are three columns. The first column is 17 characters wide, the second column is 15 characters wide, and the last column is 7 characters wide. The first column is skipped by giving its width as -17, and the optional col.names option specifies the names of the two columns that are kept:
> a = read.fwf('fixedWidth.dat', widths=c(-17,15,7), col.names=c('temp','offices'))
> a
  temp offices
1 17.0      35
2 18.0     117
3 17.5      19
4 17.5      28
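Because fixedWidth.dat is not reproduced here, the sketch below writes a small fixed-width file with the same 17, 15 and 7 character layout and reads it back. The file contents, the label text and the names fw_path, rows and b are made up for illustration; sprintf pads each field to its width, and the negative first width drops the label column as described above.

```r
# Hypothetical fixed-width file: 17-char label, 15-char temperature, 7-char count
fw_path <- tempfile(fileext = ".dat")
rows <- c(sprintf("%-17s%15.1f%7d", "site one", 17.0, 35L),
          sprintf("%-17s%15.1f%7d", "site two", 18.0, 117L))
writeLines(rows, fw_path)

# widths = c(-17, 15, 7): skip 17 characters, then read fields of 15 and 7
b <- read.fwf(fw_path, widths = c(-17, 15, 7), col.names = c("temp", "offices"))
print(b)
```

Note that col.names has two entries, one for each column that is actually kept.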
R Tutorial by Kelly Black is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Based on a work at http://www.cyclismo.org/tutorial/R/.