1. Input

Contents: Assignment; Reading a CSV file; Brief Note on Fixed Width Files

Here we explore how to define a data set in an R session. Only two commands are explored. The first is for simple assignment of data, and the second is for reading in a data file. There are many ways to read data into an R session, but we focus on just two to keep it simple.

1.1. Assignment

The most straightforward way to store a list of numbers is through an assignment using the c command. (c stands for "combine.") The idea is that a list of numbers is stored under a given name, and the name is used to refer to the data. A list is specified with the c command, and assignment is specified with the "<-" symbols:

    > bubba <- c(3,5,7,9)

When you enter this command you should not see any output except a new command line. The command creates a list of numbers called bubba. To see what numbers are included in bubba, type bubba and press the enter key:

    > bubba
    [1] 3 5 7 9
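If you would rather run commands from a script than type them at the prompt, the same assignment can be sketched as follows. The print() and length() calls are additions for illustration and do not appear in the text above. In a script, print() shows the value in the same way that typing the name does at the prompt.

```r
# Combine four numbers with c() and store them under the name bubba
bubba <- c(3, 5, 7, 9)

# In a script, print() displays the value the way typing bubba does at the prompt
print(bubba)

# length() counts how many values were combined
print(length(bubba))
```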

If you wish to work with one of the numbers, you can get access to it by giving the variable name followed by square brackets that indicate which number you want:

    > bubba[2]
    [1] 5
    > bubba[1]
    [1] 3
    > bubba[0]
    numeric(0)
    > bubba[3]
    [1] 7
    > bubba[4]
    [1] 9
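The indexing rules above can be checked in a short script. The out-of-range request bubba[5] is an extra case that the transcript does not show. R returns NA for it rather than stopping with an error.

```r
bubba <- c(3, 5, 7, 9)

second <- bubba[2]   # 5: R numbers entries starting at 1
first  <- bubba[1]   # 3
empty  <- bubba[0]   # numeric(0): an empty vector that still carries the type
beyond <- bubba[5]   # NA: there is no fifth entry, so R returns a missing value

print(list(second = second, first = first, empty = empty, beyond = beyond))
```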

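Notice that the first entry is referred to as the number 1 entry, because R counts from 1, not 0. The zero entry does not hold any of the numbers. Asking for it returns an empty list, numeric(0), whose label indicates how the computer will treat the data. You can store strings using either single or double quotes, and you can store real numbers.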
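Here is a minimal sketch of storing strings and real numbers; the names words, reals and mixed are hypothetical. The class() function reports how R is treating each list, and the zero entry reveals the same information. The last two lines add one fact not mentioned above: if numbers and strings are combined in a single c() call, everything is converted to text.

```r
words <- c('alpha', "beta")   # single and double quotes give the same kind of string
reals <- c(1.5, 2.25, -3)     # real numbers are stored as doubles

print(class(words))   # "character"
print(class(reals))   # "numeric"
print(words[0])       # character(0): the zero entry reveals the type
print(reals[0])       # numeric(0)

mixed <- c(1, "a")    # mixing numbers and strings converts everything to text
print(mixed)          # "1" "a"
```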

You now have a list of numbers and are ready to explore. In the chapters that follow we will examine the basic operations in R that will allow you to do some of the analyses required in class.

1.2. Reading a CSV file

Unfortunately, it is rare to have just a few data points that you do not mind typing in at the prompt. It is much more common to have a lot of data points with complicated relationships. Here we will examine how to read a data set from a file using the read.csv function, but first we discuss the format of a data file.

We assume that the data file is in the format called comma separated values (csv). That is, each line contains a row of values, which can be numbers or letters, and each value is separated by a comma. We also assume that the very first row contains a list of labels. The idea is that the labels in the top row are used to refer to the different columns of values.
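To make the format concrete, the sketch below first splits one hypothetical csv line on its commas with strsplit(). It then has read.csv() parse three lines supplied through its text argument, so no file is needed. This only illustrates the format; it is not how read.csv works internally.

```r
# One line of a csv file: the values are separated by commas
line <- "A,10,12"
fields <- strsplit(line, ",")[[1]]
print(fields)                 # "A" "10" "12", still plain text at this stage

# Three lines: a row of labels followed by two rows of values
lines <- c("trial,mass,velocity", "A,10,12", "B,5,8")
small <- read.csv(text = lines)
print(names(small))           # the labels from the first row name the columns
print(small$mass)             # 10 5, converted to numbers by read.csv
```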

First we read a very short, somewhat silly, data file. The data file is called simple.csv and has three columns of data and six rows. The three columns are labeled "trial," "mass," and "velocity." We can pretend that each row comes from an observation during one of two trials labeled "A" and "B." A copy of the data file is shown below and is created in defiance of Werner Heisenberg:

    simple.csv:

    trial,mass,velocity
    A,10,12
    A,11,14
    B,5,8
    B,6,10
    A,10.5,13
    B,7,11

The command to read the data file is read.csv. The command needs at least one argument, but we give it three to show how it can be used in different situations. The first argument is the name of the file. The second argument indicates whether or not the first row is a set of labels. The third argument indicates that the values on each line are separated by commas. The following command will read in the data and assign it to a variable called heisenberg:

    > heisenberg <- read.csv(file="simple.csv", header=TRUE, sep=",")
    > heisenberg
      trial mass velocity
    1     A 10.0       12
    2     A 11.0       14
    3     B  5.0        8
    4     B  6.0       10
    5     A 10.5       13
    6     B  7.0       11
    > summary(heisenberg)
     trial      mass          velocity
     A:3   Min.   : 5.00   Min.   : 8.00
     B:3   1st Qu.: 6.25   1st Qu.:10.25
           Median : 8.50   Median :11.50
           Mean   : 8.25   Mean   :11.33
           3rd Qu.:10.38   3rd Qu.:12.75
           Max.   :11.00   Max.   :14.00
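A self-contained version of this step is sketched below. It writes the six rows of simple.csv to R's temporary folder, so no download is needed, and reads them back with the same three arguments. One caution about the output above: since R 4.0.0, read.csv() leaves text columns as character by default. As a result, summary() and heisenberg$trial only show factor levels if stringsAsFactors = TRUE is added, as it is here.

```r
# Write the example data to a temporary simple.csv
path <- file.path(tempdir(), "simple.csv")
writeLines(c("trial,mass,velocity",
             "A,10,12", "A,11,14", "B,5,8",
             "B,6,10",  "A,10.5,13", "B,7,11"), path)

# file: which file to read; header: first row holds labels; sep: comma separators.
# stringsAsFactors = TRUE reproduces the factor column of the transcript on R >= 4.0.0
heisenberg <- read.csv(file = path, header = TRUE, sep = ",",
                       stringsAsFactors = TRUE)
print(heisenberg)
print(summary(heisenberg))
```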

(Note that if you are using a Microsoft system, the file naming convention is different from what we use here. If you want to use a backslash, it needs to be escaped, i.e. use two backslashes together: \\. Also, you can specify what folder to use by clicking on the File option in the main menu and choosing the option to specify your working directory.)
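You can see the backslash rule directly in R. The folder C:\data below is a hypothetical Windows location. R also accepts forward slashes in Windows paths, and file.path() joins path pieces with "/", which avoids escaping altogether.

```r
# Two backslashes in the source code stand for one backslash character
win_path <- "C:\\data\\simple.csv"   # hypothetical Windows path
print(nchar("\\"))                   # 1
cat(win_path, "\n")                  # prints C:\data\simple.csv

# Forward slashes also work on Windows; file.path() builds such paths
fwd_path <- file.path("C:", "data", "simple.csv")
print(fwd_path)                      # "C:/data/simple.csv"
```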

To get more information on the different options available, you can use the help command:

    > help(read.csv)

If R is not finding the file you are trying to read, then it may be looking in the wrong folder/directory. If you are using the graphical interface, you can change the working directory from the File menu. If you are not sure what files are in the current working directory, you can use the dir() command to list the files and the getwd() command to determine the current working directory:

    > dir()
    [1] "fixedWidth.dat" "simple.csv"     "trees91.csv"    "trees91.wk1"
    [5] "w1.dat"
    > getwd()
    [1] "/home/black/write/class/stat/stat38313F/dat"
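You can also change the working directory from the command line with setwd(). The sketch below uses R's temporary folder as a stand-in for your data folder. It creates a one-line simple.csv there and restores the original directory at the end.

```r
old_dir <- setwd(tempdir())        # setwd() returns the previous directory
writeLines("trial,mass,velocity", "simple.csv")

current <- getwd()                 # where R now looks for files
files   <- dir()                   # same as list.files()
print(current)
print(files)
print(file.exists("simple.csv"))   # TRUE, so read.csv("simple.csv") would find it

setwd(old_dir)                     # go back to the original directory
```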

The variable heisenberg contains the three columns of data. Each column is assigned a name based on the header (the first line in the file). You can now access each individual column using a $ to separate the two names:

    > heisenberg$trial
    [1] A A B B A B
    Levels: A B
    > heisenberg$mass
    [1] 10.0 11.0  5.0  6.0 10.5  7.0
    > heisenberg$velocity
    [1] 12 14  8 10 13 11
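Here is a small sketch of column access that builds the same six rows directly with data.frame() instead of reading the file. The text above does not cover the [[ ]] form or levels(). They appear here as two closely related tools: one pulls out a column by a string name, and the other lists the labels of a factor.

```r
heisenberg <- data.frame(
  trial    = factor(c("A", "A", "B", "B", "A", "B")),
  mass     = c(10, 11, 5, 6, 10.5, 7),
  velocity = c(12, 14, 8, 10, 13, 11)
)

mass1 <- heisenberg$mass           # column selected with $
mass2 <- heisenberg[["mass"]]      # the same column selected by a string
print(mass1)
print(levels(heisenberg$trial))    # "A" "B": the distinct labels of the factor
print(mean(heisenberg$velocity))   # an extracted column behaves like any vector
```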

If you are not sure what columns are contained in the variable, you can use the names command:

    > names(heisenberg)
    [1] "trial"    "mass"     "velocity"
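Because names() returns an ordinary character vector, you can use it to drive a loop over the columns. The sketch below uses a trimmed two-row version of the data. Inside the loop the column is selected with [[ ]], because heisenberg$col would look for a column literally named col.

```r
heisenberg <- data.frame(trial = c("A", "B"), mass = c(10, 5), velocity = c(12, 8))

cols <- names(heisenberg)
print(cols)                                   # "trial" "mass" "velocity"
for (col in cols) {
  # report each column name with the type R assigned to it
  cat(col, ":", class(heisenberg[[col]]), "\n")
}
```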

We will look at another example, which is used throughout this tutorial. We will look at the data found in a spreadsheet located at http://cdiac.ornl.gov/ftp/ndp061a/trees91.wk1. A description of the data file is located at http://cdiac.ornl.gov/ftp/ndp061a/ndp061a.txt. The original data is given in an Excel spreadsheet. It has been converted into a csv file, trees91.csv, by deleting the top set of rows and saving it as a csv file, which is one of Excel's save options. (You should save the file on your computer.) It is a good idea to open this file in a spreadsheet and look at it. This will help you make sense of how R stores the data.

The data is used to indicate an estimate of biomass of ponderosa pine in a study performed by Dale W. Johnson, J. Timothy Ball, and Roger F. Walker, who are associated with the Biological Sciences Center, Desert Research Institute, P.O. Box 60220, Reno, NV 89506 and the Environmental and Resource Sciences College of Agriculture, University of Nevada, Reno, NV 89512. The data consists of 54 lines, and each line represents an observation. Each observation includes measurements and markers for 28 different measurements of a given tree. For example, the first number in each row is a number, either 1, 2, 3, or 4, which signifies a different level of exposure to carbon dioxide. The sixth number in every row is an estimate of the biomass of the stems of a tree. Note that the very first line in the file is a list of labels used for the different columns of data.

The data can be read into a variable called tree using the read.csv command:

    > tree <- read.csv(file="trees91.csv", header=TRUE, sep=",")
    > attributes(tree)
    $names
     [1] "C"      "N"      "CHBR"   "REP"    "LFBM"   "STBM"   "RTBM"   "LFNCC"
     [9] "STNCC"  "RTNCC"  "LFBCC"  "STBCC"  "RTBCC"  "LFCACC" "STCACC" "RTCACC"
    [17] "LFKCC"  "STKCC"  "RTKCC"  "LFMGCC" "STMGCC" "RTMGCC" "LFPCC"  "STPCC"
    [25] "RTPCC"  "LFSCC"  "STSCC"  "RTSCC"

    $class
    [1] "data.frame"

    $row.names
     [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
    [16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30"
    [31] "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42" "43" "44" "45"
    [46] "46" "47" "48" "49" "50" "51" "52" "53" "54"

The first thing that R stores is a list of names that refer to each column of the data. For example, the first column is called C, and the second column is called N. The variable tree is of type data.frame. Finally, the rows are numbered consecutively from 1 to 54, so each column has 54 numbers in it.
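You can inspect the same three attributes on any small data frame. Because trees91.csv may not be on hand, the sketch below uses a hypothetical four-row stand-in that borrows two of the tree column names. Current versions of R may print $row.names as plain integers rather than the quoted strings shown above.

```r
# Hypothetical stand-in for tree: four rows, two of its column names
mini <- data.frame(C = c(1, 1, 2, 3), N = c(1, 2, 1, 2))

print(attributes(mini))   # $names, $class and $row.names, as for tree
print(class(mini))        # "data.frame"
print(dim(mini))          # 4 rows, 2 columns
```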

If you know that a variable is a data frame but are not sure what labels are used to refer to the different columns, you can use the names command:

    > names(tree)
     [1] "C"      "N"      "CHBR"   "REP"    "LFBM"   "STBM"   "RTBM"   "LFNCC"
     [9] "STNCC"  "RTNCC"  "LFBCC"  "STBCC"  "RTBCC"  "LFCACC" "STCACC" "RTCACC"
    [17] "LFKCC"  "STKCC"  "RTKCC"  "LFMGCC" "STMGCC" "RTMGCC" "LFPCC"  "STPCC"
    [25] "RTPCC"  "LFSCC"  "STSCC"  "RTSCC"

If you want to work with the data in one of the columns, you give the name of the data frame, a $ sign, and the label assigned to the column. For example, the first column in tree can be accessed using tree$C:

    > tree$C
     [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
    [39] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4
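Besides the $ form, you can select a column by giving its label as a string or by using row and column indexing. The sketch below applies all three forms to a hypothetical stand-in for tree. It shows that they return the same vector and that the vector's length equals the number of rows, which is 54 for tree itself.

```r
mini <- data.frame(C = c(1, 1, 2, 3), N = c(1, 2, 1, 2))  # hypothetical data

by_dollar <- mini$C         # name after a $ sign
by_string <- mini[["C"]]    # name given as a string
by_index  <- mini[, "C"]    # empty row index means "all rows"

print(by_dollar)
print(identical(by_dollar, by_string) && identical(by_string, by_index))  # TRUE
print(length(by_dollar) == nrow(mini))                                    # TRUE
```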

1.3. Brief Note on Fixed Width Files

There are many ways to read data using R. We only give two examples: direct assignment and reading csv files. However, another way deserves a brief mention. It is common to come across data that is organized in flat files and delimited at preset locations on each line. This is often called a "fixed width file."

The command to deal with these kinds of files is read.fwf. We do not explore this command in depth here, but a brief example is given. If you would like more information on how to use this command, enter the following command:

    > help(read.fwf)

The read.fwf command requires at least two options. The first is the name of the file, and the second is a list of numbers that gives the length of each column in the data file. A negative number in the list indicates that the column should be skipped. Here we give the command to read the data file fixedWidth.dat. In this data file there are three columns. The first column is 17 characters wide, the second column is 15 characters wide, and the last column is 7 characters wide. In the example below we skip the first column by giving its width as -17, and we use the optional col.names option to specify the names of the two remaining columns:

    > a = read.fwf('fixedWidth.dat', widths=c(-17,15,7), col.names=c('temp','offices'))
    > a
      temp offices
    1 17.0      35
    2 18.0     117
    3 17.5      19
    4 17.5      28
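The contents of fixedWidth.dat are not reproduced here. The sketch below therefore writes a stand-in file with the same layout, a 17-character first column followed by 15- and 7-character columns, using sprintf(), and then reads it back. The station labels are hypothetical; the temp and offices values are the ones from the output above. The strip.white = TRUE argument is passed through to read.table() to trim the padding.

```r
labels  <- c("station-1", "station-2", "station-3", "station-4")  # hypothetical
temp    <- c(17.0, 18.0, 17.5, 17.5)
offices <- c(35L, 117L, 19L, 28L)

# Pad each field to its fixed width: 17 (left-aligned), 15 and 7 characters
lines <- sprintf("%-17s%15.1f%7d", labels, temp, offices)
path  <- tempfile(fileext = ".dat")
writeLines(lines, path)

# -17 skips the first column; the remaining two get the names temp and offices
a <- read.fwf(path, widths = c(-17, 15, 7), col.names = c("temp", "offices"),
              strip.white = TRUE)
print(a)
```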

R Tutorial by Kelly Black is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Based on a work at http://www.cyclismo.org/tutorial/R/.

This page generated using Sphinx.

