unix ii:grep, awk, sed - grapenthin · •awk–original •nawk–“new awk”, version used on...

26
UNIX II:grep, awk, sed October 30, 2017

Upload: phungdat

Post on 17-Feb-2019

271 views

Category:

Documents


1 download

TRANSCRIPT

UNIXII:grep,awk,sedOctober30,2017

Filesearchingandmanipulation

• Inmanycases,youmighthaveafileinwhichyouneedtofindspecificentries(wanttofindeachcaseofNaN inyourdatafile forexample)

• Oryouwanttoreformatalongdatafile (changeorderofcolumns,orjustusecertaincolumns)

• Canbedonewithwritingpythonorotherscripts,todaywilluseotherUNIXtools

grep:globalregularexpressionprint• Usetosearchforapatternandprintthem• Amazinglyuseful!(alotlikeGoogle)

grep

>>>grepOklahomaone_week_eq.txt2017-10-28T09:32:45.970Z,35.3476,-98.0622,5,2.7,mb_lg,,133,0.329,0.3,us,us1000ay0b,2017-10-28T09:47:05.040Z,"11kmWNWofMinco,Oklahoma",earthquake,1.7,1.9,0.056,83,reviewed,us,us

2017-10-28T04:08:45.890Z,36.2119,-97.2878,5,2.5,mb_lg,,41,0.064,0.32,us,us1000axz3,2017-10-28T04:22:21.040Z,"8kmSofPerry,Oklahoma",earthquake,1.4,2,0.104,24,reviewed,us,us

2017-10-27T18:39:28.100Z,36.4921,-98.7233,6.404,2.7,ml,,50,,0.41,us,us1000axpz,2017-10-28T02:02:23.625Z,"33kmNWofFairview,Oklahoma",earthquake,1.3,2.6,,,reviewed,tul,tul

2017-10-27T10:00:07.430Z,36.2851,-97.506,5,2.8,mb_lg,,25,0.216,0.19,us,us1000axgi,2017-10-27T19:39:37.296Z,"19kmWofPerry,Oklahoma",earthquake,0.7,1.8,0.071,52,reviewed,us,us

2017-10-25T15:17:48.200Z,36.2824,-97.504,7.408,3.1,ml,,25,,0.23,us,us1000awq6,2017-10-25T21:38:59.678Z,"19kmWofPerry,Oklahoma",earthquake,1.1,5,,,reviewed,tul,tul

2017-10-25T11:05:21.940Z,35.4134,-97.0133,5,2.5,mb_lg,,157,0.152,0.31,us,us1000awms,2017-10-27T21:37:47.660Z,"7kmESEofMcLoud,Oklahoma",earthquake,1.7,2,0.117,19,reviewed,us,us

2017-10-25T01:50:53.100Z,36.9748,-99.4244,8.115,2.9,ml,,197,,0.64,us,us1000awir,2017-10-26T00:52:01.343Z,"23kmNEofBuffalo,Oklahoma",earthquake,2,7.6,,,reviewed,tul,tul

2017-10-24T23:18:09.000Z,35.3787,-98.0931,7.72,2.7,ml,,91,,0.49,us,us1000awhe,2017-10-26T00:47:37.010Z,"13kmWofUnionCity,Oklahoma",earthquake,2.4,5.7,,,reviewed,tul,tul

2017-10-23T15:57:10.890Z,36.6565,-97.8019,5,2.6,mb_lg,,39,0.2,0.15,us,us1000avxp,2017-10-23T18:30:47.642Z,"17kmSSWofMedford,Oklahoma",earthquake,1.2,1.8,0.132,15,reviewed,us,us

Basicsyntax:>>>grep<pattern><inputfile>

grep

• Lotsofusefuloptionsavailable(readthemanpage!)• -w:lookforawholeword• -i :ignorecase• -v:omitmatchinglines• -c:provideacountofmatchinglines

grep

Whatisaregularexpression?

RegularExpression

• Setofcharactersthatspecifyapattern

• Makeschangingandsearchingfortexteasyjustfromthecommandline.

• Regularexpressionsareacceptedinputforgrep,sed,awk,perl,vimandotherunix commands.

• It’sallaboutsyntax….(andbecauseit’sUNIX,it’salittlecryptic)

• http://www.regular-expressions.info/quickstart.html

SimpleRegularExpressionSymbols

• .(period)--- matchesanysinglecharacter• B--- matchesuppercaseB• b--- matcheslowercaseb• *--- matcheszeroormoreoccurrencesofprecedingcharacter• ^--- goestobeginningofaline• Example– searchafilewhere#isusedtocommentlines>>>grep^#filenameWillpulloutallthelineswhere#isthefirstcharacterinline

• $--- endoftheline

Generallyagoodideatosurroundregularexpressionwithsinglequotesoncommandlinetoprotectitfrombeinginterpretedbytheshell.

SimpleRegularExpressionSymbols

• \ --- lookingforasymbol

• []--- matchesmemberoftherangewithinthebrackets

• [^]--- matchesanythingexceptwhat’sinthebracket

• Non-printablecharacters:• \t:foratabcharacter• \r:forcarriagereturn• \n:fornewline• \s:forawhitespace

Sed – streameditor• Commandlinetoolforeditingfileslinebyline,largelyusedforsubstitution• Likegrepforsearching,butcanreplacefoundpatternwithsomethingelse• Wanttochangeeveryinstanceofmb tomlinmyfile?

>>>sed s/mb/mlfilename

Sed

• Basicstructureforsubstitution:• s--- isthecommandthatindicatessubstitution• delimiter

• Canbeanythingyouwant,slash(/)iscommon,sois_or:• Butifyouneedtosearchsomethingthathasa/willneedtoquotetheslash

usingbackslash\>>>sed 's/\/usr\/local\/bin/\/usr\/bin'fileWillchange/usr/local/binto/usr/binforlinesinfilethatcontain/usr/local/bin

• regularexpressionorpatterntosearchfor• replacement

• Ifwanttodoasearchandreplaceglobally(inentirefile),put“/g”atend.Otherwiseitwillreplaceonlythefirstinstancefoundoneachline

>>>sed 's/\/usr\/local\/bin/\/usr\/bin/g'file

• Sed usesregularexpressions,sameasgrep

>>>sed s/mb/mlfilename

awk

• ProgramminglanguageavailableonmostUnix-likeOS• Developedin1970s(namecomesfromfirstlettersoflastnamesofdevelopers)• Usefulformanipulatingtextfiles• Oneofthemostusefulunix toolsyoucandevelop

• Alsoabletodofloatingpointmath• Structuredasasequenceofpatternsandthenactionsdoperformwhenpatternsarefound• Usedontextfiles:columns=fields;lines=records

awk vsnawk vsgawk

• Differentversionsexist• awk – original• nawk – “newawk”,versionusedonMacsas“awk”• gawk– GNUawk,standardonlinux,compatiblewithawk andnawk.CanaccessthisonMacsaswell– use“gawk”orsetanaliasforit

• Afewminordifferencesinsyntaxbetweenversions

Usingawk

• Cancallitfromthecommandline:>>>awk [options]‘{commands}’variablesinfile>>>awk –fscriptfile variablesinfile

• Orcreateanexecutableawk script• Filecontains:

#!/usr/bin/awksomesetofcommands

>>>chmod +xtest.awk>>>./test.awk

awk andtext

• awk commandsareappliedtoeveryrecord(=line)ofafile

• itisdesignedtoseparatethedataineachlineintoafield(=column)

• essentially,eachfieldbecomesamemberofanarraysothatthefirstfieldis$1,secondfield$2,thirdfield$3…

• $0 referstotheentirerecord

awk:Fieldseparators

• thedefaultfieldseparator isoneormorewhitespaces$1 $2$3$4$5$6$7$8$9$10$111191892295449.29-1.69898.29815.0ehb

• thefieldseparatormaybemodifiedbyresettingtheFSbuiltinvariable• Example:

Separator is “:”, so reset it.

awk - print

• Oneofthemostcommoncommandsusedinawkscriptsisprint

• awk isnotsensitivetowhitespaceinthecommands>>>awk –F”:”‘{print$1$3}’/etc/passwdnobody-2

• twosolutionstothis>>>awk –F”:”‘{print$1“ “$3}’/etc/passwd>>>awk –F”:”‘{print$1,$3}’/etc/passwdnobody-2

• anystringornumerictextcanbeexplicitlyoutputusing“”

Assumeastartingfilelikeso:

11191892295449.29-1.69898.29815.00.00.0ehb FEQx

>>>awk '{print"latitude:",$9,"longitude:",$10,"depth:",$11}’earthquake.txt

latitude:-1.698longitude:98.298depth:15.0latitude:9.599longitude:92.802depth:30.0latitude:4.003longitude:94.545depth:20.0

11191892295449.29-1.69898.29815.00.00.0ehb FEQx

• youcanspecifyanewlineintwoways>>>awk '{print"latitude:",$9; print"longitude:",$10}’earthquake.txt>>>awk '{print"latitude:",$9”\n”,”longitude:",$10}’earthquake.txt

latitude:-1.698longitude:98.298

awk andif

• Ifstatementsareveryusefulinawk:

awk andmath• Bigadvantage– itdoesfloatingpointmath(rememberbashdoesnot)• itstoresallvariablesasstrings,butwhenmathoperatorsareapplied,itconvertsthestringstofloatingpointnumbersifthestringconsistsofnumericcharacters

• Allbasicarithmeticislefttorightassociative

• +:addition• - :subtraction• *:multiplication• /:division• %:remainderormodulus• ^:exponent• otherstandardCprogrammingoperators

• Assignmentoperators• =:setvariableequaltovalueonright• +=:setvariableequaltoitselfplusthevalueonright• -=:setvariableequaltoitselfminusthevalueonright• *=:setvariableequaltoitselftimesthevalueonright• /=:setvariableequaltoitselfdividedbyvalueonright• %=:setvariableequaltotheremainderofitselfdividedbythevalueontheright• ^=:setvariableequaltotheitselftotheexponentfollowingtheequalsign

awk relationaloperators

• Returns1iftrueand0iffalse• Allrelationaloperatorsarelefttorightassociative

• <:testforlessthan• <=:testforlessthanorequalto• >:testforgreaterthan• >=:testforgreaterthanorequalto• ==:testforequalto• !=:testfornotequal

awk logicaloperators

• Booleanoperatorsreturn1fortrueand0forfalse

• &&:logicalAND;teststhatbothexpressionsaretrue• lefttorightassociative

• ||:logicalOR;teststhatoneorbothoftheexpressionsaretrue

• lefttorightassociative

• !:logicalnegation;teststhatexpressionistrue

Usefulawk built-invariables

• FS:FieldSeparator(separatescolumns)• NR:recordnumber(linenumber)• OFS:outputfieldseparator• Defaultiswhitespace

• ORS:outputrecordseparator• Defaultis\n(newline)

• OFMT:outputformatfornumbers• NF:numberoffieldsinthecurrentrecord

Usingvariablesinawk

• 1.Assigntheshellvariablestoawk variablesafterthebodyofthescript,butbeforeyouspecifytheinputfileawk '{print v1, v2, NF, NR}' v1=$VAR1 file1 v2=$VAR2 file2

Or

• 2.Usethe-vswitchtoassigntheshellvariablestoawkvariables.awk -v v1=$VAR1 -v v2=$VAR2 '{print v1, v2}' input_file

Moreawk …

• Developingmorecomplexprogramsinawk• Useofforloops,whileloops,if/then/else• Formatoutput• Definefunctions• Matchingregularexpressions

• Worthspendingtimeexploringwebsites/booksonawk functionality– itwilllikelybecomeoneofyourmostusedtools.