
Upload: guy-pruitt

Post on 31-Dec-2015


UNIX

IBM India Private Limited

© 2012 IBM Corporation

Learning AWK


04/19/23 © 2012 IBM Corporation

WHAT IS AWK?

A scripting language used for manipulating data and generating reports.

Created by: Aho, Weinberger, and Kernighan.

Unlike other filters, it operates at the field level and can easily access, transform and format individual fields in a line.

awk programs are based on the idea of pattern and action; the program scans a document looking for a pattern and when found it performs the action.

awk never modifies the input file.


WHAT CAN YOU DO WITH AWK?

awk operation:
    scans a file line by line
    splits each input line into fields
    compares input line/fields to pattern (field matching is implemented in only awk and perl)
    performs action(s) on matched lines

Useful for:
    transforming data files
    producing formatted reports

Programming constructs:
    formatting output lines
    arithmetic and string operations
    conditionals and loops


THE COMMAND: AWK


BASIC AWK SYNTAX

awk [options] 'script' file(s)
    E.g. awk -F: '/search/ {print $0}' file1

awk [options] -f scriptfile file(s)
    E.g. awk -F: -f ip.awk file1

Options:
    -F to change the input field separator
    -f to name the script file
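As a minimal sketch of the two invocation forms above, the following runs the same one-rule program first inline and then from a script file. The sample records and the temporary file names are made up for illustration; only the -F and -f options come from the slide.

```shell
# Hypothetical sample data; both temp files exist only for this demo.
data=$(mktemp)
script=$(mktemp)
printf 'alice:search team:10\nbob:ops:20\n' > "$data"

# Form 1: the program is given inline on the command line.
inline_out=$(awk -F: '/search/ {print $0}' "$data")

# Form 2: the same program is stored in a file and named with -f.
printf '/search/ {print $0}\n' > "$script"
file_out=$(awk -F: -f "$script" "$data")

printf '%s\n%s\n' "$inline_out" "$file_out"
rm -f "$data" "$script"
```

Both forms should print the same record; a script file mainly pays off once the program grows beyond one or two rules.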


BASIC AWK PROGRAM

consists of patterns & actions:

awk 'pattern {action}' file(s)

if the pattern is missing, the action is applied to all lines
    awk '{print}' datafile      prints all lines in datafile

if the action is missing, the matched line is printed
    awk '/for/' testfile        prints all lines containing the string "for" in testfile

must have either a pattern or an action
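The two defaults can be checked side by side. This is a small sketch on an invented three-line file (the contents are an assumption, chosen so that one line matches "for" only as part of a longer word):

```shell
# Hypothetical test file created just for this demo.
testfile=$(mktemp)
printf 'wait for me\nhello\nbefore noon\n' > "$testfile"

# Action only: it runs on every line, so all three lines are printed.
all_lines=$(awk '{print}' "$testfile")

# Pattern only: the default action prints each line containing "for".
for_lines=$(awk '/for/' "$testfile")

printf '%s\n--\n%s\n' "$all_lines" "$for_lines"
rm -f "$testfile"
```

Note that /for/ is a substring match, so "before noon" is selected as well as "wait for me".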


BASIC TERMINOLOGY: INPUT FILE

A field is a unit of data in a line.

Each field is separated from the other fields by the field separator; the default field separator is whitespace.

A record is the collection of fields in a line.

A data file is made up of records.


BUFFERS

awk supports two types of buffers: record and field.

field buffer: one for each field in the current record; names: $1, $2, …

record buffer: $0 holds the entire record (print and print $0 are the same)


SOME SYSTEM VARIABLES

FS          Field separator (default=whitespace)
RS          Record separator (default=\n)
NF          Number of fields in current record
NR          Number of the current record
OFS         Output field separator (default=space)
ORS         Output record separator (default=\n)
FILENAME    Current filename
$0          Entire input record
$n          nth field of the current record
ARGC        Number of arguments on command line
ARGV        An array containing the command-line arguments
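A short sketch of a few of these variables in use, on an invented two-record file whose records have different field counts (the data and the " | " separator are assumptions for the demo):

```shell
# Hypothetical input: the second record has one more field than the first.
emps=$(mktemp)
printf 'Tom Jones 4424\nMary Adams 5346 extra\n' > "$emps"

# NR = record number, NF = number of fields, $NF = last field.
# OFS is set in BEGIN so that every print uses it.
out=$(awk 'BEGIN {OFS=" | "} {print NR, NF, $NF}' "$emps")

printf '%s\n' "$out"
rm -f "$emps"
```

Because NF holds a number, $NF always refers to the last field of the current record, however many fields it has.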


EXAMPLE: RECORD NUMBER - NR

% cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500

% awk '{print NR, $0}' emps
1 Tom Jones 4424 5/12/66 543354
2 Mary Adams 5346 11/4/63 28765
3 Sally Chang 1654 7/22/54 650000
4 Billy Black 1683 9/23/44 336500


EXAMPLE: SPACE AS DEFAULT FIELD SEPARATOR - FS

% cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500

% awk '{print NR, $1, $2, $5}' emps
1 Tom Jones 543354
2 Mary Adams 28765
3 Sally Chang 650000
4 Billy Black 336500


EXAMPLE: COLON AS FIELD SEPARATOR - FS

% cat em2

Tom Jones:4424:5/12/66:543354

Mary Adams:5346:11/4/63:28765

Sally Chang:1654:7/22/54:650000

Billy Black:1683:9/23/44:336500

% awk -F: '/Jones/{print $1, $2}' em2

Tom Jones 4424


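EXAMPLE: MULTIPLE FIELD SEPARATORS

% cat d1.txt
1|NE|20-JAN-2012
2|DE|02-FEB-2012
3|PE|12-MAR-2012

% awk -F"[|-]" '{print $1,$2,$4}' d1.txt
1 NE JAN
2 DE FEB
3 PE MAR

The same separator can be set inside the program, but FS must be assigned in a BEGIN block. If it is assigned in the main action, the new value only takes effect from the next record, so the first record is still split on whitespace.

% awk 'BEGIN {FS="[|-]"} {print $1,$2,$4}' d1.txt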


EXAMPLE: OFS

File: grades
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84

% awk '{OFS="-";print $1 , $2}' grades
john-85
andrea-89
jasper-84


AWK SCRIPTS

awk scripts are divided into three major parts: BEGIN, BODY, and END.

comment lines start with #


AWK SCRIPTS

BEGIN: pre-processing (optional)
    performs processing that must be completed before the file processing starts (i.e., before awk starts reading records from the input file)
    useful for initialization tasks such as initializing variables and creating report headings

BODY: processing
    contains the main processing logic to be applied to input records
    works like a loop that processes input data one record at a time: if a file contains 100 records, the body will be executed 100 times, once for each record

END: post-processing (optional)
    contains logic to be executed after all input data have been processed
    logic such as printing a report grand total should be performed in this part of the script
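The three parts can be sketched in one small program. The two-column grades file below is invented for the demo (it is a cut-down version of the grades data used elsewhere in these slides), and the heading and total labels are assumptions:

```shell
# Hypothetical two-column input: name and one grade.
grades=$(mktemp)
printf 'john 85\nandrea 89\njasper 84\n' > "$grades"

out=$(awk '
BEGIN { print "name grade" }          # runs once, before any input is read
      { total += $2; print $1, $2 }   # runs once per input record
END   { print "total", total }        # runs once, after the last record
' "$grades")

printf '%s\n' "$out"
rm -f "$grades"
```

The variable total needs no initialization in BEGIN here, because an unset awk variable behaves as 0 in arithmetic.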


PATTERN / ACTION SYNTAX


CATEGORIES OF PATTERNS


EXAMPLE: SIMPLE PATTERN

% cat employees2

Tom Jones:4424:5/12/66:543354

Mary Adams:5346:11/4/63:28765

Sally Chang:1654:7/22/54:650000

Billy Black:1683:9/23/44:336500

(find those recs which contain 00 at the end)

% awk -F: '/00$/' employees2

Sally Chang:1654:7/22/54:650000

Billy Black:1683:9/23/44:336500


EXAMPLE: SIMPLE PATTERN

% cat datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13

(find those records whose 5th field contains a decimal point followed by a digit from 7 to 9)

% awk '$5 ~ /\.[7-9]+/' datafile
southwest SW Lewis Dalsass 2.7 .8 2 18
central CT Ann Stephens 5.7 .94 5 13


EXAMPLES: SIMPLE PATTERN

(those records whose 2nd field does not contain E)
% awk '$2 !~ /E/{print $1, $2}' datafile
northwest NW
southwest SW
southern SO
north NO
central CT

(those records which start with n or s)
% awk '/^[ns]/{print $1}' datafile
northwest
southwest
southern
southeast
northeast
north


RANGE PATTERNS

Matches ranges of consecutive input lines

Syntax:
    pattern1 , pattern2 {action}

pattern can be any simple pattern
pattern1 turns action on
pattern2 turns action off


RANGE PATTERN EXAMPLE
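The slide's own example is not preserved in this transcript. As a stand-in, here is a minimal sketch of a range pattern on an invented six-line file; the START and END marker lines are assumptions chosen for the demo.

```shell
# Hypothetical input with a marked block in the middle.
log=$(mktemp)
printf 'header\nSTART\nline a\nline b\nEND\nfooter\n' > "$log"

# /START/ switches the action on and /END/ switches it off;
# both boundary lines are part of the range.
out=$(awk '/START/,/END/ {print NR ": " $0}' "$log")

printf '%s\n' "$out"
rm -f "$log"
```

Lines 2 through 5 are printed; "header" and "footer" fall outside the range and are skipped.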


POSITIONAL PARAMETERS


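awk refers to the fields of a record as $1, $2, $3 and so forth.

The shell uses identical names for its positional parameters (the command-line arguments), so an awk program is placed in 'single quotes' to keep the shell from expanding awk's $1, $2, … To let a shell parameter through, it is written between its own pair of single quotes, which ends and restarts the quoted awk program so that the shell substitutes the value.

Example:

% script1.sh 400

inside the shell script, the awk program can use the parameter like this:

$3 > '$1' (the shell replaces '$1', so awk sees $3 > 400)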
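A sketch of the quoting trick, assuming a three-column input file invented for the demo. The `set -- 400` line only simulates the argument that script1.sh would receive. The second command uses awk's standard -v option as an alternative that the slide does not mention; the variable name limit is made up.

```shell
# Hypothetical data file: two name columns and an amount.
data=$(mktemp)
printf 'a b 300\nc d 450\ne f 500\n' > "$data"

# Stand-in for the script argument ($1 inside script1.sh).
set -- 400

# Quote splicing, as on the slide: the shell substitutes $1 before awk runs.
# The extra double quotes guard against whitespace in the argument.
spliced=$(awk '$3 > '"$1"' {print $1, $3}' "$data")

# Alternative (not from the slide): hand the value to awk as a variable.
with_v=$(awk -v limit="$1" '$3 > limit {print $1, $3}' "$data")

printf '%s\n--\n%s\n' "$spliced" "$with_v"
rm -f "$data"
```

The -v form keeps the whole awk program inside one pair of single quotes, which tends to be easier to read than splicing.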


getline : MAKING awk INTERACTIVE


Usage: getline var1 < "/dev/tty"

Example:

=> cat test1.awk
BEGIN {
    printf "Enter the salary :"
    getline sal < "/dev/tty"
}
$8 > sal { printf "Employee %s has salary above %d\n", $3, sal }

=> cat datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15

=> awk -f test1.awk datafile
Enter the salary :20          (interactive behavior)
Employee Charles has salary above 20
Employee Sharon has salary above 20


ARITHMETIC OPERATORS

Operator    Meaning        Example
+           Add            x + y
-           Subtract       x - y
*           Multiply       x * y
/           Divide         x / y
%           Modulus        x % y
^           Exponential    x ^ y

Examples:
% awk '$3 * $4 > 500 {print $0}' file

Calculate total file size in a directory:
% ls -ltr | awk 'BEGIN {print "Calculating total file size"} {x=x+$5} END { print "total bytes: " x }'

Calculating total file size

total bytes: 11915


RELATIONAL OPERATORS

Operator    Meaning                     Example
<           Less than                   x < y
<=          Less than or equal to       x <= y
==          Equal to                    x == y
!=          Not equal to                x != y
>           Greater than                x > y
>=          Greater than or equal to    x >= y
~           Matched by reg exp          x ~ /y/
!~          Not matched by reg exp      x !~ /y/


LOGICAL OPERATORS

Operator    Meaning        Example
&&          Logical AND    a && b
||          Logical OR     a || b
!           NOT            ! a

Examples:
% awk '($2 > 5) && ($2 <= 15) {print $0}' file

% awk '$3 == 100 || $4 > 50' file


AWK ACTIONS


AWK VARIABLES

Format: variable = expression

Examples:
% awk '$1 ~ /Tom/ {wage = $3 * $4; print wage}' filename

% awk '$4 == "CA" {$4 = "California"; print $0}' filename


AWK ASSIGNMENT OPERATORS

=     Assign result of right-hand-side expression to left-hand-side variable
++    Add 1 to variable
--    Subtract 1 from variable
+=    Assign result of addition
-=    Assign result of subtraction
*=    Assign result of multiplication
/=    Assign result of division
%=    Assign result of modulo
^=    Assign result of exponentiation


AWK EXAMPLE

File: grades
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84

awk script: average.awk
# average five grades
{
    total = $2 + $3 + $4 + $5 + $6
    avg = total / 5
    print $1, avg
}

Run as:
awk -f average.awk grades


OUTPUT STATEMENTS

print

print easy and simple output

printf

print formatted (similar to C printf)

sprintf

format string (similar to C sprintf)


FUNCTION: PRINT

Writes to standard output
Output is terminated by ORS; default ORS is newline
If called with no parameter, it will print $0
Printed parameters are separated by OFS; default OFS is blank
Print control characters are allowed: \n \f \a \t \\ …


PRINT EXAMPLE

% awk '{print}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86

% awk '{print $0}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86

% awk '{print($0)}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86


PRINT EXAMPLE

% awk '{print $1, $2}' grades

john 85

andrea 89

% awk '{print $1 "," $2}' grades

john,85

andrea,89

% awk '{OFS="-";print $1 , $2}' grades

john-85

andrea-89


REDIRECTING PRINT OUTPUT

Print output goes to standard output unless redirected via:
    > "file"
    >> "file"
    | "command"

Example:

% awk '{print $1 , $2 > "file"}' grades

% cat file
john 85
andrea 89
jasper 84


REDIRECTING OUTPUT EXAMPLE

o Remove only files

=> ls -l | awk '$1!~/^drwx/{print $9}' | xargs rm

o Kill a process

=> find / -name abc.txt -print 2>/dev/null

(process listing entry for the running find)
u807735 21626990 20119632 7 00:41:04 pts/3 0:07 find / -name abc.txt -print

=> kill -9 `ps -ef | awk '$0 ~ /-name abc.txt/ {print $2}'`


PRINT EXAMPLE

% awk '{print $1,$2 | "sort"}' grades

andrea 89

jasper 84

john 85

% awk '{print $1,$2 | "sort -k 2"}' grades

jasper 84

john 85

andrea 89


PRINT EXAMPLE

% date

Wed Nov 19 14:40:07 CST 2008

% date |

awk '{print "Month: " $2 "\nYear: ", $6}'

Month: Nov

Year: 2008


PRINTF: FORMATTING OUTPUT

Syntax:

printf(format-string, var1, var2, …)

works like C printf

each format specifier in "format-string" requires an argument of matching type


FORMAT SPECIFIERS

%d decimal integer

%c single character

%s string of characters

%f floating point number

%o octal number

%x hexadecimal number

%e scientific floating point notation

%% the letter “%”
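A few of these specifiers in action, as a hedged sketch. The values and the "|" markers (used only to make padding visible) are invented for the demo; %e is left out because the number of exponent digits it prints can differ between awk implementations.

```shell
out=$(awk 'BEGIN {
    printf "%d|%5d|%-5d|\n", 42, 42, 42      # plain, right-aligned and left-aligned in width 5
    printf "%s|%c|\n", "awk", "xyz"          # %c prints the first character of a string
    printf "%6.2f|%o|%x|%%\n", 3.14159, 8, 255
}')

printf '%s\n' "$out"
```

A width such as 5 pads on the left, a leading minus sign pads on the right, and .2 limits a floating point value to two decimal places.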


SPRINTF: FORMATTING TEXT

Syntax:
sprintf(format-string, var1, var2, …)

Works like printf, but does not produce output
Instead it returns the formatted string

Example:
{
    text = sprintf("1: %d - 2: %d", $1, $2)
    print text
}


AWK built-in functions for STRING manipulation

tolower(string)
    returns a copy of string, with each upper-case character converted to lower-case.
    Example: tolower("MiXeD cAsE 123") returns "mixed case 123"

toupper(string)
    returns a copy of string, with each lower-case character converted to upper-case.

index(input_string, find_string)
    searches the string input_string for the first occurrence of the string find_string, and returns the position in characters where that occurrence begins in input_string. For example:

    awk 'BEGIN { print index("peanut", "an") }'

    prints 3. If find_string is not found, index returns 0.


AWK built-in functions for STRING manipulation

length(string)
    returns the number of characters in string.

substr(string, start, length)
    returns the substring of "string" that starts at position "start" and is at most "length" characters long (positions count from 1).

split(string, array, fieldsep)
    divides string into pieces separated by fieldsep and stores the pieces in array; if fieldsep is omitted, the value of FS is used.

    Example: split("auto-da-fe", a, "-")

    sets the contents of the array a as follows:

    a[1] = "auto"
    a[2] = "da"
    a[3] = "fe"
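The functions on this slide can be tried together in a BEGIN block, which needs no input file. This is a small sketch; the variable names s, n and parts are made up, and toupper is included only to show the pieces being reused.

```shell
out=$(awk 'BEGIN {
    s = "auto-da-fe"
    print length(s)            # number of characters in s
    print substr(s, 6, 2)      # 2 characters starting at position 6
    n = split(s, parts, "-")   # split returns the number of pieces
    print n, parts[1], parts[n]
    print toupper(parts[2])
}')

printf '%s\n' "$out"
```

The third argument of substr is a character count, not an end position, which is why substr(s, 6, 2) yields "da".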


AWK built-in functions for STRING manipulation

sub(search_string, replacement_string, source)
    searches 'source' for the first occurrence of 'search_string' (a regular expression) and replaces it with 'replacement_string'. If 'source' is omitted, $0 is used.

    Example:

    str = "water, water, everywhere"
    sub(/at/, "ith", str)

    sets str to "wither, water, everywhere"

    Example: awk '{ sub(/candidate/, "& and his wife"); print }' filename

    changes the first occurrence of 'candidate' to 'candidate and his wife' on each input line in the file.

gsub is similar to sub but has a 'global' effect, i.e. it replaces every occurrence in the source string rather than only the first.
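To make the difference between the two functions concrete, this sketch applies both to copies of the same string. The variables a, b, n1 and n2 are invented; the return value of both functions is the number of replacements made.

```shell
out=$(awk 'BEGIN {
    a = "water, water, everywhere"
    b = a
    n1 = sub(/at/, "ith", a)     # replaces the first match only
    n2 = gsub(/at/, "ith", b)    # replaces every match in the string
    print n1 ": " a
    print n2 ": " b
}')

printf '%s\n' "$out"
```

Both functions modify their third argument in place, so the original text has to be copied first if it is still needed.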


AWK EXAMPLE: LIST OF PRODUCTS

103:sway bar:49.99
101:propeller:104.99
104:fishing line:0.99
113:premium fish bait:1.00
106:cup holder:2.49
107:cooler:14.89
112:boat cover:120.00
109:transom:199.00
110:pulley:9.88
105:mirror:4.99
108:wheel:49.99
111:lock:31.00
102:trailer hitch:97.95


AWK EXAMPLE: OUTPUT

Marine Parts R Us
Main catalog
Part-id name                     price
======================================
101     propeller               104.99
102     trailer hitch            97.95
103     sway bar                 49.99
104     fishing line              0.99
105     mirror                    4.99
106     cup holder                2.49
107     cooler                   14.89
108     wheel                    49.99
109     transom                 199.00
110     pulley                    9.88
111     lock                     31.00
112     boat cover              120.00
113     premium fish bait         1.00
======================================
Catalog has 13 parts


AWK EXAMPLE: COMPLETE

BEGIN {
    FS = ":"
    print "Marine Parts R Us"
    print "Main catalog"
    print "Part-id\tname\t\t\t price"
    print "======================================"
}
{
    printf("%3d\t%-20s\t%6.2f\n", $1, $2, $3) | "sort"
    count++
}
END {
    close("sort")    # finish the sort so its output appears before the footer
    print "======================================"
    print "Catalog has " count " parts"
}


AWK ARRAY

awk allows one-dimensional arrays to store strings or numbers
index can be a number or a string
arrays need not be declared; they come into existence when first used
array elements are created when first used, initialized to 0 or ""


ARRAYS IN AWK

Syntax:

arrayName[index] = value

Examples:

list[1] = "one"

list[2] = "three"

list["other"] = "oh my !"


ILLUSTRATION: ASSOCIATIVE ARRAYS

awk arrays can use a string as index
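The slide's illustration is not preserved in this transcript. As a stand-in sketch, the following totals an invented sales file per department, using the department name as the array index. The array name deptSales is borrowed from the delete example in these slides; the data is an assumption.

```shell
# Hypothetical input: department name and an amount per line.
sales=$(mktemp)
printf 'toys 10\nbooks 5\ntoys 7\nbooks 1\ngames 4\n' > "$sales"

# deptSales is indexed by the department name in $1.
# for (key in array) visits keys in no guaranteed order, so the result is sorted.
out=$(awk '{deptSales[$1] += $2}
           END {for (d in deptSales) print d, deptSales[d]}' "$sales" | sort)

printf '%s\n' "$out"
rm -f "$sales"
```

Each element starts out empty and behaves as 0 under +=, so no initialization per department is needed.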


DELETE ARRAY ENTRY

The delete statement can be used to delete an element from an array.

Format:
    delete array_name[index]

Example:
    delete deptSales["supplies"]
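A minimal sketch of deleting an element and then checking for it. The amounts and the "toys" entry are invented; the `in` operator used for the check is standard awk but is not covered on the slide.

```shell
out=$(awk 'BEGIN {
    deptSales["supplies"] = 120
    deptSales["toys"] = 17
    delete deptSales["supplies"]
    # The "in" operator tests for a key without creating it.
    if ("supplies" in deptSales) print "supplies present"; else print "supplies deleted"
    if ("toys" in deptSales) print "toys present"; else print "toys deleted"
}')

printf '%s\n' "$out"
```

Testing with `in` is safer than comparing deptSales["supplies"] with "", because merely referencing an element creates it.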


AWK CONTROL STRUCTURES

Conditional
    if-else

Repetition
    for
    while
    do-while
    also: break, continue, next
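A sketch of the two most common loops, using two records from the grades file shown in these slides. The star bar is an invented extra to give the while loop something to do; the variable names are assumptions.

```shell
# Two records from the grades file used in the earlier examples.
grades=$(mktemp)
printf 'john 85 92 78 94 88\nandrea 89 90 75 90 86\n' > "$grades"

out=$(awk '{
    total = 0
    for (i = 2; i <= NF; i++)      # loop over the grade fields
        total += $i
    n = NF - 1
    stars = ""
    k = int(total / n / 10)        # one star per 10 points of the average
    while (k-- > 0)
        stars = stars "*"
    print $1, total / n, stars
}' "$grades")

printf '%s\n' "$out"
rm -f "$grades"
```

Looping up to NF instead of hard-coding five fields lets the same program handle records with a different number of grades.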


IF STATEMENT

Syntax:
if (conditional expression)
    statement-1
else
    statement-2

Example: for $6 > 1200 { …

This can also be written in terms of if, as below:
awk '{if ( $6 > 1200)
        print $2;
    else
        print $3}' filename


Awk limitations

http://docstore.mik.ua/orelly/unix/sedawk/ch10_08.htm

http://balazsdeak.blogspot.in/2010/02/solaris-awk-max-record-size-is-2559-2-8.html