13 More Advanced Awk Mauro Jaskelioff (originally by Gail Hopkins)

Upload: brook-hubbard

Post on 29-Jan-2016



Introduction

• More awk programming
– The awk programming model
– Input to and output from pipes
– system()
– Formatted printing (printf, sprintf)
– Forcing variable types

• Using sed and awk together

Palindrome Example

• Suppose we wanted to write an awk script which takes a number or string and tells the user whether it is a palindrome:

$ palindrome.sh
Enter a number: 1221
successful
$ palindrome.sh
Enter a number: 1234
failure
$

#!/bin/sh
echo -n "Enter a number: "
read a junk

echo "$a" | awk '
{
    pal = $1
    stat = "successful"
    l = length(pal)
    loop = int(l/2)
    for (i = 1; i <= loop; i++)
    {
        first = substr(pal, i, 1)
        last = substr(pal, l-i+1, 1)
        if (first != last)
            stat = "failure"
    }
    print stat
}'

Breakdown of Palindrome Example

#!/bin/sh
echo -n "Enter a number: "
read a junk

echo "$a" | awk '

Print the text “Enter a number: ” to the terminal. The -n option tells echo not to append a newline.

Read the number into the variable a. If the user has typed anything else on the line by mistake, it is read into the variable junk (which is not used).

Echo the value of a and pipe it to awk for use in the awk part of the script.

{
    pal = $1
    stat = "successful"
    l = length(pal)
    loop = int(l/2)
    for (i = 1; i <= loop; i++)

pal is set to the first field of the input line (which will be the value of a).

Assign the string “successful” to stat.

Find the length of pal using the length() function and assign it to l.

Define a variable called “loop” as the integer part of the length (l) divided by 2 (i.e. a whole number, not a decimal).

Iterate from 1 through to the value of loop, incrementing by 1 each time.

    {
        first = substr(pal, i, 1)
        last = substr(pal, l-i+1, 1)
        if (first != last)
            stat = "failure"
    }
    print stat
}'

In this loop section, we count in from the front and the back of the string and compare each character pair in turn.

Use the substr() function to get a substring of pal that starts at position i and is 1 character long. Assign this to the variable “first”.

Use the substr() function to get a substring of pal that starts at position l-i+1 (the length minus i, plus 1) and is 1 character long. Assign this to the variable “last”.

If the characters in first and last are not the same, set the variable “stat” to the string “failure”.

Print the string in the variable “stat”. stat will contain “successful” if first and last matched on every iteration of the loop. If there was at least one mismatch, stat will contain “failure”.
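The loop above keeps comparing characters after the first mismatch. As a minimal sketch of the same check, the version below takes its input as an argument instead of prompting and stops at the first mismatch. The function name is_palindrome is an assumption made for this illustration and is not part of the original script.

```shell
#!/bin/sh
# is_palindrome is a hypothetical helper name, not part of the original script.
is_palindrome() {
    echo "$1" | awk '
    {
        pal = $1
        stat = "successful"
        l = length(pal)
        for (i = 1; i <= int(l/2); i++)
            if (substr(pal, i, 1) != substr(pal, l-i+1, 1)) {
                stat = "failure"
                break            # one mismatch is enough to decide
            }
        print stat
    }'
}

is_palindrome 1221
is_palindrome 1234
```

The two calls print successful and failure respectively, matching the session shown earlier.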

Awk’s programming model

• Awk has a main input loop
– It reads one line of input from a file and makes it available for processing
– It is executed as many times as there are lines of input
– It does not execute until there is a line of input
– It terminates when there are no more lines of input

• cf. other programming languages, which require the programmer to create the main input loop, open the file(s) and read one line at a time…

Awk’s programming model - BEGIN and END

• With awk, the whole programming loop is executed for each line of input

• Each statement within the loop is executed on each input line that matches it
– (Each statement has a pattern to be matched and a corresponding action to be taken if a match is found)

• If you want to do some processing before or after the main programming loop, use BEGIN and END respectively
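The three phases can be seen in a small sketch. The function name sum_lines and the sample input are assumptions made for this illustration.

```shell
#!/bin/sh
# sum_lines is a hypothetical name; it totals the first field of matching lines.
sum_lines() {
    awk '
    BEGIN    { print "start" }                    # runs once, before any input is read
    /^[0-9]/ { total += $1; n++ }                 # runs for each line that starts with a digit
    END      { print n " lines, total " total }   # runs once, after the last line
    '
}

printf '10\n20\nskip me\n30\n' | sum_lines
```

The line "skip me" does not match the pattern, so its action is not run for it and only three lines are counted.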

Awk’s programming model - next and getline

• Suppose you have the awk statement:
– total = total + $newValue
– … used to provide a total across a number of input lines
– … and you wanted to read the remaining lines of input before moving on to the next awk statement. You need to use either next or getline:

while ((getline newValue < "myFile") > 0)
{
    total = total + newValue    # getline stored the line in newValue, so no $ is needed
}
print total

total = total + $newValue
next

next and getline

• The next command reads another line of input from the data stream and passes control back to the top of the script

• getline is similar, but:
– It can also be used to read from files and pipes
– … it does not pass control back to the top of the script

• getline returns one of three possible values:
– 1 if it was able to read a line
– 0 if end-of-file was encountered
– -1 if an error was encountered
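The difference in control flow can be sketched with two tiny scripts. The function names show_next and show_getline and the sample input are assumptions made for this illustration.

```shell
#!/bin/sh
# show_next: comment lines are skipped with next, so the second rule never sees them.
show_next() {
    awk '
    /^#/ { next }            # read the next line and restart at the first rule
    { print "data: " $0 }
    '
}

# show_getline: after a "key" line, getline pulls in the following line and
# execution continues with the statement after it.
show_getline() {
    awk '
    /^key$/ {
        if ((getline) > 0)   # $0 is now the line that followed "key"
            print "value: " $0
    }
    '
}

printf '# comment\na\nb\n' | show_next
printf 'key\n42\nother\n' | show_getline
```

Testing the return value of getline, as in show_getline, guards against the case where "key" is the last line of input.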

A note about getline

• getline is a statement (not a function), although it returns a value. If you put brackets after it, e.g.:
getline()
you will get an error!

Reading input from a file and assigning variables

• Use the < redirection operator:
– getline < "myFile"

while ((getline newValue < "myFile") > 0)
{ do something }

Here, the input record is assigned to the variable “newValue”

BEGIN {
    printf "Enter a name: "
    getline < "-"
    print
}

In this example, the user is prompted to enter their name (the file name "-" stands for standard input). This is assigned to $0 and the print statement outputs the value of $0 by default
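A minimal sketch of reading a whole file with getline and checking all three return values follows. The function name sum_file and the scratch file are assumptions made for this illustration.

```shell
#!/bin/sh
# sum_file is a hypothetical name; it totals the numbers in the file named by $1.
sum_file() {
    awk -v file="$1" '
    BEGIN {
        # getline returns 1 for each line read, 0 at end-of-file and -1 on error
        while ((rc = (getline newValue < file)) > 0)
            total += newValue        # newValue holds the whole line, so no $
        if (rc < 0) {
            print "cannot read " file
            exit 1
        }
        close(file)
        print total + 0
    }'
}

tmp=$(mktemp)
printf '5\n7\n30\n' > "$tmp"
sum_file "$tmp"
rm -f "$tmp"
```

Keeping the return value in rc lets the script tell a normal end-of-file (0) apart from a file that could not be opened (-1).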

Reading input from a pipe

• The UNIX “who am i” command will give the following type of output:

$ who am i
zlizmj pts/32 Apr 20 12:25 (10.20.1.40)

• This output can be piped to getline:
– "who am i" | getline

• Here, $0 will be set to the output of the command and the line will be parsed into fields, such that “zlizmj” will be put in field $1, “pts/32” will be put into $2, etc. The system variable NF will be set to the number of fields.

Reading input from a pipe and assigning variables

awk '
BEGIN {
    "who am i" | getline
    name = $1
    FS = ":"
}
$1 == name { print $5 }
' /etc/passwd

This script pipes the result of the “who am i” command to getline, which parses it into fields. Field number 1 is assigned to the variable “name” and the field separator (FS) is set to “:”.

The script then tests whether the first field ($1) of each line in /etc/passwd is the same as the value stored in name (the fields in /etc/passwd are separated by a “:”). If so, the 5th field of /etc/passwd is printed (which contains the corresponding user’s full name).


Reading input from a pipe and assigning variables (2)

• The UNIX command whoami returns only the user’s login name:

$ whoami

zlizmj

"whoami" | getline name

print name

In this example, the output of “whoami” is assigned to the variable “name”
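The two forms of reading from a pipe can be compared side by side. In this sketch echo stands in for who am i and whoami so that the output is predictable; the function name pipe_demo is an assumption made for this illustration.

```shell
#!/bin/sh
# pipe_demo is a hypothetical name; echo replaces who am i / whoami here.
pipe_demo() {
    awk 'BEGIN {
        "echo zlizmj pts/32 Apr 20" | getline    # sets $0, the fields and NF
        print $1, NF
        close("echo zlizmj pts/32 Apr 20")

        "echo zlizmj" | getline name             # sets only the variable name
        print name
        close("echo zlizmj")
    }'
}

pipe_demo
```

The first form splits the line into fields, so $1 and NF are available afterwards. The second form leaves $0 and the fields untouched.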

Some Important Limitations

• There is a limit to the number of pipes and files that the system can have open at any one time
– This limit varies from system to system
– Traditionally 20 open files in BSD UNIX

• Use the close() function!

• Some other limits are:
– Number of fields per record: 100
– Characters per input record: 2048 (set in size.h)
– See the awk manual page for more information

Using close() with Pipes and Files

• Why use close()?
– So your program can open as many pipes and files as it needs without exceeding the system limit
– It allows your program to run the same command twice
– You may need close() to force an output pipe to finish its work

{ do something | "sort > myFile" }

END {
    close("sort > myFile")
    while ((getline < "myFile") > 0)
        { do more stuff }
}
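A runnable sketch of the pattern above is given below. The function name sorted_report and the scratch file are assumptions made for this illustration, and the scratch file path is assumed to contain no spaces.

```shell
#!/bin/sh
# sorted_report is a hypothetical name. $1 is a scratch file path
# (assumed to contain no spaces or shell metacharacters).
sorted_report() {
    awk -v out="$1" '
    BEGIN { cmd = "sort > " out }
    { print $1 | cmd }                 # every line goes down the same pipe
    END {
        close(cmd)                     # wait until sort has written the file
        while ((getline line < out) > 0) {
            n++
            print n ": " line
        }
    }'
}

tmp=$(mktemp)
printf 'pear\napple\nfig\n' | sorted_report "$tmp"
rm -f "$tmp"
```

The string passed to close() must be exactly the string used to open the pipe, which is why the command is kept in the variable cmd.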

Directing Output to a File or Pipe

• Use print (the command after | must be a string)

print $0 | "sort | uniq"
print > "myFile"

• Use a shell script

awk '
{ do something
  print $0
}' "$@" |
sort | uniq

Formatted Printing - printf

• One of awk’s most important purposes is to produce formatted reports

• We can use printf for this

• Suppose we wanted the following output from awk:

Module  Students  Convener
G51UST  15        Mauro Jaskelioff
G51CSA  17        Liyang Hu
G51PRG  39        Paul Dempster

Formatted Printing - printf (2)

• printf uses format specifiers:

c    ASCII character
d    decimal integer
e    floating point
s    string

• Use format specifiers with a % symbol:

printf("%s\t%s\t%s\n", "Module", "Students", "Convener")

BEGIN {
    for (i = 1; i <= numModules; i++) {
        printf("%s\t%d\t%s\n", module[i], students[i], convener[i])
    }
}

NOTE: \t inserts a tab character, \n inserts a new line
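Tabs only line columns up when the values have similar lengths. A sketch using field widths instead is shown below. The function name module_report and the colon-separated input format are assumptions made for this illustration.

```shell
#!/bin/sh
# module_report is a hypothetical name; input is module:students:convener, one per line.
module_report() {
    awk -F: '
    BEGIN { printf("%-8s %8s  %s\n", "Module", "Students", "Convener") }
          { printf("%-8s %8d  %s\n", $1, $2, $3) }   # %-8s pads on the right, %8d on the left
    '
}

printf 'G51UST:15:Mauro Jaskelioff\nG51CSA:17:Liyang Hu\n' | module_report
```

A width such as 8 sets the minimum column size, and the minus sign in %-8s left-justifies the value within it.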

sprintf

• Like printf, but sprintf returns a string that can be assigned to a variable

while ((getline < inputFile) > 0)
{
    myString = sprintf("%s:%s:%s", $1, $2, $3)
}

This example repeatedly gets a line from the file named by the variable inputFile and formats its first, second and third fields as a colon-separated string, which is assigned to myString

sprintf (2)

for (i = startOfAscii; i <= endOfAscii; i++)
{
    letter = sprintf("%c", i)
}

This example converts numbers into ASCII characters
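A runnable sketch of the conversion follows. The function name ascii_range and its two arguments (the first and last character codes) are assumptions made for this illustration.

```shell
#!/bin/sh
# ascii_range is a hypothetical name; it prints the characters for codes $1 to $2.
ascii_range() {
    awk -v first="$1" -v last="$2" 'BEGIN {
        for (i = first + 0; i <= last + 0; i++)
            s = s sprintf("%c", i)    # a numeric i becomes the character with that code
        print s
    }'
}

ascii_range 65 70
```

Adding 0 makes sure the loop variable is a number. If %c is given a string instead, it prints the first character of that string.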

Built in Arithmetic Functions

• awk has a number of arithmetic functions that are built in. Some are shown below:

• exp(x) Returns e to the power x
• int(x) Returns a truncated value of x
• sqrt(x) Returns the square root of x
• cos(x) Returns the cosine of x (x in radians)
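A short sketch that exercises these functions is shown below. The function name math_demo is an assumption made for this illustration.

```shell
#!/bin/sh
# math_demo is a hypothetical name.
math_demo() {
    awk 'BEGIN {
        print int(7.9), int(-7.9)     # int() truncates toward zero
        print sqrt(16)
        print exp(0), cos(0)
        printf("%.4f\n", exp(1))      # e to four decimal places
    }'
}

math_demo
```

Note that int() discards the fractional part rather than rounding, so int(-7.9) is -7.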

Built in String Functions

• split(str,arr,fs)
Splits the string str into elements of array arr, using field separator fs

• substr(str,pos,len)
Returns the substring of string str beginning at position pos, up to a maximum length len. If len is not specified then the string from pos to the end is used

• length(str)
Returns the length of the string str, or the length of $0 if no string is specified

Built in String Functions (2)

• index(str,substr)
Returns the position of substring substr in string str, or 0 if it is not present

• gsub(regex,s,str)
Globally substitutes s for each match of the regular expression regex in the string str. Returns the number of substitutions. If a string str is not supplied, it will use $0
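The string functions described so far can be tried together in one sketch. The function name string_demo and the sample string are assumptions made for this illustration.

```shell
#!/bin/sh
# string_demo is a hypothetical name.
string_demo() {
    awk 'BEGIN {
        str = "G51UST:15:Mauro Jaskelioff"
        n = split(str, arr, ":")           # arr[1], arr[2], arr[3]; n is the element count
        print n, arr[1], arr[3]
        print length(arr[1])
        print substr(arr[1], 4)            # no len: from position 4 to the end
        print substr(arr[1], 1, 3)
        print index(str, "UST")
        count = gsub(/[0-9]/, "#", str)    # replaces every digit in str
        print count, str
    }'
}

string_demo
```

Arrays filled by split() start at index 1, and gsub() changes str in place while returning the number of replacements.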

Built in String Functions - match()

• match() is used to test whether a regular expression matches a specified string

match("in UST you learn about shell", /[A-Z]+/)

– match() takes two arguments: the string to be examined, THEN the regular expression (note the change of order)
– match() sets two system variables:
• RSTART - the starting position of the matched substring
– This is also the value returned by match()
• RLENGTH - the length of the matched substring in characters
• If no match is found, RSTART is set to 0 and RLENGTH is set to -1
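A sketch using the example string above shows both the matching and the non-matching case. The function name match_demo is an assumption made for this illustration, and LC_ALL=C is set so that [A-Z] means the uppercase ASCII letters regardless of locale.

```shell
#!/bin/sh
# match_demo is a hypothetical name.
match_demo() {
    LC_ALL=C awk 'BEGIN {
        s = "in UST you learn about shell"
        if (match(s, /[A-Z]+/))
            print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
        match(s, /[0-9]+/)            # there are no digits in s
        print RSTART, RLENGTH
    }'
}

match_demo
```

RSTART and RLENGTH are in the form substr() expects, so the matched text can be extracted directly.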

System Variables that are Arrays

There are two system variables that are arrays:

1. ARGV
– An array containing the command line arguments given to awk
– The number of elements is stored in another variable called ARGC (not an array)
– The array is indexed from 0 (unlike other arrays in awk)
– The index of the last element is therefore ARGC-1
– E.g. ARGV[ARGC-1], ARGV[2]
– The first element is the name of the command that invoked the script

System Variables that are Arrays (2)

2. ENVIRON
– An array containing environment variables
– Each element is the value of a variable in the current environment
– The index of each element is the name of the environment variable
– E.g. ENVIRON["PATH"], ENVIRON["SHELL"]
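A small sketch of looking up environment variables follows. The function name env_demo and the variable names COURSE and NO_SUCH_VARIABLE_G51 are assumptions made for this illustration.

```shell
#!/bin/sh
# env_demo, COURSE and NO_SUCH_VARIABLE_G51 are hypothetical names.
env_demo() {
    COURSE=G51UST awk 'BEGIN {
        print ENVIRON["COURSE"]
        if (!("NO_SUCH_VARIABLE_G51" in ENVIRON))   # test for presence without creating the element
            print "not set"
    }'
}

env_demo
```

Using the in operator avoids creating an empty array element, which a plain reference to a missing index would do.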

ARGV Example

BEGIN {
    for (x = 0; x < ARGC; x++)
        print ARGV[x]
    print ARGC
}

$ awk -f parameters.awk 2007 G51UST "Mauro Jaskelioff" students=80 -
awk
2007
G51UST
Mauro Jaskelioff
students=80
-
6


The system() Function

• The system() function allows a programmer to execute a command whilst within an awk script.

• The awk script waits for the command to finish before continuing execution

• The output of the command is NOT available for processing from within awk

• The system() function returns an exit status which can be tested by the awk script

An example using system()

BEGIN {
    if (system("mkdir UST") == 0)
    {
        if (system("cd UST") != 0)
            print "change directory - failed"
    }
    else
        print "make directory - failed"
}

This example tries to create a new directory called UST. If successful, the code tries to change directory to UST. If either command fails, an error is printed. Note that each command runs in its own shell, so the cd does not change the working directory of the awk script itself.

An example using system() (2)

$ awk -f create.awk
$ ls UST

Here, the script (called create.awk) is run and is successful. “ls UST” doesn’t return anything because UST is empty.

$ awk -f create.awk
mkdir: UST: File exists
make directory - failed

Here, the script is run for a second time and so the mkdir command fails because UST already exists. The first error is given by the mkdir command, the second error is given by the awk script
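The exit status returned by system() can be tested without creating any files, as in the sketch below. The function name system_demo is an assumption made for this illustration. The sketch also uses pwd through a pipe to show that a cd run by system() does not affect later commands.

```shell
#!/bin/sh
# system_demo is a hypothetical name.
system_demo() {
    awk 'BEGIN {
        if (system("test -d /") == 0)
            print "/ is a directory"
        if (system("test -d /no/such/dir") != 0)
            print "missing directory reported"

        "pwd" | getline before; close("pwd")
        system("cd /")                     # runs in a child shell only
        "pwd" | getline after;  close("pwd")
        print "directory " ((before == after) ? "unchanged" : "changed")
    }'
}

system_demo
```

Since the output of a command run by system() is not available to the script, the working directory is read with a pipe to getline instead.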

Use of Backslash

• Backslash can be used:
– To continue strings across new lines

$ awk 'BEGIN {print "hello, \
> world" }'
hello, world

Use of Backslash (2)

– For escape sequences
• \b - backspace
• \n - new line
• \r - carriage return
• \t - horizontal tab
• \v - vertical tab
• \c - any literal character:

$ awk 'BEGIN {print "80\% \"topsy turvy\", 20\% strange" }'
80% "topsy turvy", 20% strange

Forcing Variable Types

• In awk, you do not declare variables and give them types

• Sometimes you want to force awk to treat a variable as a particular type, e.g. as a number or as a string.
– To force a variable, x, to be treated as a number, put in the line:
• x = x + 0
– To force a variable, x, to be treated as a string, put in the line:
• x = x ""
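The effect is easiest to see in a comparison, because strings are compared character by character while numbers are compared by value. The function name type_demo is an assumption made for this illustration.

```shell
#!/bin/sh
# type_demo is a hypothetical name.
type_demo() {
    awk 'BEGIN {
        x = "10"; y = "9"                  # string constants
        r = (x < y) ? "true" : "false"
        print "as strings, 10 < 9 is " r
        x = x + 0; y = y + 0               # force numbers
        r = (x < y) ? "true" : "false"
        print "as numbers, 10 < 9 is " r
        x = x ""; y = y ""                 # force strings again
        r = (x < y) ? "true" : "false"
        print "as strings again, 10 < 9 is " r
    }'
}

type_demo
```

As strings, "10" sorts before "9" because the character 1 comes before 9, so the first and third comparisons are true and the second is false.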

Using sed and awk Together - An Example

• In this example, sed is used to remove empty lines and comment lines (lines beginning with #) before passing the data on to awk:

#!/bin/sh
/bin/sed -e '/^$/d' -e '/^#.*/d' |
awk <do something>
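A runnable sketch of the same pipeline is shown below, with a trivial awk program standing in for the unspecified awk step. The function name count_fields and the sample input are assumptions made for this illustration.

```shell
#!/bin/sh
# count_fields is a hypothetical name; the awk program is only a placeholder.
count_fields() {
    sed -e '/^$/d' -e '/^#/d' |          # drop empty lines and comment lines
    awk '{ print NF ": " $0 }'           # awk only sees the remaining lines
}

printf '# header\n\none two\nthree\n' | count_fields
```

Letting sed do the filtering keeps the awk program free of patterns that only exist to skip unwanted lines.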

Summary

• More advanced awk
• awk’s programming model
• next and getline
• Input/output to/from files and pipes
• Formatted printing
• Built in functions
• ARGV and ARGC
• Forcing variable types