Scientific Programming in C
XIII. Shell programming
Susi Lehtola
11 December 2012
Introduction
Often in scientific computing one needs to do simple tasks related to
- renaming of files
- file conversions
- unit conversions
These are often best done using shell programming and command-line tools.
Scientific Programming in C, fall 2012 Susi Lehtola Shell programming 2/22
sed
sed performs text filtering and transformation. Examples:
Replace "foo" with "bar" in file:
$ sed "s|foo|bar|g" file > file.new
Delete the first 10 lines of a file:
$ sed '1,10d' file > file.new
Delete the last line of a file:
$ sed '$d' file > file.new
The -i argument modifies the file in place (no output to stdout).
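A minimal sketch of the in-place mode, assuming a sed that accepts a backup suffix attached directly to -i (GNU and BSD sed both do). The file name demo.txt and its contents are made up for illustration.

```shell
# Work in a scratch directory so that no real file is touched.
workdir=$(mktemp -d)
printf 'foo one\nfoo two\nlast line\n' > "$workdir/demo.txt"

# Replace "foo" with "bar" in place; the original is kept as demo.txt.bak.
sed -i.bak 's|foo|bar|g' "$workdir/demo.txt"

sed_result=$(cat "$workdir/demo.txt")
sed_backup=$(cat "$workdir/demo.txt.bak")
printf '%s\n' "$sed_result"
```

Keeping a backup suffix is a cautious habit: a mistyped expression then costs nothing, since the untouched original is still next to the modified file.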
awk
awk is a language for processing text files.
The input is read line by line, and it is split into fields (i.e., words).
Awk programs are written as a series of pattern-action pairs
condition { action }
that are run at every line of input.
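As a small illustration of one condition-action pair, the sketch below prints the first field of every line whose second field is larger than 2. The three-line input is hypothetical.

```shell
# Hypothetical input: a label and a number on each line.
input='a 1
b 20
c 3'

# Condition: second field greater than 2. Action: print the first field.
awk_selected=$(printf '%s\n' "$input" | awk '$2 > 2 { print $1 }')
printf '%s\n' "$awk_selected"
```

Lines for which the condition is false are simply skipped; a pair without a condition runs its action on every line.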
awk, cont’d
Hello world in awk
$ awk 'BEGIN {print "Hello world!"}'
Hello world!
awk, cont’d
There are also special BEGIN and END blocks that are run only once, at the start and at the end of the program, respectively.
Full program:
BEGIN {
    # code that is run at the start
}
{
    # code that is run for every line of input
}
END {
    # code that is run at the end
}
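A sketch of how the three blocks can work together, here computing the mean of a column. The four input numbers are made up, and the sketch assumes the input is not empty (otherwise NR would be zero in the division).

```shell
# Hypothetical input: one number per line.
awk_mean=$(printf '1\n2\n3\n6\n' | awk '
BEGIN { sum = 0 }            # run once, before any input is read
      { sum += $1 }          # run for every line of input
END   { print sum / NR }     # run once, after the last line
')
printf '%s\n' "$awk_mean"
```

The BEGIN block is optional here, since awk variables start out as zero, but it makes the initialization explicit.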
awk, cont’d
Example: an xyz file
18
6-molecule water cluster
O 0.000000 0.000000 0.000000
H -0.410000 -0.740000 0.530000
H 0.640000 0.510000 0.580000
O -1.030000 -1.980000 1.280000
H -1.220000 -2.940000 1.490000
H -1.360000 -1.400000 2.030000
O 0.000000 0.270000 -2.800000
H -0.100000 0.120000 -1.810000
H -0.860000 0.630000 -3.160000
O 1.670000 1.030000 1.820000
H 2.160000 1.870000 1.610000
H 2.290000 0.380000 2.270000
O -0.550000 4.130000 0.440000
H -1.470000 3.850000 0.150000
H -0.240000 4.900000 -0.120000
O -0.580000 2.340000 3.290000
H -0.750000 3.170000 2.750000
H 0.320000 1.970000 3.070000
awk, cont’d
Decompose the file
#!/usr/bin/awk -f
{
    printf("Line %i: \"%s\"\n", NR, $0);
    for (i=1; i<=NF; i++) {
        printf("\tWord %i: \"%s\".\n", i, $i);
    }
}
Running gives
$ cat cluster.xyz | ./decompose.awk
Line 1: "18"
    Word 1: "18".
Line 2: "Water cluster, first 6 molecules."
    Word 1: "Water".
    Word 2: "cluster,".
    Word 3: "first".
    Word 4: "6".
    Word 5: "molecules.".
Line 3: "O 0.000000 0.000000 0.000000"
    Word 1: "O".
    Word 2: "0.000000".
    Word 3: "0.000000".
    Word 4: "0.000000".
awk, cont’d
- $0 contains the whole input line
- $1 is the first word on the line
- $2 is the second word on the line
- ...
- $NF is the last word on the line
Useful special variables in awk:
- NR is the current line number
- NF contains the number of fields on the current line
- In the END block, NR is the number of lines in the file (line number of the last line)
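The variables above can be inspected directly. This sketch prints NR, NF and $NF for two made-up input lines, and then NR once more in the END block, where it holds the total line count.

```shell
# Hypothetical two-line input with three and two fields.
awk_fields=$(printf 'a b c\nd e\n' | awk '
    { print NR, NF, $NF }          # line number, field count, last field
END { print "lines:", NR }         # in END, NR is the total number of lines
')
printf '%s\n' "$awk_fields"
```

Note the difference between NF (a count) and $NF (the contents of the field with that index).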
awk, cont’d
Extract the x coordinates from the file
$ cat cluster.xyz | awk '{if(NR>2) {print $2}}'
0.000000
-0.410000
0.640000
-1.030000
-1.220000
-1.360000
0.000000
-0.100000
-0.860000
1.670000
2.160000
2.290000
-0.550000
-1.470000
-0.240000
-0.580000
-0.750000
0.320000
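The same extraction can be combined with a unit conversion, one of the tasks mentioned in the introduction. The sketch below assumes that the coordinates are given in Ångström (the usual convention for xyz files, but not stated in the file itself) and converts the x column to bohr using the CODATA 2018 Bohr radius. The scratch file holds only the first water molecule of the cluster, and the variable name bohr is chosen for illustration.

```shell
# First water molecule of the cluster, written to a scratch file.
xyz=$(mktemp)
printf '3\nwater\nO 0.000000 0.000000 0.000000\nH -0.410000 -0.740000 0.530000\nH 0.640000 0.510000 0.580000\n' > "$xyz"

# bohr is the Bohr radius in Angstrom (CODATA 2018); dividing converts to bohr.
# The condition NR > 2 skips the two header lines of the xyz format.
x_bohr=$(awk -v bohr=0.529177210903 'NR > 2 { printf("%.6f\n", $2 / bohr) }' "$xyz")
printf '%s\n' "$x_bohr"
rm -f "$xyz"
```

Passing the file name to awk directly avoids the extra cat, and -v keeps the conversion factor out of the program text.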
awk, cont’d
Find out the maximum and minimum coordinates
#!/usr/bin/awk -f
BEGIN {
    max[0]=max[1]=max[2]=-1e10;
    min[0]=min[1]=min[2]=1e10;
}
{
    if (NR>2) {
        for (i=0; i<3; i++) {
            if ($(i+2)<min[i]) {
                min[i]=$(i+2)
            };
            if ($(i+2)>max[i]) {
                max[i]=$(i+2)
            }
        }
    }
}
END {
    for (i=0; i<3; i++) {
        printf("%e ... %e\n", min[i], max[i])
    }
}
awk, cont’d
Running gives
$ cat cluster.xyz | ./minmax.awk
-1.470000e+00 ... 2.290000e+00
-2.940000e+00 ... 4.900000e+00
-3.160000e+00 ... 3.290000e+00
Bash
Bash (Bourne-Again SHell) is the default shell on most Linux systems, and it has quite nice scripting features.
For example:
$ for i in foo bar; do echo $i; done
foo
bar
$ for ((i=0; i<10; i++)); do echo "The value of i is $i."; done
The value of i is 0.
The value of i is 1.
The value of i is 2.
The value of i is 3.
The value of i is 4.
The value of i is 5.
The value of i is 6.
The value of i is 7.
The value of i is 8.
The value of i is 9.
Bash, cont’d
You can also loop over files:
$ for i in *.tex; do cp -a "$i" "$i.orig"; done
This will make backups of all the .tex files in the current directory. (*.tex is expanded to match all the .tex files in the directory, after which the for loop runs over the expansion.)
Advanced version:
$ for i in *.tex; do
    # Get modified date
    suffix=$(date --reference="$i" +%Y%m%d.%H%M.bak)
    # and backup the file
    cp -av "$i" "${i}-${suffix}"
  done
This will suffix the backup with the time stamp of the original file.
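A somewhat more defensive sketch of the simple backup loop. It quotes every expansion so that file names with spaces survive, and it skips the case where the pattern matches nothing (the shell then leaves *.tex unexpanded). The scratch directory and file names are made up, and cp -p is used here as a portable stand-in for cp -a.

```shell
# Scratch directory with two made-up .tex files, one with a space in its name.
texdir=$(mktemp -d)
printf 'a\n' > "$texdir/intro.tex"
printf 'b\n' > "$texdir/my notes.tex"

for f in "$texdir"/*.tex; do
    # If nothing matches, the pattern stays unexpanded; skip it.
    [ -e "$f" ] || continue
    # Quoting keeps "my notes.tex" as a single argument.
    cp -p "$f" "$f.orig"
done

backup_count=$(ls "$texdir" | grep -c '\.orig$')
printf '%s\n' "$backup_count"
```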
Bash, cont’d
Let’s say you have a bunch of files with names file1, file2, ..., file199, file200, ..., file1098, file1099.
You want to rename these to file0001, file0002, ..., file1099. How do you do this?
Bash, cont’d
Solution: bash and awk.
$ for ((i=1; i<=1099; i++)); do
    # Convert number to contain leading zeros
    n=`echo $i | awk '{printf("%04i",$1)}'`
    # Has file name changed?
    if [[ "$i" != "$n" ]]; then
        mv file${i} file${n}
    fi
  done
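As an alternative sketch, the zero padding can also be done with the shell's own printf instead of awk. This is a variation on the slide's solution, not a replacement for it; the scratch directory and the four sample numbers are made up to keep the example small.

```shell
# Scratch directory with a few empty files standing in for file1 ... file1099.
rendir=$(mktemp -d)
for i in 1 25 199 1099; do
    : > "$rendir/file$i"
done

for i in 1 25 199 1099; do
    # printf pads the number with leading zeros to four digits.
    n=$(printf '%04d' "$i")
    # Only rename when the name actually changes.
    if [ "$i" != "$n" ]; then
        mv "$rendir/file$i" "$rendir/file$n"
    fi
done

ls "$rendir"
```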
Bash, cont’d
Bash also has arrays.
$ arr=(This is an array.)
$ echo $arr
This
$ echo ${arr[0]}
This
$ echo ${arr[1]}
is
$ echo ${arr[2]}
an
$ echo ${arr[3]}
array.
$ echo ${arr[@]}
This is an array.
$ for ((i=0; i<${#arr[@]}; i++)); do
    echo "Element $i: \"${arr[i]}\"."
  done
Element 0: "This".
Element 1: "is".
Element 2: "an".
Element 3: "array.".
Bash, cont’d
You can also loop directly over the elements in the array as
$ for i in ${arr[@]}; do echo $i; done
This
is
an
array.
since ${arr[@]} expands to the full array.
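One caveat worth sketching: an unquoted ${arr[@]} is split again on whitespace, so elements that contain spaces fall apart. Quoting the expansion as "${arr[@]}" keeps each element intact. The array below is made up for illustration, and the example assumes it is run with bash.

```shell
# A made-up array whose second element contains a space.
files=("report.tex" "my notes.tex")

# Unquoted: the expansion is split into words on whitespace.
unquoted=0
for f in ${files[@]}; do unquoted=$((unquoted + 1)); done

# Quoted: one loop iteration per array element.
quoted=0
for f in "${files[@]}"; do quoted=$((quoted + 1)); done

echo "unquoted: $unquoted words, quoted: $quoted elements"
```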
Bash, cont’d
For example, in conventional quantum chemistry one often needs to check the convergence with regard to the basis set. This can be nicely automated with bash arrays:
$ basis=({,aug-}cc-pV{D,T,Q,5,6}Z)
$ for i in ${basis[@]}; do echo $i; done
cc-pVDZ
cc-pVTZ
cc-pVQZ
cc-pV5Z
cc-pV6Z
aug-cc-pVDZ
aug-cc-pVTZ
aug-cc-pVQZ
aug-cc-pV5Z
aug-cc-pV6Z
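A sketch of how such an array might drive a series of calculations: one input file name is built per basis set. The prefix water_ and the suffix .inp are placeholders and do not refer to any particular quantum chemistry program; the example assumes bash.

```shell
# Brace expansion fills the array with all ten basis set names.
basis=({,aug-}cc-pV{D,T,Q,5,6}Z)

# "water_" and ".inp" are hypothetical naming choices for illustration.
inputs=()
for b in "${basis[@]}"; do
    inputs+=("water_${b}.inp")
done

echo "${#inputs[@]} input names, first: ${inputs[0]}"
```

In a real workflow the loop body would write each input file and launch the calculation, but those steps depend on the program in use.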
Bash, cont’d
Read more on bash programming at
- http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html (beginners' guide)
- http://tldp.org/LDP/abs/html/ (advanced level)