CIS 191: Linux and Unix Class 4 February 18th, 2015

Upload: neal-wells

Post on 25-Dec-2015


CIS 191: Linux and Unix

Class 4, February 18th, 2015


Outline

Scheduled Jobs

Language Theory Overview

Grep Regular Expressions

Examples of Grep Regular Expressions

Sed

Recall Daemons

• Daemons are just background processes
• They’re typically used to provide services that must always run
  – sshd = ssh server process
  – udevd = hardware daemon
  – syslogd = logging daemon
  – (Names end in “d” by convention)

Running tasks periodically

• Sometimes you want to run a program on a schedule
  – Generally for administrative tasks
• “Water my lawn at noon every other day in the summer”
• In UNIX systems, your tasks might be more along the lines of clearing an error log or taking a backup of your thesis

Cron

• The cron daemon handles periodic tasks
  – So if you want a task to run periodically, you must submit jobs to the cron daemon…
  – It wakes up once a minute and services whatever tasks it has been assigned (after checking to make sure a particular task should run in that minute, of course)
• Many system tasks use cron
  – logging
  – backups
  – updates

Using cron

• Don’t start cron yourself
• From the OS X man page:
  – “The cron utility is launched by launchd when it sees the existence of /etc/crontab or files in /usr/lib/cron/tabs. There should be no need to start it manually.”
• cron will start by itself when there is a crontab file which specifies jobs that should be run!
  – So, if you want to interact with cron, you need to edit either the system crontab file or your personal crontab file.

crontabs

• In general, you don’t want to edit the system crontab
  – Unless you really know what you’re doing!
• If you want to edit your user-specific crontab, access it by running
  – $ crontab -e
  – This will launch your favorite editor, as specified in the shell’s EDITOR environment variable.
    • export EDITOR=vim
    • This sets vim as your default editor for command-line programs
• This should just work in Ubuntu, as with many things.
  – As for OSX…

Editing crontabs with vim (OSX edition)

• OSX has its own ideas about editing crontabs!
  – It requires that crontab files be edited in-place
  – nano does this (but I don’t know what buttons to press...)
• Making vim work with crontab -e in OSX
  – In .bashrc (set the editor to vim and pass a variable into the vim session)
    • export EDITOR=vim
    • export VISUAL=vim
    • alias crontab="env VIM_CRONTAB=true crontab"
  – In .vimrc (if the vim session sees the variable set to true, turn off backup files)

        if $VIM_CRONTAB == "true"
            set nobackup
            set nowritebackup
        endif

crontab format

• Each line in a crontab file represents some task you would like the cron daemon to execute
• MIN HR DOM MN DOW CMD <args>
  – MIN: minute of the hour
  – HR: hour of the day
  – DOM: day of the month
  – MN: month of the year
  – DOW: day of the week (0=Sunday, 1=Monday, 2=Tuesday, …)
  – CMD: your command
  – <args>: whatever arguments you’d like to pass to your command
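To make the field order concrete, here is a small sketch that splits one crontab entry into the six parts listed above. The entry is one of the examples from these slides; the variable names are just labels chosen for this illustration, and nothing here talks to cron itself.

```shell
# One crontab entry, taken from the examples in these slides.
line='00 09-18 * * 1-5 /path-to/check_db_status.sh'

# read assigns the first five whitespace-separated fields to the schedule
# variables; everything that remains (the command and its arguments)
# ends up in cmd. The asterisks are not glob-expanded inside a here-document.
read -r min hr dom mn dow cmd <<EOF
$line
EOF

printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$min" "$hr" "$dom" "$mn" "$dow"
printf 'command=%s\n' "$cmd"
```

Reading an entry this way is a quick sanity check that the five time fields line up before you paste the line into crontab -e.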

crontab syntax

• Again, recall the * wildcard (yes, it is important)
  – 30 08 10 06 * /path-to/my_sweet_backup.sh /
    • Run backup at 8:30AM on June 10th, regardless of what weekday it is
  – 00 11,16,21 * * * /path-to/my_spam_script.sh
    • Spam peeps at 11:00AM, 4:00PM, and 9:00PM every day
  – 00 09-18 * * 1-5 /path-to/check_db_status.sh
    • Check database status every hour from 9AM to 6PM, M-F
  – * * * * * /path-to/tick.sh
    • Run a ticker script every minute (a * in every time field matches every minute, and the cron daemon wakes up every minute)
  – */10 * * * * /path-to/ten_minutes.sh
    • Runs ten_minutes.sh every ten minutes every day
• See this awesome link for more information!

Scheduling a one-off task with at

• Suppose you want to manage submissions right before and right after lecture, but you’re lecturing at that time.
• The at utility is perfect for this
  – Allows you to specify a time, and then enter commands to run at the time you’ve specified
• Perfect for, say, disabling hw2 submission at 1:30, enabling hw2-late submission at 1:30, and enabling hw3 submission at 3:00.

at syntax

    $ at 1:30PM sep 17
    > project -d hw2
    > project -e hw2-late
    > ^D
    job 2 at Wed Sep 17 13:30:00 2014

Viewing and Manipulating the atq

• at -l (or atq) will list the jobs you’ve submitted to at which have not been completed yet
  – Note that at -q <queue> selects a queue; it does not list jobs
• at -c <jobnumber> will cat the job text to the terminal for you to view
• at -r <jobnumber> (or atrm <jobnumber>) will remove a job from the atq

Running at

• In Ubuntu, you’ll probably need to install at
  – sudo apt-get install at
  – It should just work after this…
• In OSX, at relies on the atrun daemon to manage its jobs
  – See man atrun for how to enable this daemon


Outline

Scheduled Jobs

Language Theory Overview

Grep Regular Expressions

Examples of Grep Regular Expressions

Sed

Languages

• A language is a set of strings
• These strings are built from the symbols of an “alphabet”
• The language is “decided” by some process which decides if a string is in the language or not

Regular Languages

• A regular language is a set that can be decided by viewing a single character at a time, using a fixed amount of memory!
  – Specifically, regular languages are languages that can be decided by a DFA (deterministic finite automaton); you’ll learn more about this in CIS 262 if you haven’t taken it already.
• It doesn’t matter how long the string is!
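As a rough illustration of “one character at a time, fixed memory”, here is a sketch of a two-state DFA written as a shell function. The language it decides (strings over {a, b} with an even number of ‘a’s) and the function name accepts_even_as are assumptions picked for this example; they do not come from the slides.

```shell
# accepts_even_as STRING
# Decide whether STRING (over the alphabet {a, b}) has an even number of a's.
# The single variable "state" is all the memory the machine ever uses,
# no matter how long the input is.
accepts_even_as() {
    input=$1
    state=even
    while [ -n "$input" ]; do
        rest=${input#?}            # the input without its first character
        char=${input%"$rest"}      # the first character on its own
        case "$state,$char" in
            even,a) state=odd ;;
            odd,a)  state=even ;;
            *,b)    ;;             # reading b never changes the state
            *)      return 1 ;;    # symbol outside the alphabet: reject
        esac
        input=$rest
    done
    [ "$state" = even ]            # accept only if we stop in the "even" state
}

for s in "" abab ab; do
    if accepts_even_as "$s"; then
        echo "accept: '$s'"
    else
        echo "reject: '$s'"
    fi
done
```

The same language could be written as the regular expression (b*ab*a)*b*, which is the correspondence the next slides describe.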

Regular Expressions

• A regular expression exactly describes a regular language
  – That is, every regular language can be described by some regular expression
  – And a regular expression describes a regular language


Regular Expressions Illustrated

• Suppose A and B are regular languages.

Regular Extensions

• A few extensions to classical regular expressions that stay within regular languages
  – If A is an RE, then A+ matches one or more copies of A
  – If A is an RE, then A? matches one or no copies of A

Truly Regular Expressions

• abc matches only the string “abc”
• (ab)* matches the empty string “”, “ab”, “abab”, …
• (a|b)+ matches any non-empty string made up only of ‘a’s and ‘b’s
• (a*b)+ matches any number of ‘a’s followed by a single ‘b’, repeated at least once
  – In other words, any string of ‘a’s and ‘b’s which ends in a ‘b’.
• a(b|c)*a matches any string which starts and ends with an ‘a’ and has only ‘b’s and ‘c’s in between.
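One way to check these claims is to ask grep whether a whole string belongs to the language of an expression. The helper name matches below is made up for this sketch; -E selects extended syntax, -x requires the match to cover the entire line, and -q keeps grep quiet so only the exit status matters.

```shell
# matches REGEX STRING: succeed if the whole STRING is matched by REGEX.
matches() {
    printf '%s\n' "$2" | grep -Eqx -e "$1"
}

# (ab)* accepts the empty string and whole copies of "ab", nothing else.
for s in "" ab abab aba; do
    if matches '(ab)*' "$s"; then
        echo "(ab)* accepts '$s'"
    else
        echo "(ab)* rejects '$s'"
    fi
done

# (a*b)+ needs the string to end in b.
matches '(a*b)+' 'aabab' && echo "(a*b)+ accepts 'aabab'"
matches '(a*b)+' 'aba'   || echo "(a*b)+ rejects 'aba'"

# a(b|c)*a allows only b and c between the two a's.
matches 'a(b|c)*a' 'abcba' && echo "a(b|c)*a accepts 'abcba'"
matches 'a(b|c)*a' 'abab'  || echo "a(b|c)*a rejects 'abab'"
```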

More Regular Expression Extensions

• There are a number of extensions that allow for more concise representation
  – . (dot) matches any single character (any character at all)
  – [cde] matches any single character listed between the square brackets (here: c, d, or e)
  – [h-l] matches any character in the range of characters from h to l
    • To match any character not in the list, place a caret (^) first inside the brackets.
    • [^0-9] matches anything that is not a digit.
  – If A is an RE, then A{n,m} matches anywhere between n and m copies of A, inclusive.
  – A{n} matches exactly n copies of A.
• On this slide, ., [, ], {, and } are metacharacters.
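A short sketch of the bracket and repetition forms, run against a handful of made-up words (the word list is an assumption for illustration only). Each command prints the lines that match the whole pattern.

```shell
# Sample input: one candidate string per line.
words='hot
jot
pot
h4t
aaa
aaaaa'

# [h-l] accepts h, i, j, k or l as the first letter: hot and jot.
range_hits=$(printf '%s\n' "$words" | grep -Ex '[h-l]ot')

# [^0-9] accepts any single character that is not a digit: hot (not h4t).
negated_hits=$(printf '%s\n' "$words" | grep -Ex 'h[^0-9]t')

# a{2,4} accepts between two and four a's: aaa (aaaaa has five).
interval_hits=$(printf '%s\n' "$words" | grep -Ex 'a{2,4}')

printf 'range:    %s\n' $range_hits
printf 'negated:  %s\n' $negated_hits
printf 'interval: %s\n' $interval_hits
```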

Metacharacters

• A certain number of predefined shortcuts (character classes) are provided.
  – [[:space:]], or ‘\s’, matches any whitespace character.
  – [[:alnum:]], or ‘\w’, matches any “word” character
    • By which we mean letters and numbers, though some implementations include underscores (_)
  – [[:digit:]], or ‘\d’, matches any digit (0-9)
    • \d is Perl syntax; GNU grep only understands it with -P, so prefer [[:digit:]] with grep -E
  – ^ matches “beginning-of-line”
  – $ matches “end-of-line”

Metacharacters

• \\ matches a backslash (\)
  – Since \ is normally used to specify other metacharacters
• \* matches an asterisk
  – Since * normally means “zero or more of the preceding item”
• \< and \> match word boundaries (the start and the end of a word)
• \. matches a dot
• Metacharacters need to be preceded by a backslash in order to match the literal character
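The effect of escaping is easy to see side by side. In this sketch the sample lines are invented for the demonstration; the unescaped dot matches any character, while the escaped forms match only the literal symbol.

```shell
# Sample input: one string per line.
sample='2.718
2x718
a*b
aab'

# Unescaped dot: matches both 2.718 and 2x718.
unescaped_count=$(printf '%s\n' "$sample" | grep -Ec '2.718')

# Escaped dot: matches only the line with a literal dot.
escaped=$(printf '%s\n' "$sample" | grep -E '2\.718')

# Escaped asterisk: matches only the line containing a literal "a*b".
star=$(printf '%s\n' "$sample" | grep -E 'a\*b')

echo "lines matching 2.718 : $unescaped_count"
echo "lines matching 2\\.718: $escaped"
echo "lines matching a\\*b  : $star"
```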

“Regular” Expressions: a Misnomer

• Just about any name but “regular” would have been better!
  – Many extensions describe non-regular languages
  – The syntax and behavior are different for just about every system involving regular expressions!
  – What needs escaping changes based on implementation
    • In fact, Vim has four different settings for this.
    • See “:help magic”
  – The way we describe or apply regular expressions and gather the matches differs across settings.

Our focus: grep and sed

• As we’ve discussed, grep applies a regular expression to each line in the input file or files
• sed is a stream editor
  – More on this soon…


New Skill

xkcd.com/208


Outline

Scheduled Jobs

Language Theory Overview

Grep Regular Expressions

Examples of Grep Regular Expressions

Sed

Motivating Examples

• We’re usually searching for a particular kind of text
  – An integer, maybe with a minus sign in front
  – A decimal number (for example 2.718)
  – A first name followed by a last name
    • Or maybe a last, first
  – An email address
  – Sentences beginning with the word “The”, ending with punctuation.
  – A phone number
  – Prime numbers
    • This really does exist, but it relies on backreferences and is rather inefficient…

Integers and Decimals

• Integers start with an optional -, followed by one or more digits. The perfect regular expression is therefore…
  – -?[[:digit:]]+
  – -?\d+
• How about decimals? First, we need a characterization.
  – There is an optional minus sign, then an optional string of digits, followed by a ., then a string of digits.
  – -?[[:digit:]]*\.[[:digit:]]+
  – -?\d*\.\d+
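Here is a quick check of the two [[:digit:]] patterns against a few made-up inputs (the input list is an assumption for this sketch). Because the patterns begin with a minus sign, they are passed with -e so grep does not mistake them for options; -x makes each line match as a whole.

```shell
# Sample input: one candidate number per line.
numbers='42
-7
2.718
-.5
3.
abc'

# Optional minus sign, then one or more digits.
integers=$(printf '%s\n' "$numbers" | grep -Ex -e '-?[[:digit:]]+')

# Optional minus sign, optional digits, a literal dot, then one or more digits.
decimals=$(printf '%s\n' "$numbers" | grep -Ex -e '-?[[:digit:]]*\.[[:digit:]]+')

echo "integers:"; printf '  %s\n' $integers
echo "decimals:"; printf '  %s\n' $decimals
```

Note that “3.” is rejected by the decimal pattern, since the characterization above requires digits after the dot.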

Names

• Let’s begin with a characterization of First Name Last Name format.
  – A capital letter, followed by any number of letters, then a space, then another capital followed by any number of letters
• Now, let’s come up with the regular expression
  – [A-Z]\w*\s[A-Z]\w*
• Do you see any potential issues with this approach?
  – What about hyphenated names? Multiple names? Middle initials? Middle names written out?
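To see those issues in practice, this sketch runs a version of the pattern over a few sample names (the names are chosen for illustration). It swaps \w and \s for the POSIX classes [[:alpha:]] and [[:space:]], which is an assumption made here for portability, and sets LC_ALL=C so that [A-Z] means exactly the uppercase ASCII letters.

```shell
# Sample input: one name per line.
names='Grace Hopper
Jean-Luc Picard
Ada M Lovelace'

# -x demands that the whole line fit the "First Last" shape.
plain_names=$(printf '%s\n' "$names" |
    LC_ALL=C grep -Ex '[A-Z][[:alpha:]]*[[:space:]][A-Z][[:alpha:]]*')

echo "matched: $plain_names"
```

Only the first name survives: the hyphen and the middle initial both fall outside the characterization, which is exactly the weakness the slide asks about.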

Aside: Solve the Problem You Want to

• Many regular expressions will match the target
  – But some are easier to construct (and to understand) than others.
• If you know a little more about the text you will be handling, you can sometimes make shortcuts
  – This will become more apparent when we get to replacing (rather than just matching) text.
• Modifying the problem is a major theme throughout computer science, and in this course as well!

Aside #2: Evil Regular Expressions!!!

• There are two main kinds of RE engines.
  – NFA (Nondeterministic Finite Automaton) engines step through the regex and may backtrack on the input text
  – DFA (Deterministic Finite Automaton) engines always move forward in the string character by character
  – Nonbacktracking NFA engines do exist…
  – See http://swtch.com/~rsc/regexp/regexp1.html for more details on the differences.
• The runtime can increase drastically for the following
  – Repetitions of overlapping alternations
  – Repetitions within repetitions
  – Repetitions containing both wildcards and normal characters

Aside #2: Some evil examples

• Can you figure out why these might be “evil”?
  – (x*)*
  – (x.)*
  – (x|xx)*
  – (x|x?)*
  – The prime number checker we mentioned earlier
• Think about how they behave on the string
  – xxxxxxxxxxxxxxxxy
• The interpretation of what is inside the parentheses changes each time the regex fails to match with the previous interpretation!


Outline

Scheduled Jobs

Language Theory Overview

Grep Regular Expressions

Examples of Grep Regular Expressions

Sed

/etc/passwd

• The passwd file contains information about each user account (see man 5 passwd)
  – “5” is the manual section
• You can look up a particular person’s name using grep…

/etc/passwd

    $ grep 'Spencer A' /etc/passwd
    lesp:x:20657:20657:Spencer A Lee:/home1/l/lesp:/pkg/bin/bash
    $ grep cis /etc/passwd
    cis511: …
    cis520: …
    precise: …

/etc/passwd, Take Two…

• Generally, we want to use extended regular expressions (as we discussed earlier)
  – So when you call grep, call it with the -E flag

    $ grep -E "cis[0-9]+" /etc/passwd | less
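The difference between the two searches can be tried without touching the real /etc/passwd. In this sketch the first sample line is the one shown on the earlier slide; the cis511 and precise entries are hypothetical stand-ins, since the slides elide their contents.

```shell
# A tiny stand-in for /etc/passwd. Only the first line comes from the slides;
# the other two entries are invented for this example.
passwd_sample='lesp:x:20657:20657:Spencer A Lee:/home1/l/lesp:/pkg/bin/bash
cis511:x:3001:3001:hypothetical course account:/home/cis511:/bin/false
precise:x:3002:3002:hypothetical account:/home/precise:/bin/false'

# Plain "cis" also hits "precise", because the letters appear inside it.
plain=$(printf '%s\n' "$passwd_sample" | grep -c 'cis')

# Requiring at least one digit after "cis" drops that accidental match.
strict=$(printf '%s\n' "$passwd_sample" | grep -cE 'cis[0-9]+')

# Anchoring to the start of the line and the field separator is stricter still.
anchored=$(printf '%s\n' "$passwd_sample" | grep -E '^cis[0-9]+:' | cut -d: -f1)

echo "plain=$plain strict=$strict anchored=$anchored"
```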

C identifiers

• Suppose we want to find all uses of the function strfry in the directory chef
• We can use Bash expansions and grep together!

    $ grep -E strfry *.c
    chef.c: strfry(p_str);
    chef.c: cond ? strfry(uuname) : uuname
    recipes.c: is_strfry_ingredient(p_src)

C Identifiers

• But grep included results that we didn’t want, such as is_strfry_ingredient
• What can we do?
  – Include word boundaries!
  – Quote the pattern, or Bash will strip the backslashes before grep sees them

    $ grep -E '\<strfry\>' *.c
    chef.c: strfry(p_str);
    chef.c: cond ? strfry(uuname) : uuname
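The same comparison can be reproduced on the three source lines from the slide, fed in as text rather than read from real .c files. The \< and \> anchors are a GNU extension (many other greps accept them too); grep -w is shown as a widely available alternative, on the assumption that whole-word matching is all that is needed.

```shell
# The three lines from the slide, without the file-name prefixes.
src='strfry(p_str);
cond ? strfry(uuname) : uuname
is_strfry_ingredient(p_src)'

# Plain search: all three lines contain the letters "strfry".
loose=$(printf '%s\n' "$src" | grep -cE 'strfry')

# Word boundaries: the underscore counts as a word character, so the
# occurrence inside is_strfry_ingredient no longer matches.
bounded=$(printf '%s\n' "$src" | grep -cE '\<strfry\>')

# -w asks grep for whole-word matches without writing the anchors by hand.
whole=$(printf '%s\n' "$src" | grep -cw 'strfry')

echo "loose=$loose bounded=$bounded whole=$whole"
```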

Grepping for Hardware…

• Another common scenario: attempting to find a particular piece of hardware
• The lspci command will spit out a list of available PCI (Peripheral Component Interconnect) devices

    $ lspci | grep -i Network
    Ethernet controller: Intel 82566MM Gigabit
    Network controller: Intel PRO/Wireless

Grepping for Hardware

• Which kernel modules are related?

    $ lsmod | grep -i iwl
    iwl4965       202721  0
    iwl_legacy    146875  1 iwl4965
    mac80211      267163  2 iwl4965,iwl_legacy
    cfg80211      170485  3 iwl4965,iwl_legacy,mac80211

Display only the matching text

• Generally, when grep finds a match, it will display the entire line
• Most of the time this is what you want!
• But when you are trying to extract a match from the text
  – Like when you are looking for an address or a phone number…
• You may want to only display the match.
• You can do this with the -o option
  – grep -oE 'regular expression' file_list
  – displays just the matches on separate lines
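For instance, -o can pull phone numbers out of running text. The sample sentence and the xxx-xxx-xxxx number shape are assumptions made for this sketch.

```shell
# Sample text: two numbers on one line, none on the other.
text='Call 215-555-0100 before noon, or 215-555-0199 after.
No number on this line.'

# -o prints each match on its own line instead of the whole matching line.
phones=$(printf '%s\n' "$text" | grep -oE '[0-9]{3}-[0-9]{3}-[0-9]{4}')

printf '%s\n' "$phones"
```

Both numbers come out even though they sit on the same input line, and the line without a number contributes nothing.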

Greedy Matching

• Let’s write a regular expression to match all instances of html tags of the form <p>, <em>, <title>…
  – <.*>
• What if we run this on
  – <strong>Hi! I’m an example!</strong>
• We’ll get the following match:
  – <strong>Hi! I’m an example!</strong>

What went wrong?

• Grep matches expressions greedily.
• This means that it will try to match as much as it can (if there is more to match in a line, it will do so, even if it has already found a match!)
• While there are some syntaxes (such as Perl) which allow for lazy matching, Grep’s extended regex syntax does not allow this!
• You can use perl syntax with grep -P, but we are not allowing that for assignments in this class.

A right answer (without greed)

• <strong>Hi! I’m an example!</strong>
• What if we try the following expression:
  – <[^>]*>
• We’ll match an open angle bracket, then every character that is not a close angle bracket, followed by a close angle bracket.
• Hallelujah! Success! We get
  – <strong>
  – </strong>
• Just as we expected.
• How can we modify this to only match open tags?
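The three behaviours can be compared directly with grep -o. The last pattern is one possible answer to the question on the slide, offered as a sketch rather than the intended solution: it refuses a slash (or an immediate close bracket) right after the opening bracket.

```shell
html="<strong>Hi! I'm an example!</strong>"

# Greedy: .* runs to the last > on the line, so the whole line is one match.
greedy=$(printf '%s\n' "$html" | grep -oE '<.*>')

# Negated class: each match stops at the first >, giving the two tags.
tags=$(printf '%s\n' "$html" | grep -oE '<[^>]*>')

# Open tags only: the first character after < may not be / or >.
open_tags=$(printf '%s\n' "$html" | grep -oE '<[^/>][^>]*>')

echo "greedy:    $greedy"
echo "tags:      $(printf '%s' "$tags" | tr '\n' ' ')"
echo "open tags: $open_tags"
```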


Outline

Scheduled Jobs

Language Theory Overview

Grep Regular Expressions

Examples of Grep Regular Expressions

Sed

Sed Introduction

• The man page for sed describes it as “a stream editor for filtering and transforming text.”
• You should always run sed with the -r option, which allows for extended regular expressions
  – Noticing a pattern here?
  – -r is the GNU sed spelling; the sed that ships with OSX uses -E instead
• You also always want to give sed its regular expressions in single quotes, which tells Bash not to expand dollar signs, asterisks, question marks, and so on

Sed Syntax

• sed regular expressions take the syntax
  – s/regex/replacement/flags
• The g flag tells sed not to stop after the first replacement
  – Think “globally”
• Patterns can be captured in parentheses, and used in the replacement with backreferences
  – Sort of like storing matched information in variables…
  – Tell sed to store this information using extra parentheses in your expression. Refer to them later with \1 for first group, \2 for second group…

Regular Expression Parenthesis Groups

• Groups are numbered from the outside in first, then from left to right (that is, in the order their opening parentheses appear).
• Recall the Name example from earlier
  – [A-Z]\w*\s[A-Z]\w*
• If we rewrite the expression as
  – (([A-Z]\w*)\s([A-Z]\w*))
• Group “1” matches the full name
• Group “2” matches the first name
• Group “3” matches the last name
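The numbering can be made visible by printing each group with a label. This sketch uses sed -E (accepted by both GNU and BSD sed) and spells the letter classes as [a-z] instead of \w, both of which are substitutions made here for portability; the sample name is arbitrary.

```shell
# Group 1 is the outer pair, group 2 the first inner pair, group 3 the second.
labelled=$(echo 'Ada Lovelace' |
    sed -E 's/(([A-Z][a-z]*) ([A-Z][a-z]*))/full=[\1] first=[\2] last=[\3]/')

echo "$labelled"
```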

Sed Examples

    $ echo "hello" | sed -r 's/lo/p/'
    help
    $ echo "Here is a sentence" | sed -r 's/is/was/'
    Here was a sentence
    $ echo "This is a sentence" | sed -r 's/is/is not/'
    This not is a sentence
    $ echo "This is a sentence" | sed -r 's/is/XXX/'
    ThXXX is a sentence
    $ echo "This is a sentence" | sed -r 's/is/is not/g'
    This not is not a sentence
    $ echo "This is a sentence" | sed -r 's/\<is\>/is not/g'
    This is not a sentence

Another Sed example

• Consider translating a list of phone numbers from
  – (xxx)-xxx-xxxx to
  – xxx-xxx-xxxx
• We need to replace the parenthesized part of the numbers with its contents…
• sed -r 's/\(([0-9]{3})\)/\1/' numbers
  – Extra parentheses tell sed to store the matched number
  – \1 grabs the matched text as a backreference
• But there’s a simpler solution… Remove the parentheses!
  – sed -r 's/[()]//g' numbers
  – The g flag matters here: without it, only the first parenthesis on each line is removed
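Both approaches can be tried on a couple of sample numbers; the numbers below are invented, and the input is piped in rather than read from a file called numbers. The sketch uses sed -E, which GNU and BSD sed both accept.

```shell
# Sample input in the (xxx)-xxx-xxxx format.
numbers='(215)-555-0100
(610)-555-0123'

# Capture the three digits inside the literal parentheses and keep only them.
with_group=$(printf '%s\n' "$numbers" | sed -E 's/\(([0-9]{3})\)/\1/')

# Or simply delete every parenthesis on the line (note the g flag).
stripped=$(printf '%s\n' "$numbers" | sed -E 's/[()]//g')

printf '%s\n' "$with_group"
```

The two results are identical on this input; the second is shorter to write but would also strip parentheses that appear anywhere else on the line.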

Another Example

• Consider changing a list of names from (Last, First) to (First, Last)
• As usual, we need to characterize the input first
  – A capital letter, followed by any number of letters, then a comma and a space; finally, one more capital letter and any number of other letters.
• And the sed expression?
  – sed -r 's/([A-Z]\w*),\s([A-Z]\w*)/\2, \1/g'
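A sketch of the swap on a short made-up roster. It keeps the replacement from the slide (\2, \1) but writes the pattern with [[:alpha:]] and a literal space instead of \w and \s, an assumption made so that the command does not depend on GNU-only escapes.

```shell
# Sample input in "Last, First" order.
roster='Hopper, Grace
Lovelace, Ada'

# Group 1 captures the last name, group 2 the first name; the replacement
# writes them back in the opposite order.
swapped=$(printf '%s\n' "$roster" |
    sed -E 's/([A-Z][[:alpha:]]*), ([A-Z][[:alpha:]]*)/\2, \1/')

printf '%s\n' "$swapped"
```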