
Page 1: Lecture 11: Bash, command line - pitt.edunaraehan/ling1340/Lecture11.pdf · Using TAB for file name completion ... hello.py | wc > out.txt daisy chain! ... Alice's Adventures in Wonderland

Lecture 11:

Bash Shell, Command Line

LING 1340/2340: Data Science for Linguists

Na-Rae Han

Objectives

Finally, Bash shell

Running things in command line

Interacting with text files in command line

11/6/2017 2

Resources

Learning resources section:

http://www.pitt.edu/~naraehan/ling1340/resources.html#bash

Software Carpentry, The Unix Shell:

http://swcarpentry.github.io/shell-novice/

Thirty Useful Unix Commands:

http://www.maths.manchester.ac.uk/~pjohnson/resources/unixShort/examples-commands.pdf

Ones you do not need:

compress, finger, lpr, talk

(Windows) "more" is not supported. Use "less" instead.

Shell introduction, navigating

Introducing the shell

http://swcarpentry.github.io/shell-novice/01-intro/

Navigating & working with files and directories

http://swcarpentry.github.io/shell-novice/02-filedir/

http://swcarpentry.github.io/shell-novice/03-create/

We've been doing some of these already, as part of our git routine. You should know:

. .. ~

pwd cd ls

Command-line history with the up and down arrow keys

Using TAB for file name completion

Using Control+C to quit
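A minimal sketch of the navigation basics, run inside a throwaway directory so nothing real is touched. The directory names (Data_Science, notes) are only placeholders for illustration.

```shell
# Hypothetical sandbox so the example does not touch real files.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/Data_Science/notes"

cd "$sandbox/Data_Science/notes"   # cd moves into a directory
here=$(pwd)                        # pwd prints the current working directory
cd ..                              # .. is the parent directory
parent=$(pwd)
listing=$(ls .)                    # . is the current directory
echo "$here"
echo "$parent"
echo "$listing"
echo ~                             # ~ expands to your home directory
```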

Settling in, customizing

You can customize your shell.

.bashrc

.bash_profile

These files store your customization.

In your home directory:

your_editor .bash_profile &

After adding entries or editing, you should either log back in, or execute source .bash_profile.

Aliasing is the most common customization method:

alias calc='/c/windows/system32/calc.exe'

alias ls='ls -hF --color=tty'

Your favorite shortcuts and command-line options

Mac users: the color option is not supported. Remove the color part.
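A small sketch of what such entries might look like in .bash_profile. The alias names ll and la are made up for illustration; the flags are ordinary ls options that should work on both Mac and git-bash.

```shell
# Lines like these would go in ~/.bash_profile (or ~/.bashrc).
# The alias names ll and la are hypothetical examples.
alias ll='ls -lhF'     # long listing, human-readable sizes, file type markers
alias la='ls -a'       # include hidden "dot" files such as .bash_profile

# Typing alias followed by a name prints its current definition.
ll_def=$(alias ll)
echo "$ll_def"
```

After saving the file, running source .bash_profile makes the new aliases available in the current session.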

PATH, which, where

We have been occasionally using pip to install Python libraries. Where is this pip? Which pip are you using?

If you want to install tweepy for this version of Python, you can do:

(1) pip3 install tweepy

(2) /c/Program\ Files.../Scripts/pip install tweepy

(3) cd into the /c/Program Files.../Scripts directory and then run ./pip install tweepy
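To see how the shell decides which program a name like pip refers to, here is a rough sketch of a PATH lookup. The tool name mytool_demo and its directory are invented for the example; command -v is used as the builtin counterpart of which.

```shell
# Hypothetical setup: a private bin directory holding a tool named mytool_demo.
bindir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from mytool_demo"\n' > "$bindir/mytool_demo"
chmod +x "$bindir/mytool_demo"

# Before the directory is on PATH, the shell cannot find the command.
before=$(command -v mytool_demo || echo "not found")

# Prepend the directory; PATH is searched from left to right.
PATH="$bindir:$PATH"
after=$(command -v mytool_demo)

echo "$before"
echo "$after"
```

In the same way, which pip (or command -v pip) reports the full path of the pip that would actually run.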

Windows users

Because git-bash is not a native command-line shell for Windows (cmd is), there are a few additional wrinkles.

Certain programs are designed to run within a Windows console window. In git-bash, those need to be prefixed with winpty. For example, to get the interactive Python shell:

winpty python

Pay attention to your directory path.

In git-bash, full path starts with /c/.

In cmd (Windows native), it is C:\...

In Python, full path can be written as 'C:/...' or 'C:\\...' or r'C:\...'.

Not found:

nano (need to download and do some setting up. See next page.)

more (use less)

man (Google stuff up)

Setting up nano 2.7.5 to use in Git Bash on Windows

1. Download nano-git-0d9a7347243.exe from:

https://www.nano-editor.org/dist/win32-support/

2. Rename it nano-git.exe

3. Place it under /usr/bin/.

This is likely C:\Program Files\Git\usr\bin if you installed git with default setting.

4. Create a script file called nano (without file extension!) in the same /usr/bin/ directory. Put two lines:

#!/bin/sh

START //WAIT nano-git.exe "$@"

5. You can now launch nano editor by simply calling 'nano' or 'nano foo.txt'.

Mac users

Practice nano.

Add some aliases.

As in Windows, you should be able to launch any app that is found in your PATH.

Surprise! You also have a handy command for launching any GUI application from the command line.

open -a Application-Name

http://osxdaily.com/2007/02/01/how-to-launch-gui-applications-from-the-terminal/

Running a Python script from the command line

1. python hello.py

Assuming python is in your $PATH, and hello.py is in your current working directory

2. hello.py

Assuming your current working directory is in your $PATH. If not, you should execute ./hello.py

Assuming your script begins with a line (called 'shebang' line):

#!/system_path/to/python

In my case, it's #!/c/ProgramData/Anaconda3/python

If your path contains a SPACE... tough luck!
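The shebang mechanism can be tried out without any Python installed. The sketch below uses a hypothetical hello.sh with a /bin/sh shebang as a stand-in for hello.py; the steps are the same for a Python script with its own shebang line. The chmod step is an addition not on the slide: on Mac and Linux a file must be marked executable before it can be run by name.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# hello.sh is a hypothetical stand-in for hello.py; its first line is the shebang.
printf '#!/bin/sh\necho "Hello, world!"\n' > hello.sh

chmod +x hello.sh        # mark the file as executable
out=$(./hello.sh)        # ./ is needed when the current directory is not in PATH
echo "$out"
```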

Piping and I/O redirections

Piping is what makes command-line ever so powerful.

For people working mainly with text data (us!), piping enables us to manipulate data on the fly.

hello.py > out.txt          redirect output to a file

hello.py | wc               redirect output to another application

hello.py | wc > out.txt     daisy chain!

Also:

<     read input from a file

>>    append to an existing file rather than overwriting it
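A minimal sketch of the four operators in action. The shell function say_hello is a made-up stand-in for hello.py; any command that writes to standard output would behave the same way.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A hypothetical stand-in for hello.py: it just writes two lines to standard output.
say_hello() { printf 'Hello, world!\nHello again!\n'; }

say_hello > out.txt                  # > sends output to a file (overwrites)
say_hello >> out.txt                 # >> appends instead of overwriting
lines_in_file=$(wc -l < out.txt)     # < feeds the file to wc as standard input
lines_piped=$(say_hello | wc -l)     # | feeds one command's output to another
say_hello | wc -l > count.txt        # daisy chain: pipe, then redirect

echo "$lines_in_file" "$lines_piped"
```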

Download two files

Alice's Adventures in Wonderland

http://www.gutenberg.org/ebooks/11

Download the Plain Text UTF-8 version.

Rename the file to alice.txt

ENABLE word list from Peter Norvig's site:

http://norvig.com/ngrams/

Download enable1.txt

Save them onto your Desktop.

Then, within bash shell, move the files into your Data_Science directory. (Wait if you are not sure how this is done.)

Files in your Data_Science directory

Examining a text file

ls -la

Displays file info

wc

Displays line count, word count, and byte count (for plain ASCII text, bytes and characters are the same)

head -n

Displays initial n lines

tail -n

Displays last n lines
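These commands can be tried on a small file first. In the sketch below, sample.txt is an invented four-line stand-in for alice.txt.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# sample.txt is a small hypothetical stand-in for alice.txt.
printf 'one fish\ntwo fish\nred fish\nblue fish\n' > sample.txt

ls -la sample.txt              # permissions, size, modification time
wc sample.txt                  # lines, words, bytes, then the file name
first=$(head -n 1 sample.txt)  # first line only
last=$(tail -n 2 sample.txt)   # last two lines
echo "$first"
echo "$last"
```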

grep

grep

Searches each line in text for regular expression match

grep -P

Accepts Perl-style regular expressions

Perl-style == Python-style

Words with 5+ consecutive "vowel"s
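One way the vowel search might look, as a sketch rather than the exact command from the slide. It uses grep -E (extended regular expressions), which handles this particular pattern the same way as grep -P and is also available where -P is not. The file words.txt is a tiny invented stand-in for enable1.txt.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# words.txt is a tiny hypothetical stand-in for enable1.txt.
printf 'banana\nqueueing\nbeautiful\nrhythm\n' > words.txt

# [aeiou] matches one vowel; {5,} means "five or more of the preceding item".
hits=$(grep -E '[aeiou]{5,}' words.txt)
echo "$hits"
```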

Colorizing grep output

You might want to colorize your grep output.

I have grep aliased to use color in my .bashrc file. (Put your alias in .bash_profile)

grep -i, -v

grep -i

ignores case

grep -v

prints lines that DO NOT match
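A short sketch of both options on an invented four-line file (sample.txt and its contents are made up for the example):

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'Alice was beginning\nthe White Rabbit\nsaid alice\nOff with her head\n' > sample.txt

with_alice=$(grep -i 'alice' sample.txt)      # matches both Alice and alice
without_alice=$(grep -iv 'alice' sample.txt)  # every line that does not match
echo "$with_alice"
echo "$without_alice"
```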

grep and piping together

Pipe into wc -l to count

Write out to a file

Take a look at the last 5 lines of file

Append new search result to file

Take a look at the last 5 lines of file

File is now longer
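The sequence of steps above could be sketched as follows. The file names words.txt and found.txt and the search patterns are assumptions for illustration, not the ones used on the slide.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# words.txt is a hypothetical stand-in for enable1.txt.
printf 'aqua\nquiz\nzebra\nbuzz\nquartz\napple\n' > words.txt

q_count=$(grep 'q' words.txt | wc -l)   # pipe into wc -l to count matches
grep 'q' words.txt > found.txt          # write out to a file
tail -n 5 found.txt                     # look at the last 5 lines
grep 'z' words.txt >> found.txt         # append a new search result
tail -n 5 found.txt                     # the last 5 lines have changed
total=$(wc -l < found.txt)              # the file is now longer
echo "$q_count" "$total"
```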

grep -n, -l

grep -n

prints out line number

grep -l

prints only the names of the files in which a match is found
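A brief sketch of both options; a.txt and b.txt are invented files.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'down the rabbit hole\nthe pool of tears\na mad tea party\n' > a.txt
printf 'nothing to see here\n' > b.txt

numbered=$(grep -n 'tea' a.txt)          # prefix each match with its line number
files=$(grep -l 'rabbit' a.txt b.txt)    # list only the files that have a match
echo "$numbered"
echo "$files"
```

Note that 'tea' also matches inside "tears", since grep looks for the pattern anywhere in the line.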

Regular expressions are powerful

Learn regex:

https://www.regular-expressions.info/tutorial.html

Palindromes with limited lengths can be captured with regex.

\1, \2, \3 are backreferences: each one matches the same text that the 1st, 2nd, or 3rd pair of () captured.
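As a sketch of the idea, the patterns below find 5-letter and 4-letter palindromes in an invented word list. They are written for grep -E, assuming a grep that accepts backreferences in extended mode (GNU and BSD grep both do); the same patterns should also work with grep -P.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'level\nlever\nradar\nrebel\nnoon\n' > words.txt

# (.) captures one character; \1 and \2 must repeat what groups 1 and 2 matched.
five=$(grep -E '^(.)(.).\2\1$' words.txt)   # 5-letter palindromes
four=$(grep -E '^(.)(.)\2\1$' words.txt)    # 4-letter palindromes
echo "$five"
echo "$four"
```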

sort + uniq + count

sort

sorts lines alphabetically

uniq -c

folds adjacent identical lines (so sort first), adding count in front

wc -l

counts lines

cut -d, -f3

cuts 3rd column with ',' as delimiter
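A small sketch combining these tools on a made-up comma-separated file (pets.csv and its columns are hypothetical):

```shell
workdir=$(mktemp -d)
cd "$workdir"
# pets.csv is hypothetical data with columns: name,age,species
printf 'rex,3,dog\ntom,5,cat\nfido,2,dog\nfelix,4,cat\nspot,1,dog\n' > pets.csv

# Keep the 3rd column, sort so duplicates are adjacent, then count each value.
counts=$(cut -d, -f3 pets.csv | sort | uniq -c)
# Count how many distinct values there are.
kinds=$(cut -d, -f3 pets.csv | sort | uniq | wc -l)
echo "$counts"
echo "$kinds"
```

Without the sort step, uniq would treat the two separated runs of "dog" as different groups.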

Piping gone mad

Tokenization + frequency counting on the fly. Tools used: head, tail, perl, grep, sort, uniq
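The original pipeline is not reproduced in this transcript, so the following is only a rough sketch of the same idea. It swaps in tr for the perl tokenization step, and sample.txt is a tiny invented stand-in for alice.txt.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# sample.txt is a tiny hypothetical stand-in for alice.txt.
printf 'The cat saw the hat.\nThe hat sat on the cat!\nThe end.\n' > sample.txt

top=$(tr -cs 'A-Za-z' '\n' < sample.txt |   # turn every run of non-letters into a newline
      tr 'A-Z' 'a-z' |                      # lowercase everything
      grep -v '^$' |                        # drop any empty lines
      sort | uniq -c |                      # count each distinct word
      sort -rn | head -n 3)                 # most frequent first, keep the top 3
echo "$top"
```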

Text processing on the command line

What you just witnessed is command-line text processing wizardry.

Computational linguists have long been doing this.

Text documents these days tend to have more structure (XML, HTML, etc.) so these tools are less useful, but nevertheless still very handy to know.

Ken Church, Unix for Poets, original document:

https://www.cs.upc.edu/~padro/Unixforpoets.pdf

Christopher Manning (Stanford) brings it up-to-date:

https://web.stanford.edu/class/linguist278/notes/278-UnixForPoets.pdf

more or less

more (and less) page through a text file's content, one screenful at a time. Press SPACE for next page, q to quit.

Windows users: only less is available on git bash.

Often, you pipe your STANDARD OUTPUT into more, so you can look through the result, e.g.,

grep 'q' words | more

cat

cat concatenates text file content and prints it to standard output.

Often used as the first step of piping.

Also useful for concatenating the contents of multiple files.
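A minimal sketch of these three uses; the file names part1.txt, part2.txt, and book.txt are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'chapter one\n' > part1.txt
printf 'chapter two\n' > part2.txt

cat part1.txt                          # print one file to standard output
cat part1.txt part2.txt > book.txt     # concatenate several files into one
n=$(cat book.txt | wc -l)              # cat as the first step of a pipe
echo "$n"
```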

Wrapping up

To-Do 10

Visit 2 classmates' projects and add your feedback and impression to their Visitor's Log.

Practice Unix tools and bash shell!

Work on your project!