
Page 1: Introduction to UNIX

Manuel Ruiz, Bioinformatics School, Campinas, Sao Paulo, Brazil, 21-26 November 2011

Page 2: UNIX

UNIX is an operating system (like Windows and MacOS).

Multi-tasking: multiple processes can run concurrently.

Multi-user: different users can read mail, copy files, and print all at once.

Page 3: UNIX

Why use UNIX?

- designed for lots of small programs: the shell is a toolbox
- programs can easily be linked together: development of automatic workflows
- doesn’t waste computer resources on graphics
- gives the user much more power
- a lot of free bioinformatics tools are available

Unfortunately, this comes at the expense of being user-friendly.

Page 4: Several Unix

- Two main families: Unix System V and Unix BSD
- Each company has its own Unix: Sun (SunOS or Solaris), HP (HP-UX), IBM (AIX) …
- Since the 1990s: Linux and FreeBSD (=> MacOS X)
- Cygwin

Page 5: Linux: free Unix

- Linus Torvalds (Helsinki)
- GNU project: anyone can use, study source code, modify source code, re-distribute
- Widely used world-wide
- Several distributions: RedHat, CentOS, Suse, Debian, Ubuntu …

Page 6: The Linux System

[Diagram, top to bottom: User commands / Shell / Kernel, File Systems / Device Drivers / Hardware]

- User commands include executable programs and scripts.
- The shell interprets user commands. It is responsible for finding the commands and starting their execution. Several different shells are available; Bash is a popular one.
- The kernel manages the hardware resources for the rest of the system.

Page 7: Connection

- Run Xming
- Run PuTTY

Page 8: General format of UNIX commands

A UNIX command line consists of the name of a UNIX command followed by its "arguments" (options and the target filenames and/or expressions). The general syntax for a UNIX command is

    command -options targets

For example, in

    ls -l /etc

ls is the command name, -l is an option (flag), and /etc is the argument.

Page 9: UNIX Commands

The first word you type on the command line is the name of a program to run. So it is easy to add your own commands: just add, or write, another program.

The output of the program is returned to the terminal unless you say otherwise. So all your interaction is through one text window.

Page 10: Unix Files

Unix is CaSe SeNsiTivE!

- UNIX filenames should contain only letters, numbers, and the _ (underscore), . (dot), and - (dash) characters. NO ACCENTS! NO SPACES! Under any circumstances!
- The extension (e.g. .txt, .fasta) can be any number of letters and is optional. It’s for your own convenience so you know what kind of file is what.
- You can only have one file with a given name in the same directory.
- Filenames: 255 characters maximum

Page 11: Special characters

Special characters have different meanings:

    & ~ # " ' { ( [ | ` \ ^ @ ) ] } $ * % ! / ; , ?

For example: wildcard characters, the most common of which is *, which tells the shell to substitute any combination of zero or more characters that results in an existing filename.
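As a rough illustration of the * wildcard, the sketch below works in a scratch directory; the file names are made up for the example and are not from the course material.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical file names, created empty for the demonstration.
touch seq1.fasta seq2.fasta notes.txt

# The shell expands *.fasta before echo runs, so echo only sees the matches.
fasta_files=$(echo *.fasta)
echo "$fasta_files"

# Quoting the pattern stops the expansion: echo receives the literal text.
literal=$(echo '*.fasta')
echo "$literal"
```

The same expansion happens for any command, which is why ls *.fasta or rm *.fasta act only on the matching files.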

Page 12: Working with Directories

- Directories organize files on a Unix computer.
- They are equivalent to folders in Windows and Mac, except that you should not put a space in their name.
- The directory list that allows you to locate a file is called a PATH (e.g., /home/mruiz/text.txt is the FULL PATH to the file text.txt).
- Understanding directories is vital.

Page 13: Typical UNIX directory structure

[Figure: directory tree rooted at /, showing the directories lib, bin, usr, home, export, etc, bin, opt, log, var and tmp]

Page 14: Typical UNIX directory structure

- / = pronounced ‘slash’ or ‘root’.
- /bin = where the programs live. Don’t mess.
- /lib = programming libraries. Ignore.
- /etc = admin stuff. Ignore.
- /usr = more programs, not user files. Don’t mess.
- /mnt = ‘mount point’ for floppies, CD-ROMs etc. If you put a CD-ROM in, it is in /mnt/cdrom.
- /tmp = temporary files. Ignore.
- /var = logs and other files that change while the system runs. Ignore.
- /home
- /home/mruiz = where ALL my files are.
- /home/fred
- /home/jane = where Jane’s files are. I can’t see them unless she lets me.

A UNIX workstation is usually set up like this; Cygwin and MacOS X are different.

Page 15: Your Home Directory

When you log in to any UNIX computer, you start off in your own home directory, for example /home/mruiz.

This is your home. Create sub-directories to store specific projects or groups of information.

Page 16: Logging In

- Enter your login name and password!
- System password file: /etc/passwd (usually).
- You can change your password using the command: passwd.

Page 17: Some shell commands

Most important command: man (manual pages).

- Help: UNIX commands, C functions.
- Usage: man <command/function>
- Try “man man”!
- Examples: man ls, man passwd, man printf.

Page 18: Shortcuts

There are several shortcuts in Unix for specifying directories:

- . (dot) means "the working directory", the one you’re in. Example: cd .
- .. means "the parent directory", the directory one level above the working directory. So cd .. will move you up (towards /) one level, and cd ../.. two levels.
- ~ (tilde) means your home directory, so cd ~ will take you home.
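The shortcuts can be tried safely in a scratch directory. This is only a sketch: the projects/blast tree is a hypothetical example, and the last step assumes your account has a home directory.

```shell
# Build a small hypothetical tree in a scratch directory.
base=$(mktemp -d)
mkdir -p "$base/projects/blast"
cd "$base/projects/blast"

cd .                          # "." is the working directory: nothing changes
here=$(basename "$(pwd)")     # still "blast"

cd ..                         # up one level, into projects
parent=$(basename "$(pwd)")

cd ../..                      # up two more levels
cd ~                          # "~" is your home directory
pwd
```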

Page 19: Some shell commands

- pwd: what is the working directory?
- ls: list contents of directory
- mkdir <dir-name>: make directory
- rmdir <dir-name>: remove an empty directory
- rm -r <dir-name>: remove a directory with all its contents
- cd <directory>: change directory; ~/ means your home directory
- cp <source> <target>: copy command.
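A short session using these commands might look like the sketch below. It runs in a scratch directory, and the names project, seq.txt and backup.txt are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir project                 # make a directory
cd project
pwd                           # print the full path of the working directory
echo "ACGT" > seq.txt
cp seq.txt backup.txt         # copy a file
listing=$(ls)                 # list the directory contents
echo "$listing"

cd ..
mkdir empty_dir
rmdir empty_dir               # works because empty_dir is empty
rm -r project                 # removes project and everything inside it
remaining=$(ls)               # nothing is left in the scratch directory
```

Note that rm -r does not ask for confirmation and there is no recycle bin, so check the directory name before pressing RETURN.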

Page 20: Some shell commands

- chmod <mode> <filename>: change the mode of a file/directory
- ls -l <directory or filename>: long list with details
- 9 permission bits, shown after the file-type character: d rwx rwx rwx
- 3 categories: user/group/others.
- Permissions: read/write/execute (r/w/x)

Page 21: File Permissions

Linux provides three kinds of permissions:

- Read: users with read permission may read the file or list the directory
- Write: users with write permission may write to the file or add new files to the directory
- Execute: users with execute permission may execute the file or look up a specific file within a directory

Page 22: File Permissions

The long version of a file listing (ls -l) will display the file permissions:

    -rwxrwxr-x 1 rvdheij rvdheij 5224 Dec 30 03:22 hello
    -rw-rw-r-- 1 rvdheij rvdheij 221 Dec 30 03:59 hello.c
    -rw-rw-r-- 1 rvdheij rvdheij 1514 Dec 30 03:59 hello.s
    drwxrwxr-x 7 rvdheij rvdheij 1024 Dec 31 14:52 posixuft

The first column shows the permissions, the third column the owner, and the fourth column the group.

Page 23: Interpreting File Permissions

In -rwxrwxrwx, reading from left to right:

- character 1: directory flag (d=directory; l=link)
- characters 2-4: owner permissions
- characters 5-7: group permissions
- characters 8-10: other permissions

Page 24: Changing File Permissions

Use the chmod command to change file permissions. The permissions are encoded as an octal number:

    chmod 755 file    # Owner=rwx Group=r-x Other=r-x
    chmod 500 file2   # Owner=r-x Group=--- Other=---
    chmod 644 file3   # Owner=rw- Group=r-- Other=r--

They can also be changed symbolically:

    chmod +x file     # Add execute permission to file for all
    chmod o-r file    # Remove read permission for others
    chmod a+w file    # Add write permission for everyone
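To see how an octal mode maps onto the rwx string, you can change a file and read the result back with ls -l. A minimal sketch, using a hypothetical empty file script.sh in a scratch directory; each octal digit is the sum of r=4, w=2, x=1.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch script.sh

chmod 755 script.sh                        # 7=rwx 5=r-x 5=r-x
perm_755=$(ls -l script.sh | cut -c1-10)   # keep only the permission string

chmod 644 script.sh                        # 6=rw- 4=r-- 4=r--
perm_644=$(ls -l script.sh | cut -c1-10)

chmod u+x script.sh                        # symbolic form: add x for the owner only
perm_ux=$(ls -l script.sh | cut -c1-10)

echo "$perm_755 $perm_644 $perm_ux"
```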

Page 25: Some shell commands

- touch <option> <filename>: create a new, empty file (or update the timestamp of an existing one)
  e.g.: touch directory/filename
- rm <option> <filename>: remove files
  e.g.: rm -fr directory/filename
- mv <old> <new>: change the name of a file
- ln -s <src> <dest>: create a symbolic link
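The following sketch strings these four commands together in a scratch directory. The file names draft.txt, final.txt and latest are made up for the illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

touch draft.txt                 # create an empty file
mv draft.txt final.txt          # rename it
ln -s final.txt latest          # "latest" is a symbolic link to final.txt

echo "hello" > final.txt
via_link=$(cat latest)          # reading the link reads final.txt

rm -f latest                    # removes the link only, not its target
still_there=$(ls)
echo "$via_link / $still_there"
```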

Page 26: File System

- Hierarchical arrangement of files and directories.
- Top level: root or /
  e.g.: cd /
- . is the current directory, .. is the directory one level higher
  e.g.: cd . makes no change, because . is the current directory
  or cd .. changes to the parent directory.

Page 27: File System

- Pathname: absolute and relative.
- Absolute pathname: /home/mruiz/text.txt (begins with /)
- Relative pathname: text.txt, ../mruiz/text.txt
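The difference can be checked by reading one file through three different pathnames. This sketch recreates a home/mruiz and home/jane layout under a scratch directory (an assumption made so that the example does not depend on real accounts).

```shell
base=$(mktemp -d)
mkdir -p "$base/home/mruiz" "$base/home/jane"
echo "my notes" > "$base/home/mruiz/text.txt"

cd "$base/home/mruiz"
from_abs=$(cat "$base/home/mruiz/text.txt")   # absolute: starts with /, works from anywhere
from_rel=$(cat text.txt)                      # relative to the working directory

cd ../jane
from_sibling=$(cat ../mruiz/text.txt)         # relative: up one level, then down into mruiz
echo "$from_abs | $from_rel | $from_sibling"
```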

Page 28: Editors

Different editors: emacs, nano, nedit, vi

- emacs <filename>
- nano <filename>
- nedit <filename>
- vi <filename>

Page 29: Other ways to view files

These can be very useful. Try them out:

    more text.txt
    less text.txt
    head text.txt
    tail text.txt

Page 30: head/tail

- head: displays the first lines (10 lines by default)

    head Sequences.txt
    head -n 30 Sequences.txt

- tail: displays the last lines (10 lines by default)
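A quick way to convince yourself of the defaults is to run head and tail on a file whose lines are numbered. The sketch below generates a 50-line stand-in for Sequences.txt in a scratch directory; the content is invented.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in file: 50 lines reading "line 1" ... "line 50".
awk 'BEGIN { for (i = 1; i <= 50; i++) print "line " i }' > Sequences.txt

head_default_last=$(head Sequences.txt | tail -n 1)        # last of the first 10 lines
head_30_last=$(head -n 30 Sequences.txt | tail -n 1)       # last of the first 30 lines
tail_default_first=$(tail Sequences.txt | head -n 1)       # first of the last 10 lines
tail_3=$(tail -n 3 Sequences.txt)                          # the last 3 lines

echo "$head_default_last, $head_30_last, $tail_default_first"
```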

Page 31: cat, less and more

The cat command reads one or more files and prints them to standard output.

The operator > can be used to combine multiple files into one. The operator >> can be used to append to an existing file.

The syntax for the cat command is:

    cat [options] [files]

Examples:

    cat file1
    cat file1 file2 > all
    cat file1 >> file2
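The three cat examples can be reproduced with two tiny files. This is a sketch in a scratch directory; the file contents are placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf 'first\n' > file1
printf 'second\n' > file2

cat file1                  # print one file to the terminal
cat file1 file2 > all      # combine both files into a new file called all
cat file1 >> file2         # append file1 to the end of file2

all_content=$(cat all)
file2_content=$(cat file2)
```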

Page 32: cat, less and more

more

- The more command displays a file on the screen, one screenful at a time.
- The RETURN key displays the next line of the file. The spacebar displays the next screen of the file. Press q to quit.
- The syntax for the more command is:

    more [options] [files]

- Try more Sequences.txt and compare with cat Sequences.txt (CTRL C to stop).

Page 33: cat, less and more

less: a program similar to more, but which allows backward movement in the file as well as forward movement. Also, less does not have to read the entire input file before starting, so with large input files it starts up faster than text editors like vi.

Page 34: Redirecting standard input and standard output

- command1 > file1
  executes command1, placing the output in a new file named file1 (an existing file1 is overwritten).
- command1 >> file1
  executes command1, appending the output to the existing file named file1.
- command1 < file1
  executes command1, using file1 as the source of input.
- command1 < infile > outfile
  combines the two capabilities: command1 reads from infile and writes to outfile.
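Here is a small sketch of the four forms, with sort and echo standing in for command1. The file names names.txt and sorted.txt are hypothetical and live in a scratch directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf 'gamma\nalpha\nbeta\n' > names.txt   # > creates (or overwrites) names.txt

sort < names.txt > sorted.txt               # read from names.txt, write to sorted.txt
echo "delta" >> sorted.txt                  # >> appends to the end of sorted.txt

echo "only line" > names.txt                # > again: the old content is replaced

sorted_content=$(cat sorted.txt)
names_content=$(cat names.txt)
```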

Page 35: Connecting commands with Pipes

The output of one command can become the input of another:

    ps aux | grep netscape | wc -l

- The output of the ps command is sent to grep.
- grep takes this input and searches for “netscape”, passing the matching lines to wc.
- wc takes this input and counts the lines, its output going to the console.
- “|” is used to separate stages.

e.g.: ls -l | more or ls -l | less
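Because the output of ps differs on every machine, the sketch below feeds the same kind of pipeline with a fixed, made-up process list so the result is predictable. The extra tr stage only removes the padding that some versions of wc add.

```shell
# printf stands in for "ps aux": four invented process names, one per line.
count=$(printf 'netscape\nbash\nnetscape -mail\nsshd\n' \
          | grep netscape \
          | wc -l \
          | tr -d ' ')

echo "$count lines mention netscape"
```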

Page 36: Program & Process

- A program is an executable file that resides on the disk.
- A process is an executing instance of a program.
- A Unix process is identified by a unique non-negative integer called the process ID.
- Check process status using the “ps” command.

Page 37: Your path

To see your path, type echo $PATH

If you are bored with typing the full path to programs, you can put them in your path.

E.g.:

    mkdir ~/bin/
    mv program ~/bin/
    export PATH=$PATH:~/bin
    program
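The same idea can be tested without touching your real ~/bin. In this sketch the bin directory sits in a scratch location and hello_tool is a hypothetical one-line script written just for the demonstration.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/bin"

# A hypothetical program: a two-line shell script.
printf '#!/bin/sh\necho "hello from my tool"\n' > "$workdir/bin/hello_tool"
chmod +x "$workdir/bin/hello_tool"

# Add the directory to the end of the search path for this session.
export PATH="$PATH:$workdir/bin"

result=$(hello_tool)                 # found through PATH, no directory needed
found_at=$(command -v hello_tool)    # shows which file the shell will run
echo "$result ($found_at)"
```

The export only lasts for the current session; to keep it, the line is usually added to a shell start-up file such as ~/.bashrc.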

Page 38: Background processes

- A program run using the ampersand operator “&” creates a background process.
  E.g.: program &
- CONTROL Z (^Z): suspend the foreground process
- bg: switches a suspended process to the background
- fg: switches a process to the foreground

Page 39: How to stop a process?

- Foreground processes can generally be stopped by pressing CONTROL C (^C).
- Background processes can be stopped using the kill command.
- Usage: kill -SIGNAL <process id list>
- kill -9 <process id list> (-9 sends the KILL signal, which the process cannot block)
- Or kill <process id list>.
- If a foreground process is not stopped by ^C, you can open another session and use the kill command.
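As a sketch of the background-then-kill cycle, the example below uses sleep 300 as a stand-in for a long-running program. kill -0 sends no signal at all; it only tests whether the process still exists.

```shell
sleep 300 &                  # start a long-running command in the background
pid=$!                       # $! holds the process ID of the last background job

if kill -0 "$pid" 2>/dev/null; then
    was_running=yes          # the process exists
fi

kill "$pid"                  # plain kill sends the TERM signal
wait "$pid" 2>/dev/null || true   # collect the finished job; its exit status is non-zero

if ! kill -0 "$pid" 2>/dev/null; then
    now_gone=yes             # the process no longer exists
fi
echo "was running: $was_running, now gone: $now_gone"
```

Try plain kill first and keep kill -9 for processes that ignore it, since -9 gives the program no chance to clean up.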

Page 40: UNIX summary

- Use a text terminal for powerful, remote computing
- Use ls, cd, mv, cp, nano and friends to deal with files and directories
- You can use many tools quickly, but generally the output is in text format

Page 41: Bioinformatics commands

- No bioinformatics programs come with UNIX.
- Most biology department servers have them installed already. But you should probably know how to do it yourself.
- It is pretty much the same as installing any other program on UNIX, except you need to keep in mind the requirements for disk space and memory.

Page 42: Disk space and memory

These are different things.

- Disk space is the amount of free space for data on your hard disk drive.
- Memory is the amount of RAM installed in the computer.

Both of these are critical for many bioinformatics applications. For example, BLAST databases can be very large and take up a lot of disk space, and in order to search through them, the BLAST program needs to load a lot of data into RAM.

Page 43: Running programs

To run a program with a command, it needs to be either in your PATH, or you must specify the path to it.

E.g. say I have the blastall binary in my home directory. I could run blastall with either of the following:

    cd /home/mruiz
    ./blastall

Or, from any directory,

    /home/mruiz/blastall

Or, I can install it and put it in my path:

    mv blastall /home/mruiz/bin
    export PATH=$PATH:/home/mruiz/bin
    blastall

Page 44: Being “nice”

blastall takes a lot of resources.

So that more important jobs take precedence (i.e. other people can still read their terminals) you need to use “nice”.

    nice -n 10 blastall -p etc.

Page 45: Big output

How are we going to deal with the size of the output text?

Redirect it to a file:

    nice -n 10 blastall -p blastp -i exampleprotein.txt -d ArabidopsisP > myblastfile.txt

Or page through it:

    nice -n 10 blastall [same options] | more

Page 46: The background

If you are running a program that takes a long time, especially if redirecting output to a file, put it in the background, and you can keep working.

Page 47: Running overnight

If you are running a program overnight, use nohup before the program command: this way the command will keep running when you exit the shell.

Don’t forget to use > to redirect output to a file when doing this.
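Putting nohup, nice, redirection and the background together could look like the sketch below. The sh -c '...' command is only a short stand-in for a real overnight job such as a BLAST search, and job.log is a hypothetical file name.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# nohup: keep running after logout; nice: lower priority;
# > job.log 2>&1: send output and error messages to a file; &: run in the background.
nohup nice -n 10 sh -c 'sleep 1; echo "job finished"' > job.log 2>&1 &
job_pid=$!

echo "job $job_pid started, the prompt is free again"

wait "$job_pid"              # only for this demo: normally you would just log out
cat job.log
```

If you leave out the redirection, nohup typically writes the output to a file called nohup.out in the current directory.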

Page 48: Viewing running processes

You can see all the processes on the system, ranked by how much memory and CPU time they are using.

    top

Page 49: Getting information from output files

- Often these are huge text files.
- grep is a great tool for getting at the nitty-gritty.
- awk is more powerful, but mostly involves writing scripts, and has been largely superseded by Perl.

Page 50: grep

“globally search for a regular expression and print”

Allows you to pick out lines of a text file that match a query, count them, and retrieve lines around the match.

Useful options:

- -i: ignore case
- -c: count the matching lines
- -v: print the lines that do not match
- -A <n>: also print the n lines after each match

Page 51: grep - continued

- grep 'Query=' myblast.txt
  What sequences did I BLAST?
- grep -c '>' testprotein.txt
  How many sequences are in this file?
- grep -A 10 '>' testprotein.txt
  Give me the header and the first ten lines of each protein
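To try these options without a real BLAST run, the sketch below writes a tiny FASTA-style file. The two sequences and their names are invented for the illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A made-up protein file with two entries.
cat > testprotein.txt <<'EOF'
>protA example
MKTAYIAKQR
QISFVKSHFS
>protB example
MADEEKLPPG
EOF

n_seqs=$(grep -c '>' testprotein.txt)              # count the header lines
first_lines=$(grep -A 1 '>' testprotein.txt)       # each header plus the line after it
no_headers=$(grep -v '>' testprotein.txt)          # everything except the headers
any_case=$(grep -i -c 'PROTa' testprotein.txt)     # -i ignores upper/lower case

echo "$n_seqs sequences"
echo "$first_lines"
```

The pattern '>' must be quoted: an unquoted > would be taken by the shell as a redirection and would overwrite the file named after it.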

Page 52: egrep

- grep-like command (equivalent to grep -E), but accepts complete regular expressions, including ones with "+", "?", "|" and "( )"
- -f: obtain PATTERN from FILE
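A sketch of the extended operators and of -f, written with grep -E, which behaves like egrep. The identifiers in ids.txt are only illustrative strings, and patterns.txt is a hypothetical pattern file.

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf 'AT1G01010\nAT2G10000\nOs01g0100100\nLOC_Os01g01010\n' > ids.txt

# "+" means one or more of the preceding item.
at_ids=$(grep -E -c '^AT[1-5]G[0-9]+$' ids.txt)

# "( | )" means either alternative.
at_or_os=$(grep -E -c '^(AT|Os)' ids.txt)

# -f reads the patterns from a file, one per line; a line matching any of them is kept.
printf 'AT1G\nLOC_\n' > patterns.txt
from_file=$(grep -c -f patterns.txt ids.txt)

echo "$at_ids $at_or_os $from_file"
```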

Page 53: awk / gawk

- awk: a sequence of statements of the form pattern { action }, where the pattern determines which records the action is applied to.
- Record = a string of characters terminated by a newline, in general one line.
- Field = a string of characters delimited by whitespace (or by the character specified with the -F option).
- The fields of the current record are accessed through the variables $1, $2, ... $NF. $0 is the complete record, NF is the number of fields in the current record, and $NF is the last field.

Page 54: awk / gawk: examples

- awk -F ":" '{ $2 = "" ; print $0 }' /etc/passwd
  prints each line of the file /etc/passwd after clearing the second field
- awk 'END {print NR}' file
  prints the total number of lines in the file
- awk '{print $NF}' file
  prints the last field of each line
- who | awk '{print $1,$5}'
  prints the login and the connection time
- awk 'length($0)>75 {print}' file
  prints the lines longer than 75 characters (print is equivalent to print $0)
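The same one-liners can be tried on a small file instead of /etc/passwd. In this sketch accounts.txt is a made-up two-line file in the same colon-separated style, and the length threshold is lowered to 17 to suit it.

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf 'mruiz:x:1001:users\njane:x:1002:staff\n' > accounts.txt

last_fields=$(awk -F ":" '{ print $NF }' accounts.txt)           # last field of each line
n_records=$(awk 'END { print NR }' accounts.txt)                 # total number of lines

# Assigning to a field rebuilds $0 with a space between fields,
# so the emptied field shows up as a double space.
cleared=$(awk -F ":" '{ $2 = ""; print $0 }' accounts.txt)

long_lines=$(awk 'length($0) > 17 { print }' accounts.txt)       # lines longer than 17 characters

echo "$cleared"
```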

Page 55: Introduction to UNIX Manuel Ruiz, Bioinformatics School, Campinas, Sao Paulo, Brazil, 21-26 november 2011

Argument list too long

When you use grep or another command that has to list or search through several thousand files, you may get the "Argument list too long" or "/bin/grep: Argument list too long." error. It appears when the list of file names passed to the command exceeds what the system accepts on one command line.

Workaround: xargs

find ~ -type f -print0 | xargs -0 grep "examplestring"

find lists all the files in your home directory; xargs passes them to grep in batches, so each file that is found is searched for the text "examplestring".
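The same pipeline can be tried safely in a throwaway directory. A minimal sketch, assuming a find and an xargs that support -print0 and -0 (GNU and BSD versions do); the file names are invented.

```shell
# Work in a temporary directory instead of the real home directory
workdir=$(mktemp -d)
mkdir -p "$workdir/sub dir"
printf 'nothing here\n' > "$workdir/a.txt"
printf 'contains examplestring\n' > "$workdir/sub dir/b.txt"

# -print0 and -0 separate file names with NUL bytes, so names
# containing spaces are handled correctly;
# grep -l prints only the names of the files that match
matches=$(find "$workdir" -type f -print0 | xargs -0 grep -l "examplestring")
echo "$matches"

# Clean up
rm -rf "$workdir"
```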


Remote connection

telnet
ftp: file transfers
ssh/scp/sftp: secure connections
wget


Getting files from remote servers

ftp

ftp ftp.ncbi.nih.gov


ftp commands

open     open a connection
ls       same as UNIX
cd       same as UNIX
get      get this file
mget     get more than one file
put      put a file on the server
lcd      local cd
close    close connection
bye      exit the ftp program


Secure ftp

NCBI allows you to connect using ftp only because its files are all public and it does not let you upload anything.

Most UNIX computers disallow ftp logins. However, if you can ssh to a computer, you can also use sftp. The commands are largely the same as in ftp, but you can access your own files securely.


scp

Copying files from/to remote servers

scp src:/src_path dest:/dest_path


wget

But what if you want to get a file which is available for download from a website, but not by ftp?

wget will get the contents of any URL and put them in a file.

wget www.upm.edu.my


From your desktop: filezilla, winscp



Compression tools

gzip / gunzip: .gz files
compress / uncompress: .Z files
tar -cvf / tar -xvf: .tar files
tar -cvzf / tar -xvzf: .tgz files

Others: bzip, bzip2, zip
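A round trip with tar and gzip might look like the sketch below. The directory and file names are made up, and the -v (verbose) flag from the list above is left out to keep the output short.

```shell
# Build a small directory to archive, inside a temporary location
workdir=$(mktemp -d)
mkdir "$workdir/project"
printf '>seq1\nACGT\n' > "$workdir/project/seq.fasta"

# -c create, -z compress with gzip, -f archive name;
# -C changes directory first so the archive holds relative paths
tar -czf "$workdir/project.tgz" -C "$workdir" project

# -t lists the archive contents without extracting anything
listing=$(tar -tzf "$workdir/project.tgz")

# -x extracts, here into a separate directory
mkdir "$workdir/restored"
tar -xzf "$workdir/project.tgz" -C "$workdir/restored"
restored=$(cat "$workdir/restored/project/seq.fasta")

# gzip and gunzip work on single files and replace them in place
gzip "$workdir/project/seq.fasta"        # creates seq.fasta.gz
gunzip "$workdir/project/seq.fasta.gz"   # restores seq.fasta

echo "$listing"
rm -rf "$workdir"
```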


Downloading programs

"Ready to run" programs are called binaries in unix-speak.

They are often "zipped" in a .tar.gz file. To unzip, use gunzip and tar -xvf.

To run, specify the path to the program, e.g. ./program or /home/matt/bin/program

You can download programs for UNIX just as you would for a PC.


Source Code

Most bioinformatics software is free and open source. That is, you can download the actual instructions the programmer wrote.

This is great, because it means you can compile and install these programs on almost any machine.


The root user

Most UNIX machines have an account called "root".

root can see everything, change everything, delete everything, including other users' work.

Unless you buy your own machine, nobody sane will give you root access.

You usually need root access to install programs in the default location. But you can put them in your home directory instead.
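Installing without root access usually comes down to putting the program in a directory you own and adding that directory to PATH. The sketch below uses a temporary directory as a stand-in for something like $HOME/bin, and "hello_tool" is a made-up program, not a real package.

```shell
# Stand-in for a personal directory such as $HOME/bin
prefix=$(mktemp -d)

# A tiny made-up program; a downloaded binary would be copied here instead
cat > "$prefix/hello_tool" <<'EOF'
#!/bin/sh
echo "hello from a user-installed program"
EOF

# The file must be executable to be run as a command
chmod +x "$prefix/hello_tool"

# Put the directory at the front of PATH so the shell can find the program
PATH="$prefix:$PATH"
export PATH

output=$(hello_tool)
echo "$output"
```

To make such a change permanent, the PATH line is typically added to a shell startup file; which file applies depends on the shell in use.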


UNIX summary

Use ls, cd, mv, cp, nedit and friends to deal with files and directories.

Install, or compile, any program you like. Most are free.

Use blastall, etc. on the command line for high-throughput work. Redirect the output to a file for best results and run in the background. Grep the output file to get pertinent information.
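The redirect, background and grep pattern can be sketched with a stand-in job. The function long_job below is hypothetical and only prints three made-up result lines; a real analysis program would take its place.

```shell
workdir=$(mktemp -d)

# Hypothetical stand-in for a long-running analysis program
long_job() {
    printf 'query1\thit_A\t98.5\n'
    printf 'query2\tno_hit\t0\n'
    printf 'query3\thit_B\t87.0\n'
}

# > sends standard output to a file, 2> sends error messages to another,
# and the trailing & runs the job in the background
long_job > "$workdir/results.txt" 2> "$workdir/errors.log" &

# Wait for the background job to finish before reading its output
wait

# Keep only the pertinent lines; -c counts them instead of printing them
hits=$(grep "hit_" "$workdir/results.txt")
hit_count=$(grep -c "hit_" "$workdir/results.txt")

echo "$hits"
rm -rf "$workdir"
```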


Other useful commands

diff: attempts to determine the minimal set of changes needed to convert the file given as the first argument into the file given as the second argument.

diff file1 file2
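A small trial run, using two made-up files that differ on their second line:

```shell
workdir=$(mktemp -d)
printf 'ACGT\nTTGA\nCCCC\n' > "$workdir/file1"
printf 'ACGT\nTTGG\nCCCC\n' > "$workdir/file2"

# diff exits with status 0 when the files are identical and 1 when they differ
status=0
changes=$(diff "$workdir/file1" "$workdir/file2") || status=$?

echo "$changes"
rm -rf "$workdir"
```

In the output, "2c2" means that line 2 of the first file must be changed to obtain line 2 of the second; lines starting with < come from the first file and lines starting with > from the second.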


Other useful commands

find: searches the file hierarchy rooted at the given path for files that match the criteria given by the expression.

find / -name "ls"

find . -name "seq_a_moi.fasta"
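To try find without scanning the whole disk, one can search a small made-up directory tree, as in this sketch (all names are invented):

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/data/run1"
: > "$workdir/data/run1/my_seq.fasta"   # create empty files
: > "$workdir/data/notes.txt"

# Search by exact name, starting from the given directory
by_name=$(find "$workdir" -name "my_seq.fasta")

# Quote wildcards so that find, not the shell, expands them;
# -type f restricts the search to regular files
fasta_count=$(find "$workdir" -type f -name "*.fasta" | wc -l)

echo "$by_name"
rm -rf "$workdir"
```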


Other useful commands to explore

• sort: sort files
• wc: count characters, words, lines
• split / csplit: cut a file horizontally, into pieces of consecutive lines
• cut: cut fields vertically, extracting columns from each line
• paste: merge corresponding or subsequent lines of files
• sed: perform basic text transformations on an input stream

• history
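Most of these commands can be explored on one small tab-separated table. The sketch below uses made-up gene names and values; history is left out because it is only meaningful in an interactive shell.

```shell
workdir=$(mktemp -d)
printf 'geneB\t20\ngeneA\t10\ngeneC\t30\n' > "$workdir/table.tsv"

# sort: order the lines alphabetically
sorted=$(sort "$workdir/table.tsv")

# wc -l: count the lines
lines=$(wc -l < "$workdir/table.tsv")

# cut -f: extract columns (tab-separated by default)
cut -f 1 "$workdir/table.tsv" > "$workdir/names.txt"
cut -f 2 "$workdir/table.tsv" > "$workdir/values.txt"

# paste: glue corresponding lines of several files together,
# here with the two columns swapped
swapped=$(paste "$workdir/values.txt" "$workdir/names.txt")

# sed: replace the first "gene" on each line with "locus"
renamed=$(sed 's/gene/locus/' "$workdir/table.tsv")

# split -l: cut the file into pieces of at most 2 lines each
split -l 2 "$workdir/table.tsv" "$workdir/part_"
pieces=$(ls "$workdir" | grep -c '^part_')

echo "$sorted"
rm -rf "$workdir"
```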