last modified 05/07/16
Getting Started with the BDSG Login Service
Introduction and Learning Objectives
This document comes in two parts – a short introduction to a number of necessary concepts,
and a set of annotated practical exercises to work through.
This tutorial will introduce the concept of the Unix operating system and then some of the
commonly used inbuilt commands. Basic programs for editing files are shown, and then
some command-line syntax useful for (re)directing input and output to programs and other
file manipulations. A short glossary/summary of commands is given at the end of the
document. By the end of the practical, you should be comfortable moving around your
account, manipulating directories and files, and running simple commands.
Requirements
To work through the exercises in this practical, you will need login access to a machine
running Linux – either a server or a Linux workstation. Here we give instructions assuming
that you have a user account and password on the bioinformatics server
codon.bioinformatics.ic.ac.uk, which you will access via a local machine (PC, Mac or Linux
workstation) with connection software already installed – such as a College teaching
machine. You can work through the tutorial using any local machine that is connected to
the College network.
If you are working from anywhere else, e.g. from home, you will need to use a VPN
connection (see below), as the server only accepts connections from the ic.ac.uk domain for
security reasons.
Supplementary information on how to install the connection software you will need on your
local machine (if not already installed), how to log in using different combinations (e.g.
from a Mac or a Linux box to Codon) and how to configure the necessary connections is
available from http://www.imperial.ac.uk/bioinformatics-data-science-group/support/help/
(under ‘connecting to codon’). You will need your standard College username and password
to access these help pages. If you don’t already have an account with us (Bioinformatics
Support Service/Bioinformatics Data Science Group), you will need to apply for one via our
web-form at
http://www.imperial.ac.uk/bioinformatics-data-science-group/support/apply-for-account
Help on setting up VPN on a private machine is available from the main ICT web-site at
http://www.imperial.ac.uk/admin-services/ict/self-service/connect-communicate/remote-access/method/set-up-vpn/
What is UNIX?
UNIX is a commonly used operating system that has evolved since it was first created in the
early 1970s. This has led to the development of a large number of programs which, strictly
speaking, are not part of “UNIX” but come bundled with the operating system, so you would
expect to find them on all UNIX machines. It is this bundled software that makes UNIX so popular
and powerful. This does not mean that the software was written to be easy to use, but it does
allow us to perform very complicated tasks. Unfortunately, as so many people have contributed,
the look and feel of UNIX may not always appear consistent or logical. UNIX comes in a variety
of different ‘flavours’, but one of the most commonly used today is ‘Linux’, and that is what
the servers you will use today are running. Linux is distributed by several groups; Red Hat,
Ubuntu, Debian and Fedora are all names of common distributions.
Shells
When UNIX was first invented it was felt that someone should invent a shell to protect the users
from its raw edges. A shell listens for your commands and then passes them on to the operating
system on your behalf. The important point is that there are many different shells - the Bourne
shell (sh), the C shell (csh), the Korn shell (ksh), the TENEX C shell (tcsh) and the bash shell.
We use the bash shell as our standard login shell on the training servers, as it is the default user
shell on most Linux distributions – but please note that there are others, and the syntax of doing
things MAY NOT BE THE SAME in different shell environments. We have set up the servers so
that all the bioinformatics software you will need later is set up automatically. This will not always
be the case on other systems, and you may have to type additional commands or add them to
setup scripts to tell programs where to find dependencies, variables etc. This is a more advanced
topic and is outside the scope of this practical.
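To see which shell you are working in, a short check along these lines can help. This is a minimal sketch: SHELL is the standard environment variable holding your login shell, /etc/shells is the conventional list of installed login shells on Linux, and the variable name login_shell is purely illustrative.

```shell
# Report the login shell recorded for your account (falls back to "unknown" if unset)
login_shell=${SHELL:-unknown}
echo "My login shell is: $login_shell"

# List the login shells installed on this machine, if the list exists
if [ -r /etc/shells ]; then
    cat /etc/shells
fi
```

On the training servers you would expect the first line to end in bash.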
X-windows and X11 emulators
The X Window System is a network-transparent window system that can run on a wide range of
machines. This system allows you to log onto a remote machine but, instead of having to enter
commands to that machine on the command line, a windowed environment is displayed on your
screen. This window can have pull-down menus, buttons, etc. The current version is X Window
System, Version 11 (or X11).
To use any X11 program on the server you only need to load one extra piece of software, called
an X11 emulator, onto your local desktop computer. If you are working on a Linux machine, X11
itself will almost certainly be already installed as part of its operating system. With an X11
emulator, you can run the X11 program across the network on your PC or Mac.
An X11 emulator is generally already installed on College teaching PCs. If you want to use X11
on your own local machine, free software is available for both PCs and Macs. Full information on
the software required to use X11, and also to move files backwards and forwards between
Codon and your local computer, is available as above from
http://www.imperial.ac.uk/bioinformatics-data-science-group/support/help/.
Logging into the Unix Server codon.bioinformatics.ic.ac.uk
These instructions assume by default that you are in a College computer teaching room or
have in front of you a PC attached to the College network that already has a copy of the
following connection software installed: PuTTY (SSH client), FileZilla (secure file transfer),
and Xming or Exceed (X11 emulator). We will use the Bioinformatics Support Service/
Bioinformatics Data Science Group’s server codon.bioinformatics.ic.ac.uk for the
remainder of the tutorial. You will need a username and password specific to this machine
– your standard College username/password WILL NOT work here.
If you are logging in from other machines, specific instructions for doing so are available
from our web site as listed on the previous page.
Log in to your PC using your standard College username and password. You will need to
use the PuTTY program to log in to our server, where you will run the remainder of the
practical. To allow the server to display graphics on your screen, you will also need to use
X11 software on the PC. Your teaching machine has either Exceed or Xming installed for
this purpose.
You will first need to configure and save a session inside Putty:
Find and double click on the PuTTY icon. If it is not on the desktop, look in the Start Menu,
following All Programs:
You should see a screen that looks rather like this: (but the ‘Saved Sessions’ field may be
empty)
Type codon.bioinformatics.ic.ac.uk into the Host Name (or IP address) box, and ensure
that the Protocol is set to SSH (shown by the ring above). Then click on
SSH in the left-hand pane to open its additional options – select X11, as indicated by the
arrow. You will then see the following:
Make sure that the box next to Enable X11 forwarding is ticked, as shown. Then click
Session in the Category list (on the left). This will take you back to the original screen,
where you should save the settings you have made as follows:
Type codon.bioinformatics in the Saved Sessions box, and click on Save. You have now
made a shortcut that lets you log in to the server next time without having to do any
configuration.
TOP TIP (optional): the session you have created will generate a screen for you to work in
that has white text on a black background. If you prefer alternative colours, you can change
them inside the PuTTY configuration. Make sure that your ‘codon.bioinformatics’ saved
session is loaded by selecting it and clicking on ‘Load’, then go to the menu on the left side of
the PuTTY screen and click on the “Window -> Colours” option as below.
Select ‘Default Background’ and then ‘Modify’. You can now select a suitable background
colour. Now select a ‘Default Foreground’ colour, to produce text that is visible against the
background. When you are done, you can go back to the ‘Session’ entry at the top of the
left-hand menu and click on Save.
Now we need to start an X11 emulator program. Here we will assume that your PC has
Exceed installed. Some machines may have Xming installed instead; alternative notes for
using Xming are shown in a boxed section at the end of the Exceed notes.
Look for the Exceed icon on the desktop, which looks a bit like this:
If you can’t find it, search for Exceed under the ‘All programs’ windows menu, and launch by
clicking on the icon you find there. NOTE: you may not see any new program window
appear on your desktop. Now you have X11 running, you can connect to the server, by
going back to Putty, clicking on your codon.bioinformatics saved session to select it, and
clicking on the ‘Load’ button followed by ‘Open’.
You will see a new window appear, that will look something like this (colour may differ
depending on your Putty configuration):
You will need the codon.bioinformatics username and password we sent to you
earlier by email. You cannot log into this server using your standard College username and
password. Type in your username and password, pressing return each time (you won’t see
any characters on the screen when you type the password).
The first information to appear on the screen once you have logged in is the location and
date/time that you last logged into the server, followed by a banner telling you which
machine you are connected to and a help email address ([email protected]). After this is a
section where the administrator of the server can add any new messages about the service,
for instance warning of scheduled maintenance sessions (not shown in the example
above). This is called the Message of the Day (MOTD for short). On codon, you will then see
some horizontal bars that show a summary of how much space your account is using (more
on this in a later section).
The Prompt
The line that appears on the screen after you have typed in your password and pressed
return has the form
[sarahb@codon ~]$
All of this together is called the prompt.
This reminds you of your username (e.g. sarahb), the short name of the machine you are
logged in to (codon), and the directory you are currently in (~, here your home directory;
more about this later). The prompt also reminds you that the machine is waiting for you to
give a command.
You can open as many terminal windows at once as you want or need (there is a limit, but
you won’t ever need that many). You can start another terminal by going back to PuTTY
and starting another codon.bioinformatics session, the same as before. Terminal windows
are maximised, minimised and moved around the screen in the same way as normal PC or
Mac windows. Their size can also be adjusted by dragging on a corner while holding the
left mouse button. Please do not close them by clicking on the X in the top right-hand
corner – this is not a safe way to log out.
N.B. When you have finished and are ready to log out, you can close a terminal window by
typing exit at the prompt, or <Ctrl> d (hold down the control key and type d).
Now we can check that X11 forwarding is working by typing the command:
xeyes
After a short pause, you should see a pair of googly eyes appear somewhere on your
screen:
If you can see them, go back to your PuTTY session and stop the xeyes program by typing
<Ctrl> c (i.e. hold the control key down while typing the letter c).
If you see an error message and no eyes appear, please go back to your PuTTY configuration
and check that you have the Enable X11 forwarding box ticked. Save any changes,
restart PuTTY and try xeyes again.
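If xeyes still does not appear, one extra check (not part of the original exercise) is to look at the DISPLAY environment variable, which SSH sets on the server when X11 forwarding is active. The following is a minimal sketch; the variable name x11_status is illustrative only.

```shell
# DISPLAY is set by SSH on the server when X11 forwarding is active
if [ -n "${DISPLAY:-}" ]; then
    x11_status="X11 forwarding looks active (DISPLAY=$DISPLAY)"
else
    x11_status="DISPLAY is not set: check 'Enable X11 forwarding' and that your X11 emulator is running"
fi
echo "$x11_status"
```

An empty DISPLAY usually means the forwarding box was not ticked, or the session was opened before the setting was saved.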
Using Xming for X11 on a PC (instead of Exceed)
Look on your desktop, quick launch bar or under the programs menu for the Xming icon, which looks a bit like this:
Start the program by selecting it from the menu or double-clicking the shortcut. You won’t see very much happening at this point; Xming will add an icon to the notification panel at the bottom right of your screen. You should start the X11 program BEFORE starting your PuTTY connection.
Terminals – copying and pasting
When X11 was originally designed, the assumption was that everyone would have a three-button
mouse, using the left mouse button to highlight and copy, and the middle button to
paste. So what do you do if your mouse has fewer buttons?
Many mice only have two buttons, or perhaps two buttons and a central scroll wheel. To
emulate the third button (needed for pasting inside a terminal window), there are two
possibilities, depending on how the mouse has been configured. Where there is a scroll
wheel, pressing down on the scroll wheel (i.e. into the body of the mouse, rather than turning
the wheel) will paste; if there are only two buttons and no scroll wheel, pressing the two
buttons simultaneously will paste. On a Mac, you may find only one or two buttons on the
mouse. There, holding down the Apple command key and pressing c while text is
highlighted should copy, while using the command key with v should allow you to
paste. Remember, whatever you have selected will get pasted where your
cursor is. To select something for copying, press the left button and drag over the text, then
paste using whatever is designated as the middle mouse button, as above.
Exercise:
Go back to PuTTY and open another terminal window. Practise selecting some text in one
window and pasting it into the other terminal. You can also copy and paste between the
terminal and other programs on your PC, e.g. Notepad. When you are happy, you can shut
the extra terminal window by typing exit.
Basic commands
It is possible to achieve a great deal with only a basic set of Unix commands. The server can be
thought of as a very large filing cabinet, containing files within file folders, within file folders, etc.
Folders are commonly referred to as Directories and Subdirectories within Unix. Carrying on
with the filing cabinet simile, imagine how hard it would be to find anything if you just threw all
your documents in the drawer without any folders, or dividers – chaos! The same thing will
happen to your Unix account if you choose to keep all your documents in your home directory,
instead of creating subdirectories (file folders) to store associated data together.
Your home directory is the directory you automatically start off in every time you log in to your
account.
There is a short-cut name for your home directory when you are typing – which is the ~ (tilde)
symbol.
First we will make a directory called course in which to store the files you will generate today.
Type:
mkdir course
To move into this new directory, type (note that the prompt changes to show the new directory):
[jbloggs@codon ~]$ cd course
[jbloggs@codon course]$
cd stands for change directory
course is a subdirectory of jbloggs, this person’s home directory. To show the fully qualified
pathname for your current directory type:
pwd
/home/jbloggs/course (typical reply)
pwd stands for ‘print working directory’ and will return the full path to where you are on the
machine relative to a fixed point – the Root of the machine. Here, this tells you that course is a
directory, within the directory jbloggs, which is a directory within home - we are using a
‘hierarchical file system’ which means that we can have directories within directories.
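The three commands met so far can be tried together as follows. This sketch runs inside a throwaway directory created with mktemp -d (an assumption made here so that nothing in your real account is touched); the variable names sandbox and here are illustrative.

```shell
# Work in a temporary directory rather than your real home directory
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir course        # make a new directory called course
cd course           # move into it
here=$(pwd)         # capture the full path from the root of the machine
echo "$here"        # prints a path ending in /course
```

In your own account you would simply type mkdir course, cd course and pwd at the prompt.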
N.B.
Knowing the full path for a particular file is important when you need to tell the machine where to
find files you want to work on, which may reside in directories other than the one you are
currently working in. There are 3 ways of specifying the location of a file or directory:
1. Absolute address from the root of the machine (like the one shown above)
2. Relative to your home directory
3. Relative to the current directory you are working in at the time
Use the one that is easiest for you at the time. A location relative to the root of the machine
always starts with a /.
When you type pwd, you will see an absolute path from the root of the machine. The other
forward slashes separate one directory from the next. Relative paths do not start with
a /. There are various shortcut symbols to help you move around as well. We will explore
paths shortly in the exercises, but here are some more examples.
Examples of absolute addresses of files:
/usr/users/fred
/software/eric
/data/rnaseq/reference/Homo_sapiens/Ensembl/hg19/Annotation/foo.gtf
As an example, let us assume we are currently in /usr and we want to move the file fred
into the software directory. We would type:
mv users/fred /software
or, alternatively, we could type:
mv users/fred ../software
The symbol “..” stands for one directory back towards the root of the machine.
A single dot “.” stands for the directory that you are in at the time, i.e. your current directory.
The tilde symbol “~” stands for your home directory.
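The relative-path version of the move can be rehearsed safely on a miniature copy of that layout. This is only a sketch: the usr, users and software directories below are recreated inside a temporary directory (held in the illustrative variable sandbox), not the real ones on the server.

```shell
# Build a miniature copy of the example layout inside a throwaway directory
sandbox=$(mktemp -d)
mkdir -p "$sandbox/usr/users" "$sandbox/software"
touch "$sandbox/usr/users/fred"

cd "$sandbox/usr"            # we are now "in /usr" within the sandbox
mv users/fred ../software    # users/fred is relative to here; .. steps back one level
ls ../software               # lists: fred
ls .                         # . is the current directory; lists: users
```

The same move written with absolute paths would start each location with a /, which works from any directory.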
Now you can try this out in the exercise below. Type the following commands:
pwd              (this will return your current directory, in this case course)
cd ..            (this will move you one directory back, to your home directory)
cd /usr/biosoft  (this will change your directory to one called /usr/biosoft)
ls               (this will list the contents of this directory)
cd ~             (this returns you to your home directory)
A note on filenames
Unix makes use of many of the character keys on your keyboard. Some of them have
special meanings, which means that they cannot be used freely in filenames, as they are
interpreted to mean something specific. There are ways of wrapping them so that they are
not interpreted by the shell (e.g. by putting an escape character such as “\” in
front of a space in a file name), but it is generally a GOOD IDEA TO AVOID using the
following characters in your file and directory names `¬!$%&*():;~#?/><,|\{ } [ ] to stop
unexpected effects.
Spaces are also not expected in a file name: the shell treats a space as a separator, so a file
called my filecalledfred will actually be read as two separate names, “my” and
“filecalledfred”, whereas my\ filecalledfred will be read correctly as a single name.
Hyphens, underscores and full stops in file and directory names are fine.
NOTE – a full stop used as the first character of a file or directory name will create what is
known as a hidden file (one that is not seen when you list the contents of a directory). These
are generally used to tidy away configuration files that affect the way your account works.
Now we can copy some files into the course directory that you made earlier. These files are
currently sitting in a directory called intro_course. Type:
cd course
cp /home/biotrain/intro_course/* .
cp (short for copy) requires the name of the file or directory to copy and then the place to
put the copy.
NOTE the full stop, which comes after a space - and yes, you do need to type it, as it
specifies the place to put the copies! Here, the full stop is short-hand for ‘the directory I am
currently in’.
This copies all files (*) in the directory intro_course, which is a sub-directory of
/home/biotrain, to the current directory (.) but not the directory intro_course itself. The * is known
as a wildcard (more about this later).
TIP – to copy a directory and all of its contents (including other subdirectories and their
contents, if present), we have to copy recursively, e.g.
cp -R /home/biotrain/intro_course .
This would copy the directory intro_course AND all of its contents to your current
directory.
As with most UNIX commands, if this command has worked, there will be no output to tell
you so. If anything is printed (except the usual prompt), the command has not worked; go
back and check that you have typed it in EXACTLY as above. If you receive an error
message, it may be informative, for instance:
cp: cannot stat fred: No such file or directory
(this suggests that the copy command cannot find the file you are trying to copy – in this case fred)
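The difference between copying files with a wildcard and copying a whole directory with -R can be seen side by side in a small experiment. The layout below is a made-up stand-in for /home/biotrain/intro_course, built in a temporary directory (the illustrative variable sandbox), and the file names are hypothetical.

```shell
# Create a stand-in source directory with one file and one subdirectory
sandbox=$(mktemp -d)
mkdir -p "$sandbox/intro_course/extra" "$sandbox/course"
touch "$sandbox/intro_course/cd4_human.pep" "$sandbox/intro_course/extra/readme.txt"

cd "$sandbox/course"
cp ../intro_course/*.pep .   # copies the matching files only, into the current directory (.)
cp -R ../intro_course .      # copies the directory itself and everything inside it
ls -R .                      # show what arrived
```

The wildcard here is narrowed to *.pep because a plain * would also match the subdirectory extra, which cp declines to copy without -R.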
To list the files now present in your current (working) directory, type:
ls (if this is empty, your copy command hasn’t worked - try again)
To list all the files in your home directory, (the one with the same name as your
username) type:
ls ~ (the tilde or ~ symbol is an abbreviated name for your home directory)
Command line arguments
There are two ways that a program can be given additional information:
1) It can ask you questions on the command line (prompt), to which you type answers
2) You can offer the information without being prompted
The drawback of the program asking questions is that if it can do 20 different things, then
being asked 20 questions each time you run it can be very tedious. By convention most
UNIX programs don’t ask for information; they expect you to supply it. This is achieved by
using “command line arguments”, sometimes also called flags. By convention, a flag
is indicated by placing a dash (hyphen) in front of it. Some bioinformatics programs
will ask a basic range of questions but expect additional information to be given via the
command line. We will look more closely at command line arguments, using ls as an
example case.
Try typing:
ls -l
To list all of your files using a different combination of command line arguments that
influence the output, try typing:
ls -Rl ~   (this is a hyphen, a capital R and then a small L, not the number one)
The -R flag causes ls to search recursively through all directories below, in this case, your home
directory, which is indicated by using the ~ symbol.
The -l flag causes a long listing of the information, including sizes, ownership and last
modification times.
On this machine, files and directories listed by ls are shown coloured by their type:
Blue: Directory
Green: Executable or recognized data file
Sky Blue: Linked file
Pink: Graphic image file
Red: Archive file
This can make things a little hard to read sometimes. We have set an alias on the ls
command so that when it is run, it automatically and silently adds the option to show
colours. To see this alias, type
alias ls
and you will see the following:
alias ls='ls --color=tty'
In other words, if someone types ls, they actually run ls with the optional flag
“--color=tty”, which colours the output by type when run in a terminal.
Note: You can turn this colour-coding off in the terminal window (for this session only) by
typing unalias ls
Now try to sort all your files according to their age (newest last):
ls -lrt
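To see how the combined flags behave, the sketch below makes two files with known dates and lists them. It assumes a temporary working directory (sandbox) and uses touch -t, a standard option that sets a file's modification time in YYYYMMDDhhmm form; the file names are invented for illustration.

```shell
# Two files with deliberately different modification times
sandbox=$(mktemp -d)
cd "$sandbox"
touch -t 202001010900 older.txt
touch -t 202301010900 newer.txt

ls -lrt                             # long listing, sorted by time, reversed so newest is last
last_listed=$(ls -rt | tail -n 1)   # the final entry of the time-sorted list
echo "$last_listed"                 # prints: newer.txt
```

Without the r, ls -lt would put the newest file first instead.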
Finally, we can take a look at some files you don’t normally see when you list with ls:
ls -la ~
This makes visible the so-called hidden files and directories, whose names start with a full stop, e.g.
drwx------ 8 train17 training 4096 May 17 13:49 .
drwxr-xr-x 23 root root 4096 May 5 13:43 ..
-rw------- 1 train17 training 17208 May 16 15:33 .bash_history
-rw-r--r-- 1 train17 training 18 Nov 20 05:02 .bash_logout
-rw-r--r-- 1 train17 training 193 Nov 20 05:02 .bash_profile
-rw-r--r-- 1 train17 training 231 Nov 20 05:02 .bashrc
drwx------ 3 train17 training 20 May 12 12:10 .cache
Here the top line is returning information about your current directory (.) and the second line
about the directory one further back towards the root of the machine (..). If you were to type
the command inside /homes/train99, for instance, the top line would refer to train99 and the
second line to homes.
NOTE – hidden or dot files (e.g. .bashrc) are generally doing useful work inside your account,
influencing your environment. DO NOT DELETE THEM. If you delete them by mistake, your
account may not look the same next time you log in, or certain programs may no longer work
as expected. If so, contact [email protected] for help.
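The effect of the -a flag on dot files can be checked without touching the real ones in your account. This sketch creates one ordinary and one hidden file in a temporary directory; the names sandbox, notes.txt and .hidden_config are all made up for the illustration.

```shell
# One visible file and one hidden (dot) file in a throwaway directory
sandbox=$(mktemp -d)
cd "$sandbox"
touch notes.txt .hidden_config

plain_listing=$(ls)      # hidden files are left out
full_listing=$(ls -a)    # -a also shows ., .. and any dot files
echo "$plain_listing"    # prints: notes.txt
echo "$full_listing"
```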
Ownership and Permissions
Example of file information returned by the command ls -l:
drwx------ 20 sarahb system 8192 Sep 12 2002 www_data/
-rw-r--r-- 1 johnp system 1200 Sep 24 17:30 tape.txt
The first character of each line (as below) indicates the type of the file. For example, a d in this position indicates a directory, and a - indicates a regular file.
-rw-r--r-- 1 sarah system 1200 Sep 24 17:30 tape_change.txt
 ^^^
The next three characters define the permissions afforded to the owner of the file. In this case,
they are set to rw-, which indicates that the file owner can read and write to, but not
execute, the file.
Write permission is required in order to edit or delete a file. Execute permission is required if the
file is a program file or a file containing a list of textual UNIX commands (a script). Without
execute permission, a program or script file cannot be made to run, i.e. be executed. Execute
permission is also required for directories in order to gain full access to the files stored within.
-rw-r----- 1 sarah system 1200 Sep 24 17:30 tape_change.txt
    ^^^
It is possible to divide the users of a system into groups. This allows users to set their file
permissions such that members of their group have greater access to their files than do other
users of the system. The next three characters define the permissions that the members of the
user’s group have. The group name is given in the fourth column (in this case, “system”).
Here, these three characters are set to r--, indicating that members of the user group may read
the file but may not write to (i.e. amend or delete) or execute it.
The next three positions refer to the access that everyone else (world) would have (in this case none, as shown by the dashes).
-rw-r----- 1 sarah system 1200 Sep 24 17:30 tape_change.txt
       ^^^
The second column reports the number of links to the file (you can ignore this figure).
-rw-r----- 1 sarah system 1200 Sep 24 17:30 tape_change.txt
           ^
The next columns report the owner of the file (sarah) and the group (system) to which the file
belongs. The figures following this are the size of the file in bytes (characters if you prefer), the
date and time that the file was last modified, and finally the name of the file (or directory).
Changing file permissions
There may be a situation where you want someone else to be able to copy or read one of your
files. You will have to change the permissions on the files to allow them to do so. You must also
change the permissions of the parent directories, as these override those of individual files. It is a
very common error to forget to do this. The command to change permissions is chmod. You
have to specify who you are modifying permissions for, and what permissions you are changing,
and for what file/directory.
N.B. Unless you have a specific need to share a specific file-set, you should not normally
need to modify the permissions in your account.
u means user, and refers to the owner of the file
g means group, and refers to the group the file belongs to
o means others, everyone apart from those above
a means all, i.e. user, group and others
Also, as we have seen above, r means read permission, w means write permission and x means execute permission.
So, for example, to give read permission to someone in the same group for a file called “filename” in ~/course:
ls -l ~
chmod g+rx ~ (allow people in the group to list and enter my home directory; directories need x as well as r)
chmod g+rx ~/course (allow people in the group to list and enter the directory course)
chmod g+r ~/course/filename (allow the group to read the file called filename)
chmod a+r ~ (allow everyone to read your home directory)
If you wanted to remove the permissions, use - instead of +:
chmod g-r ~ (stop your group from reading your home directory)
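You can watch the permission string change as chmod is applied by practising on a scratch file. This is a sketch under a few assumptions: it works in a temporary directory (sandbox), it first forces a known starting state with the standard symbolic form u=rw,go=, and it reads the first ten characters of the ls -l output to get the permission string.

```shell
# Practise on a scratch file, never on your real home directory
sandbox=$(mktemp -d)
cd "$sandbox"
touch filename
chmod u=rw,go= filename                        # known start: owner read/write, nobody else

before=$(ls -l filename | cut -c1-10)
chmod g+r filename                             # add read permission for the group
after_add=$(ls -l filename | cut -c1-10)
chmod g-r filename                             # take it away again
after_remove=$(ls -l filename | cut -c1-10)

echo "$before -> $after_add -> $after_remove"  # -rw------- -> -rw-r----- -> -rw-------
```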
Looking at text-based files
There are a number of commands you can use to look at files that contain text. Sometimes
you may want to just send the contents to the screen all in one stream without stopping, but
more generally you may want to be able to look at the content a screen-full at a time.
Two of the most useful are:
more filename
less filename
These two commands are very similar, but less has greater flexibility – (less does more than
more does – silly pun). Both will present information from the file to the screen one page at a
time, (as opposed to other commands that scroll down the document too quickly to read
such as cat). However, less will allow you to scroll back up the document using the arrow
keys, whereas more only allows you to scroll down. To exit out of a document you are
reading using more or less, type
q
Note: more is a standard UNIX command, whereas less may or may not be available on
other UNIX systems you may encounter.
For example, try the following:
cat cd4_human.pep
more cd4_human.pep
The more command shows you the contents of a file one page at a time and tells you to hit
the space bar to continue. Now try:
less cd4_human.pep
You should be able to use the arrow keys to scroll up and down the document.
When using less there are a number of keystrokes you can use for navigation:
h           help
q           quit program
space bar   next page
return key  next line
f           forward one page (same as pressing the space bar)
b           back one page
G           go to the end of the file
g           go to the start of the file
j           moves you forward one line
k           moves you back one line
/xxx        search for the characters xxx in the file and highlight matches
n           find next occurrence of the search pattern above
?xxx        search for the characters xxx in the opposite direction (backwards)
Now try looking at one of your files using less, and navigate around the document
using some of the commands shown above.
Looking at Other Files
If files have been compressed using the gzip command, they will usually have a filename
which ends in .gz. If they are text-based files, you will be able to read the contents without
uncompressing them, using the zcat command. This sends the output to the screen all in one go, so
you might want to pipe it into the less command so you can read it one screen-full at a
time:
zcat myfile.gz | less (more on redirection later…)
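If you have no compressed file to hand, you can make one to practise on. The sketch below assumes a Linux machine (where zcat reads .gz files), works in a temporary directory (sandbox), and pipes into head rather than less so that it runs without waiting for keystrokes; the file contents are invented.

```shell
# Make a small text file and compress it
sandbox=$(mktemp -d)
printf 'line one\nline two\n' > "$sandbox/myfile"
gzip "$sandbox/myfile"                                # replaces myfile with myfile.gz

# Read the compressed file without uncompressing it on disk
first_line=$(zcat "$sandbox/myfile.gz" | head -n 1)
echo "$first_line"                                    # prints: line one
```

At the prompt you would normally pipe into less instead of head, exactly as in the command above.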
If you really need to look inside a binary file, you will either need to use a program designed
to work specifically with its exact format (and this will depend on which program created it),
or you can extract readable strings from it using the strings command.
Wild cards
The * character is a ‘wildcard’. That means it can stand for any character or sequence of characters.
Thus:
*.seq    means all files ending in .seq
c*.pep   all files whose names begin with c and end with .pep
*        all files (apart from hidden files)
Wild cards allow us to specify alternative filenames with a minimum number of keystrokes.
We can also use the ? character, meaning “any single character”, so
more cd4_?????.pep
will display any file beginning in cd4_, followed by any 5 characters, and then .pep. So,
here this would match cd4_mouse.pep and cd4_human.pep files, but not cd4_rat.pep.
We can also use square brackets to denote a range of letters [a-z] or a selection of
letters [abrh], so
more cd4_[abrh]????.pep
will match cd4_human.pep and cd4_rabit.pep, but not cd4_mouse.pep or cd4_rat.pep.
Now try these commands on the files in your current directory, and also try:
more *.pep
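A harmless way to preview what a wildcard pattern will match is to hand it to echo, which simply prints the expanded file names. This sketch recreates empty files with the names used above in a temporary directory (sandbox); notes.seq is an extra, made-up file added to show a non-match.

```shell
# Empty files with the names from the examples above
sandbox=$(mktemp -d)
cd "$sandbox"
touch cd4_human.pep cd4_mouse.pep cd4_rat.pep cd4_rabit.pep notes.seq

five_chars=$(echo cd4_?????.pep)     # exactly five characters after cd4_
bracket=$(echo cd4_[abrh]????.pep)   # first of the five must be a, b, r or h
echo "$five_chars"                   # cd4_human.pep cd4_mouse.pep cd4_rabit.pep
echo "$bracket"                      # cd4_human.pep cd4_rabit.pep
echo *.pep                           # every file ending in .pep
```

The shell expands the pattern before the command runs, so more, ls or cp would receive exactly the same list of names.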
Copying and deleting files and directories
Here, we will carry out a number of basic file manipulations using UNIX commands. A
summary of the commands we use and what they do is provided at the end of these notes.
We are going to:
• make a new subdirectory under the one we are in at the moment
• move some files into it
• rename a few files
• delete the directory we have made (and its contents)
These are all functions that you will need in order to be able to organise and navigate within
your own account.
Type:
cd ~/course             (moves you into the directory course, under your home directory)
mkdir test              (makes a new directory called test)
ls test                 (lists all the files in the directory called test; it should be empty)
Now, we are going to copy a file into that directory.
cp cd4_human.pep test   (copies the file cd4_human.pep into the directory test)
cp stands for copy, and is an important command. An important point to note is that you can
copy files, or directories if you add certain flags (such as -r). Notice that above, we are copying the file
cd4_human.pep to the directory test. If we had not previously created the directory called
test, the computer would have assumed that what we wanted to do was to copy the file
“cd4_human.pep” and call the copy “test”. If you wanted to be sure, you could write the
following, but it does the same as the command above:
cp cd4_human.pep test/cd4_human.pep
Remember, to the computer, files and directories are two different things. A directory is
something you can store other things in. But you do have to TELL the computer if you intend
something to be a directory or just a file. That is why you have special commands, like
mkdir, to make a directory.
Now, try the following:
cd test move into the directory “test”
mv cd4_human.pep newname.pep rename the file cd4_human.pep to newname.pep
cp newname.pep second.pep make a copy of newname.pep called second.pep
mv is short for move, and is the command used for either moving files to new locations, or
purely renaming them (a similar act!).
Now we want to delete, or remove, newname.pep. Type:
rm newname.pep
Now, let’s move up a directory, to course, and then delete the test directory completely:
cd ..
rm -r test
The .. is a shortcut, meaning ‘go back up one directory from where you currently are’ – in
this case back from test to course.
The flag -r is required to delete directories and will delete a directory recursively along with
all of its contents – including other subdirectories so BE CAREFUL.
On this machine, you will be prompted to examine contents of a non-empty directory and
asked if you want to delete each subdirectory and contents individually (type Y or N when
prompted). To blindly delete without examining, again insert a backslash in front of the rm to
unalias it.
Empty directories are more usually deleted using the rmdir command.
Note how the mv and cp commands treat their destination. If the destination you are moving or
copying to is an existing directory, the source file(s) are moved (mv) or copied (cp) into that
directory. If the name given as the place you are moving or copying to is not already known to the
computer as a directory, then the file is copied (cp) or renamed (mv) to that name.
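The difference between a destination that is an existing directory and one that is not can be seen in a short sketch. It runs in a scratch directory with a placeholder file; the names copy2 and renamed.pep are invented for the example, and a Bourne-style shell such as bash is assumed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "placeholder sequence" > cd4_human.pep

mkdir test
cp cd4_human.pep test            # test is a directory: the copy goes inside it
cp cd4_human.pep copy2           # copy2 does not exist: a new file called copy2 is made

mv copy2 test                    # destination is a directory: the file moves into it
mv test/copy2 test/renamed.pep   # destination is a new name: the file is renamed

ls test                          # cd4_human.pep and renamed.pep
listing=$(ls test)

cd / && rm -r "$workdir"
```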
If the target (destination) is a file which already exists, then the program will ask you to first
confirm the action (this is not standard, most UNIX systems will immediately overwrite the
original file).
ls (returns 2 files that exist)
normal_1_1_fastqc.html
normal_1_2_fastqc.html
cp normal_1_1_fastqc.html normal_1_2_fastqc.html
cp: overwrite normal_1_2_fastqc.html? n (user is prompted if existing file
should be over-written - file is not overwritten as answer no is given)
If you add a -f flag to the copy command, it forces the action to be done silently, but take
care if you choose to do this!
The file system has been set up to try to stop you copying over files and directories
accidentally. However, NOT ALL PROGRAMS ARE AS NICE. Although the more
dangerous commands (cp, rm, mv) have been modified so that they will at least ask first,
most bioinformatics programs won’t. So if you repeat an analysis, the results of the
second analysis may overwrite those of the first unless you give the program a new
destination name for the output. Other programs may just silently fail to run.
The moral here is to make copies of important files before you start manipulating them:
cp file file.orig
Using an editor to create a file
There are several text editors available to you on our servers. The most universal UNIX
editor is vi (or its slightly more helpful version vim) but it isn’t the simplest editor to use.
Today we will be looking at two editors, a simple editor called pico, and a windows-based
editor called gedit.
Pico
[Pico is available on Codon but not currently on training.medbio].
We will create a simple text file. We will make a file of filenames called cd4.list. So, we
start up the editor pico, telling it to edit or create the file cd4.list. If we gave it no filename
pico would start a new file and ask you for a name to save it under when you exit the
program.
pico -w cd4.list (the -w flag tells pico not to linewrap long lines)
At the bottom of the screen you will see the standard pico commands in reverse video like
this (I have started to type test into the editor pane):
e.g. cntrl-x exit, cntrl-u uncut (paste back the text last cut), cntrl-w search.
You need to use the arrow keys to move around inside your document. Now type the
following lines into the file:
cd4_cerae.pep
cd4_erypa.pep
cd4_human.pep
When you have inserted the three lines, quit from the editor by pressing <Cntrl> x, answering y when asked whether to save your changes.
The list you created consists of the names of three files in your current directory. Some programs
can take as input a list file like this (i.e. a file containing the names of other files to input). If you
wanted to use files in any other directory, you will need to tell the machine where to look for
them, either relative to the place where you are when using the listfile, or the absolute path from
the root of the machine. To do this, you need to specify the full path of your files.
e.g: /home/jbloggs/course/cd4_cerae.pep
Remember - The pwd command can be very useful if you are not sure what the full path to your
file is!
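One way to build such a list with full paths, without typing each path by hand, is sketched below. It works in a scratch directory with empty stand-in files, and assumes a Bourne-style shell such as bash.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch cd4_cerae.pep cd4_erypa.pep cd4_human.pep   # stand-ins for the real files

# $(pwd) expands to the current directory, so each name is listed with
# its absolute path, one per line.
ls -d "$(pwd)"/cd4_*.pep > cd4.list

cat cd4.list
line_count=$(wc -l < cd4.list)
first_line=$(head -n 1 cd4.list)

cd / && rm -r "$workdir"
```

A list written this way can be used from any directory, because none of its entries depend on where you are when you use it.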
Now let’s look at the gedit editor (the text editor of the GNOME desktop, which originated in the
GNU free software project – a project rather than an ungulate).
Gedit requires an X11 connection, by the way, while pico does not, and will work in a simple
terminal – one reason to be familiar with both.
Start it by typing
gedit cd4.list &
The & symbol runs the program in the background so you can carry on working in the
terminal as well, if you wish – more about this later.
This is a slightly more friendly-looking editor. Here we have several menus, selected by using the
right mouse button. At the bottom of the window, a menu currently showing a Plain Text option
allows you to select syntax highlighting for a large number of possible programming
languages, including Python and C++.
Now try editing the text file you have loaded. When you have seen enough, save the file and
then exit gedit.
Finding Help
There are a number of places you can go to find programs you need, or find out about
programs or commands. Here are a few options:
The man command
man more
The UNIX command for getting help is man (because it brings up manual pages). These
pages provide information on a number of programs on the system, including many of the
UNIX commands you may have cause to need. If you type:
man ls
you can now read all about the ls command, including what extra information you can give
the program to get it to do particular things. Of course, you need to already know the name
of the program to get help this way. If you don't know this however, you can type either of
the following to try and find out what commands exist for what you want to do:
man -k keyword
or
apropos keyword
You can now look at the man page for any command you think is appropriate.
Other help
A few bioinformatics programs (e.g. Hmmer) have man pages, but most don’t. Often help
files are distributed in html (web) format or as pdf files and can be found by searching our
web site by software name or by looking in our software database
(http://www.imperial.ac.uk/bioinformatics-data-science-group/resources/software).
If you experience problems using your account, including forgotten passwords, running out
of space, not finding what you want, or programs behaving unexpectedly, please contact our
email help-desk by mailing [email protected]
Passwords
If you are using this Unix account for more than the duration of this practical session (i.e. a
one day temporary account), you should change your password from the temporary one you
were assigned. Generally, temporary new passwords are disabled after a week, and if you
do not log in and change it before then, you may find yourself locked out until you contact
[email protected] for a new one.
This is an important security step as we have had to distribute your default usernames and
passwords by email. New passwords must be a minimum of 8 characters, contain at least
one number, capital and/or extended character (not spaces, exclamation marks, full stops,
brackets or slashes), and contain no obvious words. To change your password, type the
command passwd as below and then you will be prompted for your current password and
then to type in a new one twice (which will not show up on the screen):
passwd
[trainer@training rnaseq]$ passwd
Changing password for user trainer.
Changing password for trainer.
(current) UNIX password: (type the old password and press return)
New password: (type the new password and press return)
New password: (type the new password again and press return)
password for user trainer changed
If a password selected is not suitable or if there are differences between the first and second version
of the new password, you will be warned and the password will not be changed.
If you forget what you have changed the password to – you will need to email [email protected]
with your name, name of the machine you are trying to log in to (e.g. training.medbio.ic.ac.uk)
and current username, and we can send a new temporary one for you.
Quotas
On most of our servers, we use a system of quotas to control the amount of file storage any
particular user can use. As well as encouraging you to consider keeping your account tidy by
periodically removing unwanted temporary files, this can help to control the output of
runaway processes. Generally, quotas operate a bit like a bank account with an account limit
(soft quota limit) and an overdraft facility (hard quota limit). Quotas are usually set on
actual space (in kilobytes) and inodes (numbers of files). Unless you are storing huge
numbers of very tiny files, you are unlikely to ever hit the inode quota.
On some of our servers – Codon, for instance – you will see information on your current space
and quota when you first login, something like this:
The histogram indicating the proportion of space used will display orange markers when you
are using more than 80% of your space, and red when over 90%.
Note – normal accounts will show only one bar here. When you are given access to a larger project
space which is quota-ed separately (e.g. data/syntegron in this example), each project will
show as its own bar. Please note that the training server training.medbio may not show this
histogram.
To see your account quota and how much space you are using you can use the quota
command.
quota -s
(the -s argument makes the quota command return appropriate units for allocated space –
here megabytes MB and Gigabytes GB – the default shows only blocks units).
[sarahb@codon ~]$ quota -s
Disk quotas for user sarahb (uid 1003):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
192.168.0.45:/mnt/home
                  5291M  10240M  11264M            4957       0       0
Filesystem - the file system to which the quota is applied (in this case, the file
system containing the home directories).
blocks - how much disk space you are using (in this case, 5291 MB)
quota - the actual amount of space you have been given to use (10240 MB)
limit - this is the absolute hard limit of space you can use. There is a small overdraft
allowance of space between your quota and limit - but you cannot exceed your hard
limit – files will no longer be created and your account may act strangely for this
reason.
grace - when you have filled up your allocated quota, the system automatically gives
you a period of time (7 days) during which you can use your overdraft space up to
the hard limit. If you have not had a tidy up or contacted us within this time you will
not be able to create new files or edit files until you have removed something to free
up some space.
files - the quota on the number of actual files you are allowed to have. At present we
are not applying quotas to the number of files you can store – hence the zeros.
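The arithmetic behind these figures can be sketched with the numbers from the quota -s output above. The variable names are made up, the colour "normal" is only a label for this example, and measuring the percentage against the soft quota is an assumption about how the login histogram is calculated. Bourne-style shell arithmetic (bash) is assumed.

```shell
used_mb=5291      # "blocks" column in the example output
quota_mb=10240    # soft limit ("quota" column)
limit_mb=11264    # hard limit ("limit" column)

percent=$(( used_mb * 100 / quota_mb ))   # whole-number percentage of the soft quota
overdraft=$(( limit_mb - quota_mb ))      # space between the soft and hard limits

# Thresholds taken from the description of the login histogram.
if [ "$percent" -gt 90 ]; then
    colour=red
elif [ "$percent" -gt 80 ]; then
    colour=orange
else
    colour=normal
fi

echo "$percent% of quota used ($colour); overdraft allowance is $overdraft MB"
```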
Home Directories and where to work
Every account has a 'home' directory associated with it, which is your personal space to
store your data, and is backed up daily. By default, our home directories have a quota of
10Gb. This can be extended up to 100Gb on request. Project directories are created under
/data and can be made available to a group of users or an individual where larger space is
required, or contents need to be shared across a specific group. (There may be charges
associated with this additional space.)
If you think you have DELETED a file or mis-edited it, from inside your account, (including
project directories) and really need the old version – and the file was created more than 24
hours ago – please send an email to [email protected] giving the exact name and previous
location of the file or files. We may be able to restore a previous version from backups (this
is not generally possible from temporary training accounts).
Scratch Directories - Codon has a sizeable unquota-ed scratch volume available for use
as temporary workspace. The term scratch is generally used to denote a working space
which is used for temporary storage of data that is not backed up. By using this space in
your day-to-day work when you know you will be creating very large interim results files, you
do not need to worry about exceeding your home directory quota. BUT this space is not
backed up and old data is subject to automatic removal after fixed periods of non-use.
Consequently it is extremely important to ensure you move data requiring long-term storage
to either your home directory or project storage. NOTE – you will not need scratch or project
space for any training courses and training accounts are not set up with these.
Please contact [email protected] if you feel that you need to use the scratch volume,
and a directory will be created for you within it.
Getting Files on and off the Server
For security reasons, the standard ftp protocol is disabled on our servers. To transfer files on
and off the servers you will need to use a more secure protocol such as sftp or scp.
For transferring files between a PC or Mac and our servers we recommend the FileZilla
secure ftp client – which is free, and installed as standard on College Desktop machines.
On a Windows machine, you may also find WinSCP useful.
Information on using FileZilla with a Mac is available at
https://wiki.imperial.ac.uk/display/BioInfoSupport/Installing+FileZilla+on+OSX
Now we will use FileZilla to transfer a file from the server to our local PC. First you will need
to launch FileZilla on your local PC. Find the FileZilla icon on the desktop and
double click to start it. If you cannot find one, search for the program by name in the All
Programs Search box. Once FileZilla launches it will look something like this:
but if you have not used it before, the right hand (server) pane will be empty. First you will
need to add the details of the server you want to connect to:
type the full server name (e.g. training.medbio.ic.ac.uk) into the Host dialogue box, your
username for that server in the Username box, the related password in the Password box,
and 22 into the Port box (this is the default port number on the server for listening for SFTP
requests), and then press the Quickconnect button.
Some information will appear in the top section, telling you that you are being connected and
you may see a new window asking whether to trust the server's host key (so that the key can be
remembered for future connections) – if prompted, say yes. Once connected, the right hand Remote pane will be
populated with the directory tree starting at your home directory. Now you can browse to the
file you want to transfer (you can select more than one using the Shift or Ctrl keys) , click to
select it and simply drag it across from the Remote (server) pane to the left-hand Local (PC)
pane, into the appropriate folder. If you want to view a part of the server outside of your
account (e.g. /data) you can simply type the full directory address in the Remote site
dialogue box as below.
Now go to /data/rnaseq/fastqfiles on the server and copy the file normal_1_1.fastq to your
C:\temp or tmp folder. You can take a look at the contents of the file if you wish, using
Notepad or Wordpad. Once you have finished with FileZilla, shut the server connection by
using the Server menu and Disconnect option and then shutting the program in the usual
way.
Please note that Microsoft Office format files (e.g. Word .docx, Excel .xls) are not readable
on our Unix servers. If you need to read contents from these formats, you can save as .txt or
.csv respectively before transfer, but the default end-of-line characters (that you can’t
normally see) are different between windows and linux and may cause problems with some
programs.
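To make the end-of-line difference concrete: Windows text files end each line with a carriage return followed by a line feed, while Unix files use a line feed alone. The sketch below creates a small Windows-style file and strips the carriage returns with tr. The file names and contents are made up, a Bourne-style shell such as bash is assumed, and this is only one of several ways to do the conversion.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A two-line file with Windows-style (carriage return + line feed) endings.
printf 'gene,count\r\nCD4,12\r\n' > from_windows.csv

# Delete every carriage return, writing the result to a new file.
tr -d '\r' < from_windows.csv > unix.csv

before=$(wc -c < from_windows.csv)
after=$(wc -c < unix.csv)
echo "$before bytes before, $after bytes after"   # one byte fewer per line

cd / && rm -r "$workdir"
```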
More Useful UNIX Features
Unix supports a “standard input” and “standard output” model. Normally, information is
accepted from the keyboard, (also known as “standard input”) and programs send
information back to the screen (also known as “standard output”). However, this input/output
can be redirected e.g. to a file. Some programs send their output to a file as standard. Errors
produced by a command are usually sent to the screen as well – although they come via a
different stream - generally called “standard error”.
Standard Output redirection
As an example we will consider the cat (concatenate files) command. This normally sends
the contents of a named file to the screen, all in one go. You can, however, redirect the
output into a file using the > symbol and the name of the file you wish the information to be
sent to.
cat unknown.tfa sends the contents of unknown.tfa to the screen
cat unknown.tfa > another.file sends the contents of unknown.tfa into a file called
another.file
cat unknown.tfa unknown2.tfa > another2.file sends the contents of unknown.tfa and
unknown2.tfa to another2.file, concatenating their contents one after the other.
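Note: on this machine, redirecting in this way will not overwrite an output file of the same name if
it already exists (this depends on the shell's noclobber setting; without it, the file is overwritten
without warning). You will need to remove it first, or force an overwrite by adding an exclamation mark
! immediately after the > (this is the csh/tcsh form; in bash the equivalent is >|).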
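A short sketch of these redirections, using small placeholder files in a scratch directory (the file contents are invented). It also shows >>, which is not used elsewhere in this tutorial and which appends to a file instead of replacing it. A Bourne-style shell such as bash is assumed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>seq1\nACGT\n' > unknown.tfa      # placeholder contents
printf '>seq2\nTTGA\n' > unknown2.tfa

cat unknown.tfa > another.file                 # contents of one file
cat unknown.tfa unknown2.tfa > another2.file   # both files, one after the other
cat unknown2.tfa >> another.file               # >> adds to the end of another.file

lines_joined=$(wc -l < another2.file)
lines_appended=$(wc -l < another.file)
echo "$lines_joined lines in another2.file, $lines_appended in another.file"

cd / && rm -r "$workdir"
```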
Standard Error redirection
Sometimes it is useful to trap errors that a program produces while it is running -for instance
if you are running a program from within a script, or if it is producing too much information on
the screen to easily read.
First we need to generate an error – we can produce one using grep (this program requires
a pattern to look for, and a file/files to find the pattern in)
[sarahb@codon course]$ grep
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
Now redirect the error and save it in a file as follows:
grep 2> error.out
less error.out
What did you see on the screen, and where did the error go?
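The same idea can be taken one step further by sending normal output and errors to separate files in a single command. This sketch assumes a Bourne-style shell such as bash (the 2> form is not csh/tcsh syntax); the files present.txt and missing.txt are invented, and missing.txt is deliberately never created.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

grep 2> error.out       # the usage message goes into error.out, not to the screen
status=$?               # grep signals the failure with a non-zero exit status

echo "ACGT" > present.txt
# Matches go to matches.out; the complaint about missing.txt goes to errors.out.
grep A present.txt missing.txt > matches.out 2> errors.out

cat matches.out
cat errors.out
error_size=$(wc -c < error.out)
match=$(cat matches.out)

cd / && rm -r "$workdir"
```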
Input redirection
We can also redirect input. This can be useful for programs that normally require extra
information to run. If you type this extra information, exactly as required by the program, into
a file, you can then use this file to input information into the program.
For example:
blast < standard_blast_answers sends the information in standard_blast_answers to the
blast program
To be really fancy we can even redirect the input and the output at the same time.
blast < standard_blast_answers >output
(If you think you know all the standard information that blast needs to run you can try this….
But we haven’t made a file for you in this instance).
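Since no answers file is provided for blast, here is a sketch of the same two redirections with the sort command standing in for it. The file species.txt and its contents are made up, and a Bourne-style shell such as bash is assumed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'mouse\nhuman\nrat\n' > species.txt

sort < species.txt                # input from the file, output to the screen
sort < species.txt > sorted.txt   # input from one file, output into another

result=$(cat sorted.txt)

cd / && rm -r "$workdir"
```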
Piping and other useful manipulations
If we wish to carry out a series of actions on the same information, we can pipe the output of
the first action to the second, and if desired, pipe the output of the second action to a third,
etc. In other words, you can use the standard output of one program directly as the standard
input of another. This is more easily understood with an example:
When you list all the files in a directory but there are too many to fit on the page, you may
want to use the command more to allow you to view them page by page. To do this you can
pipe the results of the command ls -l through the command more. (The pipe symbol is ‘|’.)
ls -l ~ | more (list all the files in my home directory and view them page by page)
ls -l ~/course | more (list files in the directory course and view them page by page)
An Example: Let us say that you want to identify all files last modified in January (ls -l shows
modification dates). One way of doing this would be to make a long listing with ls -l and then
look at the list. A better way would be to pipe the output of the list to a program that searches
for a pattern - the grep (global regular expression print) command.
ls -l /tmp | grep Jan (here grep is searching for the pattern Jan)
Grep is a very useful command so you might like to look at its help pages. N.B. grep
supports regular expression searching (not covered here).
The search can be made case-insensitive by using the flag -i.
To only report lines where the pattern occurs at the very start of the line, add a ^ symbol
in front of the search pattern.
To add line numbers to where it reports a match, use -n.
To only report the first n matching lines, use -m number where number is the
number of matching lines you want.
Now try each of these flags out, by searching the file cd4_human.pep for the pattern A
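As a sketch of what each of these options does, the commands below run against a made-up five-line file rather than cd4_human.pep, so that the results are easy to check by eye. A Bourne-style shell such as bash is assumed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# An invented file standing in for a real sequence file.
printf '>demo sequence\nACGTAC\nttacga\nACCA\nGGTT\n' > sample.txt

grep -i a sample.txt     # case-insensitive: lines containing A or a
grep '^A' sample.txt     # only lines that start with A
grep -n A sample.txt     # each match prefixed with its line number
grep -m 1 A sample.txt   # stop after the first matching line

insensitive=$(grep -i a sample.txt | wc -l)
numbered=$(grep -n A sample.txt)
first_only=$(grep -m 1 A sample.txt)

cd / && rm -r "$workdir"
```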
To sort all your files by their file size, you could do the following.
ls -l ~ | sort -n -k 5 (sort numerically on 5th field of text)
To find out more about the sort command, you can also read the man page about it
at your leisure.
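A sketch of the size sort on three files of known size, so that the ordering is easy to verify. The file names are invented, the example assumes the usual ls -l layout in which the size is the fifth field, and a Bourne-style shell such as bash is assumed. The -r flag, which reverses the order, is an addition to the command shown above.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Three files of 10, 3000 and 200 bytes.
head -c 10   /dev/zero > small.dat
head -c 3000 /dev/zero > large.dat
head -c 200  /dev/zero > medium.dat

ls -l | sort -n -k 5      # smallest first, largest at the bottom
ls -l | sort -n -r -k 5   # -r reverses the order: largest first

largest=$(ls -l | sort -n -k 5 | tail -n 1)

cd / && rm -r "$workdir"
```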
Simple counting
You can count the number of characters, words, or lines in a text file using the wc command.
Investigate the options by using wc --help
Now try running it with different options on the file cd4_human.pep
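For reference, a sketch of the three most common options on a tiny made-up file, small enough to count by hand. A Bourne-style shell such as bash is assumed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one two three\nfour five\n' > words.txt

wc -l words.txt   # number of lines
wc -w words.txt   # number of words
wc -c words.txt   # number of bytes, including the newline at the end of each line

lines=$(wc -l < words.txt)
words=$(wc -w < words.txt)
bytes=$(wc -c < words.txt)

cd / && rm -r "$workdir"
```

Reading the file through < makes wc print only the number, without the file name, which is convenient when the count is stored in a variable.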
Running a process in the background
Some processes can take a while to run. You may not want to have these things running on
your screen. Fear not, you can place such processes in the “background” in two simple
steps, when it is already running
<Ctrl> z this command suspends the current process
bg this starts the process running again, but in the background
Alternatively, you can start the program in the background, by adding an & symbol after the
program name. You can now work on other things, or logout, leaving the background
process running. Typing
fg
will bring it back to the foreground if you want to continue working interactively with this
process. If you try to logout and get prompted that there are suspended jobs, chances are
you’ve left a job suspended and forgotten about it. To see all your suspended jobs, and jobs
running in the background type
jobs
Now try this out. Look at the contents of a large file using more (e.g. more
unknown.tfa), then suspend the more process, put it in the background, run jobs to check
it is still running. Finally type fg to bring it back to the foreground. Quit more as usual, e.g. by
typing q.
NOTE if you had more than one job running in the background, fg will work on the newest
one. Each background job gets assigned a number, visible when you type jobs. To call
another job back to the foreground type % followed by the number of the job you are
interested in
e.g.
sarahb@codon [course] jobs
[1] - Suspended (tty output) less unknown.tfa
[2] + Suspended (tty output) vi eric
sarahb@codon [course] %2
this brings the second job – in this case vi back to the foreground so you can interact with it.
A job can be executed directly in the background by appending & to the end of the
command line. Try typing
less unknown.tfa &
Now bring the job back to the foreground and quit less.
History
You can call back and edit commands you have previously given, by using the arrow
direction keys. You can also look at old commands by typing history which returns a
numbered list in order, with the most recent commands at the bottom. You can recall a
specific one by typing an exclamation mark followed by the command number shown by history
!32 reruns command 32
!em reruns the last command that started with em
N.B to bring the cursor back to the beginning of a line of command you are in the middle of
typing, or one you have just recalled as above use <Ctrl> A.
Now take a look at your command history. Choose a command that you would like to re-run
and rerun it by using its number.
Finding the Jobs you are running
Sometimes you want to find how your jobs are running – and perhaps you would like to
terminate one before it finishes. You can view the jobs you have currently running by typing
ps
this will show some but not all the processes on the machine that currently belong to you.
This will return the name of the program that is running together with the command line
options used to run it, and information on the resources being used by the job and how long
it has been running. Each job or process is given a unique PID which you can use to refer to
it. A more useful version of this command is to use
ps -ef | grep username
If you put your own username here, this will show every single process belonging to you. You
can terminate or ‘kill’ your own processes if you know their PID using the kill command.
Please note that if you kill the wrong processes, you may log yourself out unexpectedly! You
can only kill processes belonging to yourself. First use a ps to determine the PID of the
process you wish to kill and then type
kill XXXX (where XXXX is the PID of the process you wish to terminate).
Now start xeyes running in the background by typing xeyes &
Now try and find the xeyes process in the process table as above, and kill it. Your xeyes
should disappear.
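If you have no X11 connection, xeyes will not start. The same start, find and kill sequence can be sketched with sleep as a harmless stand-in. The variable $! (the PID of the most recent background job) and the wait command are not covered in this tutorial and are used here only to keep the example short; a Bourne-style shell such as bash is assumed.

```shell
sleep 300 &               # a stand-in for a long-running job
pid=$!                    # $! holds the PID of the most recent background job

ps -p "$pid"              # show just that process
kill "$pid"               # ask the process to terminate
wait "$pid" 2>/dev/null   # wait for it to finish
status=$?                 # a value above 128 means it was ended by a signal

# Check whether the process is still in the process table.
if ps -p "$pid" > /dev/null; then
    still_running=yes
else
    still_running=no
fi
echo "exit status $status, still running: $still_running"
```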
Some processes may take a short while to die. There are more imperative ways of killing
jobs but we won’t cover them here. To view the most CPU and memory intensive jobs
running on the machine you can use the command
top
This will list the most resource-hungry processes on the machine, as well as the resources they are
using, their ownership, and the overall load on the machine. top normally keeps refreshing the
list every few seconds until you press q to quit. For more information, try using man top.
When you feel you have seen enough and you are ready to log out of any terminal
windows you have open, you can close them and logout by typing exit or <Ctrl> d in each
one.
Summary of useful UNIX commands
Cntrl-c Stop a process
Cntrl-z Suspend a process, see also jobs, fg and bg.
bg To send a suspended job to the background
cat Type file contents to the screen all at once (see also more)
cat To concatenate files together (cat file1 file2 file3 > newfile)
cd Change directory (cd subdirectory)
chmod To change the permissions or protection on a file (chmod a+r somefile)
cp Copy a file (cp filename1 filename2)
cp Copy a file to a directory (cp filename directoryName)
emacs, xemacs A text editor, more powerful than pico, but more complex.
fg Brings a suspended or background job to the foreground
finger or f To find more information about a user or users, try finger jbloggs
grep To search for files containing a pattern. To search files for the word ATWH (grep
ATWH filename)
history To list the last 50 commands you have entered
jobs Lists any suspended or background processes that you might have
kill To stop or kill a running process (kill 23459, where 23459 is your process ID; see also top and ps)
exit How to exit from the machine; if you get a message about suspended jobs then type it
twice
ls List the files in your directory
ls -l List the files in your directory but with “longer” information
man command For help about UNIX command command
man -k keyword Lists all UNIX commands that mention the word “keyword”
mkdir make a directory
more Type a file to the screen a page at a time (press q to quit, space bar for next page).
mv move a file into a directory (mv filename directoryName)
mv Rename a file (mv oldname newname)
passwd To change your password
pico A file editor (pico filename)
pwd Print the full path of your current directory
ps List your current processes
quota To show your disk space quota and current use.
rm Delete a file (rm filename)
rmdir Delete a directory (directory must be empty)
top To see who is hogging all the CPU time
who To list users currently logged on