

Basics of R – Exercises

Read the instructions closely! Lines starting with ">" contain R code, and they should be typed without the ">" sign. Code and R output are typeset in Courier font to separate them from normal text. This exercise has been written so that you test every command and see for yourself what it does. If you need help, just ask!

The dataset consists of 17 bioinformatics students, who have given their height and shoe size measurements for teaching purposes.

1. Reading the data into R

There is a folder named data on your desktop. In its subfolder students, there is a file students.txt. Open it in Excel, and check what columns it contains. Note the column headers. You can leave the file open in Excel in case you need to see it later; otherwise just close it.

Open R by double-clicking on its icon on the desktop. Go to the File menu, and select the option Change Dir. Change the directory to the one where the students.txt file is located.

Read the data into an R object named students (the data is in a tab-delimited text file that has a title for every column):

> students<-read.table("students.txt", header=TRUE, sep="\t")

Check that R read the file correctly (objects can be printed just by typing their name):

> students
   height shoesize gender population
1     181       44   male     kuopio
2     160       38 female     kuopio
3     174       42 female     kuopio
4     170       43   male     kuopio
5     172       43   male     kuopio
6     165       39 female     kuopio
7     161       38 female     kuopio
8     167       38 female    tampere
9     164       39 female    tampere
10    166       38 female    tampere
11    162       37 female    tampere
12    158       36 female    tampere
13    175       42   male    tampere
14    181       44   male    tampere
15    180       43   male    tampere
16    177       43   male    tampere
17    173       41   male    tampere
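As a minimal sketch of what read.table does with a tab-delimited file, the snippet below reads the first three rows of the table from an inline string instead of from students.txt. The text= argument stands in for the file here, and the object names raw and students_head are made up for this illustration.

```r
# First three data rows of the students table, typed inline as tab-delimited text.
raw <- "height\tshoesize\tgender\tpopulation
181\t44\tmale\tkuopio
160\t38\tfemale\tkuopio
174\t42\tfemale\tkuopio"

# header = TRUE: the first line holds the column titles.
# sep = "\t": fields are separated by tabs.
students_head <- read.table(text = raw, header = TRUE, sep = "\t")

str(students_head)   # column types as R understood them
dim(students_head)   # number of rows and columns
```

If a column that should be numeric shows up as text in the str() output, the separator or the header setting is usually the first thing to check.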

You can also print the column headers only (sometimes the whole table does not fit on the screen, and this might be more helpful):


> names(students)
[1] "height"     "shoesize"   "gender"     "population"

2. Prepare the data for analysis

Individual columns can be accessed using the following syntax: first comes the name of the object, followed by a dollar sign, after which comes the name of the column:

> students$height

3. Simple statistics

What are the mean height and shoe size? Type:

> mean(students$height)
[1] 169.7647
> mean(students$shoesize)
[1] 40.47059

What about standard deviations? Type:

> sd(students$height)
[1] 7.578996
> sd(students$shoesize)
[1] 2.695312
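To make clear what mean and sd compute, here is a small sketch that recalculates both by hand for the 17 heights listed in the table. The vector is typed in directly so the snippet runs without students.txt; the names n, m and s are made up for this illustration.

```r
# The 17 heights from the students table, typed in directly.
height <- c(181, 160, 174, 170, 172, 165, 161, 167, 164,
            166, 162, 158, 175, 181, 180, 177, 173)

n <- length(height)

# Mean: the sum of the values divided by their number.
m <- sum(height) / n

# Sample standard deviation: square root of the summed squared
# deviations from the mean, divided by n - 1 (not n).
s <- sqrt(sum((height - m)^2) / (n - 1))

m            # compare with mean(height)
s            # compare with sd(height)
```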

What are the gender and sampling site distributions (how many observations are there in each group)? Type:

> table(students$gender)
gender
female   male
     9      8
> table(students$population)
population
 kuopio tampere
      7      10

The command table can also be used for cross-tabulations:

> table(students$gender,students$population)
        population
gender   kuopio tampere
  female      4       5
  male        3       5
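A cross-tabulation is often easier to read with row and column totals, or as proportions. The sketch below rebuilds the two grouping columns from the table so that it runs on its own, and uses the base R functions addmargins and prop.table; the object name tab is made up for this illustration.

```r
# Gender and population for the 17 students, in the row order of the table.
gender <- c("male", "female", "female", "male", "male", "female", "female",
            rep("female", 5), rep("male", 5))
population <- rep(c("kuopio", "tampere"), times = c(7, 10))

# Cross-tabulation: rows are genders, columns are sampling sites.
tab <- table(gender, population)

addmargins(tab)               # counts with row and column totals added
prop.table(tab, margin = 2)   # proportions within each sampling site
```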


4. Useful plots

Graphical inspection usually makes interpretation easier. How are heights distributed? To draw a histogram, type:

> hist(students$height)

That’s the distribution for the whole population. But is there a difference in heights between the genders? That can be studied using a box plot. In this case the variable height is divided into two groups using the variable gender, and a separate box plot is produced for each of these groups:

> boxplot(students$height~students$gender)
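The numbers behind a box plot can also be inspected without drawing anything. This sketch assumes the heights and genders from the table, typed in directly, and asks boxplot for its statistics with plot = FALSE; the data frame here holds only the two columns needed, and the name bp is made up for this illustration.

```r
# Heights and genders in the row order of the students table.
height <- c(181, 160, 174, 170, 172, 165, 161, 167, 164,
            166, 162, 158, 175, 181, 180, 177, 173)
gender <- factor(c("male", "female", "female", "male", "male", "female",
                   "female", rep("female", 5), rep("male", 5)))
students <- data.frame(height, gender)

# Same grouping as in the box plot, but only the statistics are returned.
bp <- boxplot(height ~ gender, data = students, plot = FALSE)

bp$names        # group labels, in plotting order
bp$n            # number of observations in each group
bp$stats[3, ]   # third row of the statistics: the group medians

# The medians can be cross-checked with tapply.
tapply(students$height, students$gender, median)
```

The formula height ~ gender together with data = students is an equivalent, shorter way of writing students$height ~ students$gender.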

So, there is a large difference in heights between the genders. Does the same apply to the sampling sites? Write the code for this yourself.


How are height and shoe size related? You can get a graphical view of this by making a scatter plot:

> plot(students$height, students$shoesize)

5. Recoding variables

What if we want to differentiate between males and females in the plot? Let’s use different plotting symbols for males and females. First, we need a vector of plotting symbols. Let’s plot females with F and males with M. The new vector can be produced with the command ifelse:

> sym<-ifelse(students$gender=="male", "M", "F")

Now plot the image again:

> plot(students$height, students$shoesize, pch=sym)

Check from the help file what the arguments of the ifelse command are.

We can even represent the different populations with colors. Let’s recode the population variable with color names (Kuopio=blue, Tampere=red):

> cols<-ifelse(students$population=="kuopio", "Blue", "Red")

Now plot the image again:

> plot(students$height, students$shoesize, pch=sym, col=cols)

There are only 16 symbols on the plot. Can you figure out where one has vanished?
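The recoding step relies on ifelse working element by element: it tests every value of a vector and picks the second or the third argument accordingly. A minimal sketch on a made-up vector of four values (not the students data), together with an alternative that uses a named lookup vector:

```r
# A small made-up vector, used only for illustration.
gender <- c("male", "female", "female", "male")

# ifelse(test, yes, no): "M" where the test is TRUE, "F" where it is FALSE.
sym <- ifelse(gender == "male", "M", "F")
sym

# Alternative: a named vector used as a lookup table.
# Indexing by name returns one symbol per element of gender.
lookup <- c(male = "M", female = "F")
sym2 <- unname(lookup[gender])
sym2
```

The lookup approach extends more easily when there are more than two categories to recode.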


6. Making a new dataset

Make a new dataset from the variables height, shoesize, sym and cols:

> students.new<-data.frame(students$height, students$shoesize, sym, cols)

Check that the new dataset is OK:

> students.new
   students.height students.shoesize sym cols
1              181                44   M Blue
2              160                38   F Blue
3              174                42   F Blue
4              170                43   M Blue
5              172                43   M Blue
6              165                39   F Blue
7              161                38   F Blue
8              167                38   F  Red
9              164                39   F  Red
10             166                38   F  Red
11             162                37   F  Red
12             158                36   F  Red
13             175                42   M  Red
14             181                44   M  Red
15             180                43   M  Red
16             177                43   M  Red
17             173                41   M  Red
> class(students.new)
[1] "data.frame"

7. Extracting a subset from a dataset

Make two subsets of the dataset students. Split it in two according to gender. First, check which individuals are males:

> which(students$gender=="male")
[1]  1  4  5 13 14 15 16 17

Based on that, use subscripts to select the correct subset (take only the rows for which gender is male):

> students.male<-students[which(students$gender=="male"),]
> students.male
   height shoesize gender population
1     181       44   male     kuopio
4     170       43   male     kuopio
5     172       43   male     kuopio
13    175       42   male    tampere
14    181       44   male    tampere
15    180       43   male    tampere
16    177       43   male    tampere
17    173       41   male    tampere
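Row selection with which is one of several equivalent ways to subset a data frame. The sketch below rebuilds two columns of the students table so that it runs on its own, and compares three forms; the object names by_which, by_logical and by_subset are made up for this illustration.

```r
# Two columns of the students table, typed in directly.
students <- data.frame(
  height = c(181, 160, 174, 170, 172, 165, 161, 167, 164,
             166, 162, 158, 175, 181, 180, 177, 173),
  gender = c("male", "female", "female", "male", "male", "female", "female",
             rep("female", 5), rep("male", 5))
)

# 1. Row numbers from which(), as in the exercise.
by_which <- students[which(students$gender == "male"), ]

# 2. The logical vector itself can be used as the row index.
by_logical <- students[students$gender == "male", ]

# 3. subset() evaluates the condition inside the data frame.
by_subset <- subset(students, gender == "male")

nrow(by_which)
rownames(by_which)   # original row numbers are kept as row names
```

The forms differ when the condition contains missing values: which() and subset() drop rows where the test is NA, whereas plain logical indexing returns a row of NAs for each of them.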


Similarly, make a new dataset from females.

Sometimes we want to split the dataset using some continuous variable, such as height. Typically the median of the variable is used. Make two new datasets that contain the individuals at or below the median height and above it:

> median(students$height)
[1] 170
> students.short<-students[which(students$height<=median(students$height)),]
> students.short
   height shoesize gender population
2     160       38 female     kuopio
4     170       43   male     kuopio
6     165       39 female     kuopio
7     161       38 female     kuopio
8     167       38 female    tampere
9     164       39 female    tampere
10    166       38 female    tampere
11    162       37 female    tampere
12    158       36 female    tampere

Similarly, make a new dataset from the tall students.

8. Quit R

To quit R, type:

> q()

R then asks you whether you would like to save the workspace or not. This is generally a good idea, so answer "yes". You can then get back to the same analysis just by double-clicking on the .RData icon in your students folder. If double-clicking does not work, you can start R and use the menu choices File->Load Workspace and File->Load History to achieve the same result.
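Saving the workspace on exit stores the R objects in a file, and the same mechanism is available at any time through save and load. A minimal sketch, using a temporary file and a made-up object x so that nothing in your working directory is touched:

```r
# A made-up object to store.
x <- c(1, 2, 3)

# Write the object to a temporary file instead of the working directory.
f <- tempfile(fileext = ".RData")
save(x, file = f)

# Remove the object, then restore it from the file.
rm(x)
load(f)
x
```

save.image() does the same for all objects in the workspace at once and writes to the file .RData in the working directory by default.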


AFFYMETRIX PREPROCESSING EXERCISE

PRELIMINARY OPERATIONS

A. Start RGui;

B. Change the working directory, choosing the folder where the CEL files and the PHENODATA are located;

C. Load the needed libraries:

library(affy)
library(affyQCReport)
library(hs133ahsentrezgcdf)
library(hgu133aEG1000)

In this exercise we will perform the following tasks:

1. Importing the CEL files and PHENODATA;

2. Performing basic QC;

3. Preprocessing the data.

1. IMPORTING THE CEL FILES AND PHENODATA

First, we create a new AffyBatch object (dat) into which we import the CEL files.

dat <- ReadAffy()

The PHENODATA contains the information about the samples (i.e. the microarrays) of our dataset. The PHENODATA are usually stored in a table (TAB-delimited .txt file) where each row represents a sample and each column represents a variable.

We create a data.frame object to import the PHENODATA text file

pd <- read.table("phenod.txt", header=T, row.names=1, sep="\t")

Then, we assign the pd data.frame as the PHENODATA of the dat AffyBatch

pData(dat) <- pd

2. PERFORMING BASIC QC

We use the affyQCReport library that we loaded before

QCReport(dat)

This creates a PDF file with some plots. Read the affyQCReport vignette for help with interpreting the different plots. Note that the plots on page 2 can also be produced by the commands

boxplot(dat)

and

hist(dat)

Additionally, we can check the RNA quality by using the AffyRNAdeg function.

deg <- AffyRNAdeg(dat)

Individual probes in a probeset are ordered by location relative to the 5' end of the targeted RNA molecule. Since RNA degradation typically starts from the 5' end of the molecule, we would expect probe intensities to be systematically lowered at that end of a probeset when compared to the 3' end. On each chip, probe intensities are averaged by location in probeset, with the average taken over probesets.


We can plot the results into a new PDF file:

a) we create a new PDF graphics device. This will direct everything we plot into the new PDF file

pdf("rnadeg.pdf")

b) we plot the RNAdeg results

plotAffyRNAdeg(deg, cols=1:17)

c) we add a legend to the plot

legend(1,60,legend=rownames(pd), text.col=1:17)

d) we close the graphics device

dev.off()
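The open, plot, annotate, close pattern of steps a) to d) is the same for any plot, not only for plotAffyRNAdeg. A minimal sketch with base R graphics, writing to a temporary file; the file name, the plotted values and the legend text are all made up for illustration.

```r
# Temporary output file, so nothing is written to the working directory.
out <- tempfile(fileext = ".pdf")

# a) open the PDF device: everything plotted from now on goes into the file
pdf(out)

# b) draw the plot
plot(1:10, (1:10)^2, type = "b", main = "Example plot")

# c) add a legend to the plot that was just drawn
legend("topleft", legend = "y = x^2", text.col = 1)

# d) close the device; only now is the file complete
dev.off()

file.exists(out)
```

Forgetting dev.off() is a common reason for PDF files that are empty or cannot be opened.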

3. PREPROCESSING THE DATA

We want to use the re-annotation of the Affymetrix probes of the hgu133a chipset according to the Entrez Gene database.

To do so, we instruct R to use the new CDF package with the re-annotated information

dat@cdfName <- "hs133ahsentrezgcdf"

We also instruct R to use the meta-annotation package

dat@annotation <- "hgu133aEG1000"

We preprocess the data using the RMA algorithm. The new datrma object is of class ExpressionSet

datrma <- rma(dat)

Finally, we save the normalized expression values into a TAB-delimited text file (.txt)

write.exprs(datrma, "datexprs.txt", sep="\t")

FINAL OPERATIONS

A. Save the workspace

save.image("Affy_Preprocessing.Rdata")

B. Save the history

savehistory("Affy_Preprocessing.Rhistory")

Dario Greco

Institute of Biotechnology - University of Helsinki

Building Cultivator II, room 223b

P.O.Box 56 Viikinkaari 4

FIN-00014 Finland

Office: +358 9 191 58951

Fax: +358 9 191 58952

Mobile: +358 44 023 5780

Email: [email protected]

PREPROCESSING CDNA MICROARRAY DATA

During this exercise, we are going to learn how to preprocess data with the limma package. Topics are:

1. data import
2. diagnostic plots
3. normalization

[The remainder of this exercise (sections 1. DATA IMPORT, 2. BACKGROUND CORRECTION, 3. DIAGNOSTIC PLOTS and 4. NORMALIZATION) is not recoverable from the source.]

FINDING DIFFERENTIALLY EXPRESSED GENES

During this exercise, we are going to learn how to find differentially expressed genes with the limma package. The data is from 16 microarrays where the RNA from 8 ApoAI-knockout mice and 8 normal mice has been compared with reference RNA. The goal is to find out whether knocking out the ApoAI gene affects the activity of other genes. Details of the experiment can be found in the article: Callow M.J., Dudoit S., Gong E.L., Speed T.P., Rubin E.M. (2000). Microarray expression profiling identifies genes with altered expression in HDL-deficient mice. Genome Research 10: 2022-2029 (Callow2000.pdf in the data directory).

1. PRELIMINARIES

Start R and take the limma package into use:

> library(limma)

Change the working directory and read the targets file:

> targets = readTargets("ApoAITargets.txt")

Then read the data:

> rg = read.maimages(targets$FileName, source="spot", other.columns="badspot")

Next, eliminate the spots which have been marked bad (column "badspot" has value 1) by giving them weight 0:

> rg$weights = matrix(1, nrow(rg$R), ncol(rg$R))
> rg$weights[rg$other$"badspot"==1] = 0

Finally, read the gene names and the array layout information:

> rg$genes = readGAL("ApoAI.gal")
> rg$printer = getLayout(rg$genes)

Let’s normalize the data with print-tip loess normalization (the default background correction method is used):

> ma = normalizeWithinArrays(rg, method="p")

2. DESIGN MATRIX

We would like to calculate the gene expression of knockout mice relative to wild type mice. For this purpose, let’s define the following design matrix:

> design = cbind("Wild-Ref"=1, "KO-Wild"=c(rep(0,8), rep(1,8)))

The design matrix allows us to estimate the values of the parameters of interest. In this work we are interested in the gene expression differences between knockout and wild type mice (parameter "KO-Wild"). In order to estimate it we also define another parameter ("Wild-Ref"). You can see what the variable design contains together with the variable targets by typing:

> cbind(design, targets)
   Wild-Ref KO-Wild SlideNumber Name FileName Cy3       Cy5
c1        1       0           1   c1  c1.spot Ref wild type
c2        1       0           2   c2  c2.spot Ref wild type
c3        1       0           3   c3  c3.spot Ref wild type
c4        1       0           4   c4  c4.spot Ref wild type
c5        1       0           5   c5  c5.spot Ref wild type
c6        1       0           6   c6  c6.spot Ref wild type
c7        1       0           7   c7  c7.spot Ref wild type
c8        1       0           8   c8  c8.spot Ref wild type
k1        1       1           9   k1  k1.spot Ref  ApoAI KO
k2        1       1          10   k2  k2.spot Ref  ApoAI KO
k3        1       1          11   k3  k3.spot Ref  ApoAI KO
k4        1       1          12   k4  k4.spot Ref  ApoAI KO
k5        1       1          13   k5  k5.spot Ref  ApoAI KO
k6        1       1          14   k6  k6.spot Ref  ApoAI KO
k7        1       1          15   k7  k7.spot Ref  ApoAI KO
k8        1       1          16   k8  k8.spot Ref  ApoAI KO

Each row corresponds to one array. The values 1/0 in the two columns "Wild-Ref" and "KO-Wild" tell what information we can obtain from each array. The first case is 1(Wild-Ref) + 0(KO-Wild) = Wild-Ref, which means that from the first eight arrays we get the log-ratios between wild type and reference. From the last eight arrays we get 1(Wild-Ref) + 1(KO-Wild) = Wild-Ref + KO-Wild = KO-Ref. The parameters are then estimated from the data using linear model fitting:

> fit = lmFit(ma, design)

Since the design matrix contains two columns, there will be two parameters estimated for each gene. The parameter values for the first gene are:

> fit$coef[1,]

so the log-ratio of gene 1 between knockout mice and wild type mice is 0.639.

3. DIFFERENTIALLY EXPRESSED GENES

We are going to find differentially expressed genes by applying a t-test for each gene. In order to make the estimate of the standard error needed in the t-test formula more reliable, let’s apply Bayesian variance shrinkage:

> fit = eBayes(fit)

Let’s see the top 10 differentially expressed genes sorted according to the parameter "KO-Wild" (coef = 2):

> topTable(fit, coef=2, number=10, adjust="fdr")

Here the p-values were corrected for multiple testing by Benjamini and Hochberg’s FDR (False Discovery Rate) method. The biggest difference between knockout and wild type mice is in the expression of ApoAI (first in the list). This makes sense since it was the gene that was knocked out.

4. WRITING GENE LISTS INTO A FILE

You can save the output of topTable as a text file. In case you want information about all 6384 genes sorted by corrected p-values, type:

> top = topTable(fit, coef=2, number=6384, adjust="fdr", sort.by="P")
> write.table(top, "filename.txt", row.names=F, sep="\t")

EXERCISE

Compare your results with the results in the article (Callow2000.pdf). In that paper they also show RT-PCR results. How are they related to the log-ratios you have obtained from the microarray data?


WORKING WITH LIMMA AND AFFYMETRIX DATA

PRELIMINARY OPERATIONS

A. Start RGui;

B. Change the working directory choosing the folder where the Affymetrix data is located;

C. Load the workspace you have saved from the AFFY exercise:

load("Affy_Preprocessing.Rdata")

D. Load the needed libraries:

library(affy)
library(hgu133aEG1000)
library(limma)

In this exercise we will perform the following tasks:

1. building the design matrix and defining the contrasts of interest;

2. fitting the linear model;

3. exploring and writing the results;

1. BUILDING THE DESIGN MATRIX AND DEFINING THE CONTRASTS OF INTEREST

design <- model.matrix(~0 + pd[,1])

colnames(design) <- c("NORMAL", "RCC")

contr <- makeContrasts(NORMAL-RCC, levels=design)
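To see what model.matrix(~0 + ...) produces without loading the Affymetrix data, here is a sketch with a made-up grouping of six samples (three NORMAL, three RCC). In the exercise the grouping comes from the first column of pd; the vector group below is only an assumption for illustration. The last lines write out by hand the kind of comparison that makeContrasts(NORMAL-RCC, ...) encodes, using base R only.

```r
# Made-up sample grouping: three normal samples, three tumour samples.
group <- factor(c("NORMAL", "NORMAL", "NORMAL", "RCC", "RCC", "RCC"))

# ~ 0 + group: no intercept, one indicator column per factor level.
design <- model.matrix(~ 0 + group)

# The columns follow the (alphabetical) order of the factor levels,
# so renaming them from levels(group) keeps names and columns matched.
colnames(design) <- levels(group)
design

# The contrast NORMAL - RCC as a coefficient vector:
# +1 on the NORMAL column, -1 on the RCC column.
contrast <- c(NORMAL = 1, RCC = -1)
design %*% contrast   # +1 for NORMAL samples, -1 for RCC samples
```

Assigning column names by hand, as in colnames(design) <- c("NORMAL", "RCC"), silently assumes this level order, so it is worth checking levels() of the grouping column first.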

2. FITTING THE LINEAR MODEL

fit <- lmFit(datrma, design)

fit2 <- contrasts.fit(fit, contr)

fit2 <- eBayes(fit2)

3. EXPLORING AND WRITING THE RESULTS

First, we visualize the number of significant genes with a Venn diagram. Here we use p-value < 0.01 and the Benjamini-Hochberg p-value correction.

vennDiagram(decideTests(fit2, p.value = 0.01, adjust.method = "BH"))
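The effect of the Benjamini-Hochberg correction can be seen on a handful of numbers with the base R function p.adjust. The five p-values below are made up for illustration and are already sorted in increasing order, which keeps the by-hand version short.

```r
# Made-up raw p-values, sorted from smallest to largest.
p <- c(0.001, 0.008, 0.012, 0.04, 0.2)
m <- length(p)

# Benjamini-Hochberg adjusted p-values.
p_bh <- p.adjust(p, method = "BH")
p_bh

# The same by hand: scale the i-th smallest p-value by m / i, then make
# the sequence non-decreasing by taking running minima from the largest
# p-value downwards, and cap the result at 1.
scaled <- p * m / seq_len(m)
p_manual <- pmin(1, rev(cummin(rev(scaled))))
p_manual

# Number of values below 0.01 before and after the correction.
sum(p < 0.01)
sum(p_bh < 0.01)
```

The threshold of 0.01 is applied to the adjusted values, so fewer genes pass than with the raw p-values.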

We extract the gene symbol and the Entrez Gene information from the annotation package

gs <- as.data.frame(unlist(as.list(hgu133aEG1000SYMBOL)))
eg <- as.data.frame(unlist(as.list(hgu133aEG1000ENTREZID)))
annot <- cbind(rownames(gs), gs, eg)
colnames(annot) <- c("ID", "Gene Symbol", "Entrez Gene ID")


We store the significant genes with their annotation in a data-frame object:

results <- topTable(fit2, coef=1, n = 1838, genelist=annot)

Finally, we save the table of significant genes into a TAB-delimited text file (.txt):

write.table(results, "results.txt", sep="\t")

FINAL OPERATIONS

A. Save the workspace:

save.image("Affy_Limma.Rdata")

B. Save the history:

savehistory("Affy_Limma.Rhistory")

Dario Greco

Institute of Biotechnology - University of Helsinki

Building Cultivator II, room 223b

P.O.Box 56 Viikinkaari 4

FIN-00014 Finland

Office: +358 9 191 58951

Fax: +358 9 191 58952

Mobile: +358 44 023 5780

Email: [email protected]


FINDING OVER-REPRESENTED GO FAMILIES

PRELIMINARY OPERATIONS

A. Start RGui;

B. Change the working directory choosing the folder where the Affymetrix data is located;

C. Load the workspace you saved in the AFFY-LIMMA exercise (the R function is load, in lower case):

load("Affy_Limma.Rdata")

D. Load the needed libraries:

library(GOstats)
library(limma)
library(affy)
library(hgu133aEG1000)

In this exercise we will perform the following tasks:

1. creating the parameters for the hypergeometric test;

2. performing the hypergeometric test;

3. exporting the results.

1. CREATING THE PARAMETERS FOR THE HYPERGEOMETRIC TEST

First, we need to find the column of the limma results data frame that contains the Entrez Gene IDs (it should be column 3):

colnames(results)

Now we can create the parameters for running the hypergeometric test (equivalent to a one-sided Fisher's exact test):

params <- new("GOHyperGParams",
    geneIds = as.vector(results[,3]),
    annotation = "hgu133aEG1000",
    ontology = "BP",
    pvalueCutoff = 0.05,
    conditional = FALSE,
    testDirection = "over")

In this command, we specify the Entrez Gene IDs, the annotation package, the ontology that we want to assay (BP, MF, or CC), the p-value cutoff (here we chose 0.05), whether we want to run a conditional test, and the test direction, for finding the over- or the under-represented families (here we want to find the over-represented families).
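The test behind the "over" direction can be sketched for a single GO family with base R. All counts below (N, K, n, k) are hypothetical and chosen only for illustration; GOstats derives the real counts from the annotation package.

```r
# Hypothetical counts for one GO family
N <- 10000  # genes in the universe
K <- 200    # genes in the universe annotated to the family
n <- 500    # genes in our significant list
k <- 25     # significant genes annotated to the family

# Probability of seeing k or more annotated genes by chance
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# The same question asked as a one-sided Fisher's exact test
tab <- matrix(c(k, K - k, n - k, N - K - (n - k)), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hypergeometric = p_hyper, fisher = p_fisher))
```

With these counts about 10 annotated genes would be expected by chance (500 * 200 / 10000), so observing 25 gives a small p-value and the family would be reported as over-represented.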

2. PERFORMING THE HYPERGEOMETRIC TEST

BPover <- hyperGTest(params)

3. EXPORTING THE RESULTS

To save the results into a data.frame object:

BPresults <- summary(BPover)

Now we can export the results into a TAB-delimited text file:

write.table(BPresults, "BP_over.txt", sep="\t")

FINAL OPERATIONS

A. Save the workspace:

save.image("Affy_GOstats.Rdata")


B. Save the history:

savehistory("Affy_GOstats.Rhistory")


Clustering – Exercises

This exercise introduces some clustering methods available in R and Bioconductor. It uses the prenormalized yeast dataset.

1. Reading the prenormalized data

Read in the prenormalized Spellman's yeast dataset:

> d<-read.table("combined.txt", sep="\t", header=T, row.names=1)

We want only the cdc15 data, so take only those columns from the data:

> names(d)
> da<-data.frame(d[26:49])

Remove missing values from the data:

> dat<-na.omit(da)

2. Filter the genelist by standard deviation

Select only the genes whose standard deviation is among the highest 0.3%.

> library(genefilter)
> # Row-wise SDs
> sds<-rowSds(dat)
> # Which is the value at 99.7% of data
> sdt<-quantile(sds, 0.997)
> sel<-(sds>sdt)
> set<-dat[sel, ]

How many genes are left after filtering?

3. Creating a heatmap using Euclidean distance and complete linkage

> heatmap(as.matrix(set))

To get other colors in the heatmap, you first need to generate a sequence of colors, and then plot the heatmap using these colors:

> library(RColorBrewer)
> heatcol<-colorRampPalette(c("Red", "Green"))(32)
> heatmap(as.matrix(set), col=heatcol)
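The standard-deviation filter of step 2 can be tried out without the yeast data or the genefilter package. The sketch below uses a simulated matrix (toy is hypothetical random data) and replaces rowSds with apply and sd from base R.

```r
# Simulated expression matrix: 1000 hypothetical genes, 6 arrays
set.seed(1)
toy <- matrix(rnorm(1000 * 6), nrow = 1000,
              dimnames = list(paste0("gene", 1:1000), NULL))

# Row-wise standard deviations with base R
sds <- apply(toy, 1, sd)

# Threshold at the 99.7th percentile of the SDs
sdt <- quantile(sds, 0.997)

# Keep only the genes above the threshold
sel <- sds > sdt
filtered <- toy[sel, , drop = FALSE]
print(nrow(filtered))
```

With 1000 genes, the top 0.3% corresponds to 3 genes, which is a quick sanity check for the answer to the question in step 2 on your own data.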


4. Saving the heatmap into a file

For further modifications, the heatmap might need to be saved in a file. This is accomplished with:

> cwd=getwd()
> bmp(file.path(cwd, "heatmap.bmp"), width=1800, height=1800)
> heatmap(as.matrix(set), col=heatcol)
> dev.off()

This writes a bitmap image of 1800*1800 pixels into your data folder, which is about 6*6 inches at a print quality of 300 dpi. Some journals might want a PostScript image instead. Note that postscript() takes its width and height in inches, not in pixels:

> cwd=getwd()
> postscript(file.path(cwd, "heatmap.ps"), width=6, height=6)
> heatmap(as.matrix(set), col=heatcol)
> dev.off()

5. K-means clustering of genes

In K-means clustering you need to choose the number of clusters (K) yourself, in advance. To produce a K-means clustering with 5 clusters, type:

> k<-c(5)
> km<-kmeans(set, k, iter.max=1000)

Calculate the average within-cluster sum of squares of the result. This is a measure of how close together the genes lie inside the clusters (smaller is tighter).

> mean(km$withinss)
[1] 21.1838

K-means starts from random cluster centers, so the result can differ between runs. Run the same K-means analysis several times (save the result into a new object every time). Select the K-means clustering giving the smallest within-cluster sum of squares as the best result. You can do this by hand, or run the following code chunk:

> ss<-c(1000000)
> for(i in 1:10) {
>   km<-kmeans(set, 5)
>   if(mean(km$withinss)<=ss) {
>     ss<-mean(km$withinss)
>     km.best<-km
>   }
> }
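As an alternative to the loop above, kmeans can repeat the random starts itself through its nstart argument and return the run with the smallest total within-cluster sum of squares. The sketch below uses simulated data (toy is hypothetical, with three well-separated groups) instead of the filtered yeast genes.

```r
# Simulated data: three hypothetical groups of 10 genes, 4 time points each
set.seed(42)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 4),
             matrix(rnorm(40, mean = 5), ncol = 4),
             matrix(rnorm(40, mean = 10), ncol = 4))

# Ten random starts; kmeans keeps the best one
km_multi <- kmeans(toy, centers = 3, iter.max = 1000, nstart = 10)

# Per-cluster and total within-cluster sums of squares
print(km_multi$withinss)
print(km_multi$tot.withinss)
```

For a fixed number of clusters, the mean of withinss is the total divided by K, so ranking runs by the mean (as in the loop) or by the total (as nstart does) selects the same clustering.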


6. Visualizing the K-means clustering

Let's produce a new K-means clustering result using four clusters:

> km<-kmeans(set, 4, iter.max=1000)

Next, initiate a 2*2 image area, and draw the expression profiles of each cluster. We need to apply a for-loop here:

> par(mfrow=c(2,2))
> for(i in 1:4) {
>   matplot(t(set[km$cluster==i,]), type="l", main=paste("cluster:", i), ylab="log expression", xlab="time")
> }