Vaccination data analyses with R and R commander
Lars T. Fadnes & Halvor Sommerfelt
Centre for International Health
University of Bergen
Aim for this session
• Introduce a brilliant tool
• Give examples on use of the tool
• Guide to further knowledge
What is R?
• R is a free software environment that includes a set of base packages for graphics, math, and statistics.
• You can make use of specialized packages contributed by R users or write your own new functions.
Why R?
• Very powerful
• Developing extremely quickly
• Working on different platforms (not only Microsoft Windows…)
• Free of all costs
Why doesn’t everyone use R?
What is R commander?
Why R commander?
• Powerful
• Free of all costs
• Working on different platforms (not only Microsoft Windows…)
• Easy to learn and to use…
How to install R?
For Windows:
• http://cran.r-project.org/bin/windows/base/
  – Easy
• If you are here at UiB, the information department will install it for you if you ask them
For Linux (Ubuntu etc.):
- Very easy
- Just search for r-base-core in Synaptic Package Manager and add it
- http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/installation-notes.html
How to install R commander?
- install.packages("Rcmdr", dependencies=TRUE)
Some things to note first
• R is case-sensitive
  – help, Help, HELP and HELF are different…
  – Recommendation:
    • Choose one style and stick to it
• Avoid using shortcut keys in R commander!
• Is there something you don’t know?
  – There is lots of good information on the web
    • Particularly for R
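A small illustration of case sensitivity. The object names are made up for this sketch and are not part of the course data:

```r
# R treats names that differ only in case as different objects
help_topic <- "lower case"
Help_topic <- "capitalised"

identical(help_topic, Help_topic)   # the two objects hold different values
exists("HELP_topic")                # this spelling was never defined
```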
How to start?
• Open R
  – Load packages
    • Rcmdr
or write
• library(Rcmdr)
Menu
• File
  – Items for loading and saving script files, for saving output and the R workspace, and for exiting.
• Edit
  – Items (Cut, Copy, Paste, etc.) for editing the contents of the script and output windows.
  – Right-clicking in the script or output window also brings up an edit “context” menu.
• Data
  – Submenus containing menu items for reading and manipulating data.
• Statistics
  – Submenus containing menu items for a variety of basic statistical analyses.
• Graphs
  – Menu items for creating simple statistical graphs.
• Models
  – Menu items and submenus for obtaining numerical summaries, confidence intervals, hypothesis tests, diagnostics, and graphs for a statistical model, and for adding diagnostic quantities, such as residuals, to the data set.
• Distributions
  – Probabilities, quantiles, and graphs of standard statistical distributions (to be used, for example, as a substitute for statistical tables) and samples from these distributions.
• Tools
  – Menu items for loading R packages unrelated to the Rcmdr package (e.g., to access data saved in another package), and for setting some options.
• Help
  – Items to obtain information about the R Commander (including this manual). In addition, each R Commander dialog box has a Help button (see below).
• Script Window
  – R commands generated by the R Commander
  – You can also type R commands directly into the script window or the R Console
  – The main purpose of the R Commander, however, is to avoid having to type commands.
• Output Window
  – Printed output
• Messages Window
  – Displays error messages, warnings, and notes
• Graphics Device window
  – When you create graphs, these will appear in a separate window outside of the main R Commander window.
Easily available function:
Let’s get started…
• Change directory
  – Find the folder where you placed your data file
• Import data
  – Give it the name: Vaccine
• Save workspace as
  – Give a name to your file
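The same three steps can also be typed as commands. This is only a sketch: the file name vaccine.csv, its CSV layout and the example values are assumptions, since the slides do not show the real data file. A temporary folder stands in for your own data folder.

```r
# Hypothetical file and values; replace with your own data folder and file
data_dir <- tempdir()
csv_path <- file.path(data_dir, "vaccine.csv")
write.csv(
  data.frame(IDNO = 1:4, Adjuvant = c(0, 1, 0, 1),
             Day0 = c(5.1, 7.3, 6.0, 4.8),
             Day30 = c(9.4, 60.2, 11.7, 45.9)),
  csv_path, row.names = FALSE
)

old_dir <- setwd(data_dir)             # Change directory
Vaccine <- read.csv("vaccine.csv")     # Import data and name it Vaccine
str(Vaccine)                           # Check what was read
save.image("vaccine-session.RData")    # Save workspace as...
setwd(old_dir)                         # Go back to the previous folder
```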
Data Types
• Vectors – including continuous variables
• Factors – nominal/categorical
• Matrices, arrays and data frames
• Lists
http://www.statmethods.net/input/datatypes.html
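A minimal sketch of each data type, using made-up values:

```r
titre <- c(5.1, 7.3, 6.0, 4.8)                     # vector (continuous)
group <- factor(c("no", "yes", "no", "yes"))       # factor (categorical)
m     <- matrix(1:6, nrow = 2)                     # matrix with 2 rows
df    <- data.frame(titre = titre, group = group)  # data frame
lst   <- list(values = titre, label = "example")   # list

# class() reports how R stores each object
class(titre)
class(group)
class(df)
```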
Define the datatypes
Variables in dataset
• IDNO
  – ID number: categorical = factor
• Adjuvant
  – Categorical = factor:
    • 0: Placebo 1: Adjuvant
• Day0
  – Continuous (vector)
• Day30
  – Continuous (vector)
– Variables are now defined as vectors
– Factors (categorical variables) need to be defined
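Defining the factors could look like this in plain R. The data values are invented for illustration; the factor name AdjuvantCat and its labels no/yes follow the later slides.

```r
# Made-up rows standing in for the imported Vaccine data set
Vaccine <- data.frame(IDNO = 1:4, Adjuvant = c(0, 1, 0, 1),
                      Day0 = c(5.1, 7.3, 6.0, 4.8),
                      Day30 = c(9.4, 60.2, 11.7, 45.9))

# ID number: categorical, so convert it to a factor
Vaccine$IDNO <- factor(Vaccine$IDNO)

# 0 = placebo ("no"), 1 = adjuvant ("yes"), kept as a new factor
Vaccine$AdjuvantCat <- factor(Vaccine$Adjuvant, levels = c(0, 1),
                              labels = c("no", "yes"))

str(Vaccine)                                  # Day0 and Day30 stay numeric
table(Vaccine$Adjuvant, Vaccine$AdjuvantCat)  # check the coding
```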
Now we’re ready to answer some scientific questions…
Vaccination response
• Effect of vaccination is often calculated by measuring antibodies before and after vaccination
– Response = concentration after vaccination − concentration before vaccination
• Compute new variable: response
  – concentration before vaccination: Day0
  – concentration after vaccination: Day30
Does the new variable look reasonable?
• View data set
• Histogram
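A sketch of computing and checking the new variable. It assumes that response is the difference Day30 - Day0 (a ratio would be computed the same way with / instead of -), and the data values are made up.

```r
# Made-up rows standing in for the Vaccine data set
Vaccine <- data.frame(IDNO = 1:4, Adjuvant = c(0, 1, 0, 1),
                      Day0 = c(5.1, 7.3, 6.0, 4.8),
                      Day30 = c(9.4, 60.2, 11.7, 45.9))

# Assumption: response is after minus before
Vaccine$response <- Vaccine$Day30 - Vaccine$Day0

head(Vaccine)               # first rows, similar to "View data set"
summary(Vaccine$response)   # range, quartiles and mean
hist(Vaccine$response, main = "Response", xlab = "response")
```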
Summarize variable
• Calculate mean, median and standard deviation for
  – response
    • for each group
• Numerical summaries
  – Placebo: adjuvant = no
  – Adjuvant: adjuvant = yes
• Mean, standard deviation, the percentiles and number of cases
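The group-wise summary can be sketched in base R as below. The data are simulated stand-ins, response is assumed to be Day30 - Day0, and summarise_response is a helper written for this example (R Commander generates its own command from the menu).

```r
# Simulated stand-in for the Vaccine data set (the real data are not shown)
set.seed(1)
Vaccine <- data.frame(Adjuvant = rep(c(0, 1), times = 20),
                      Day0 = exp(seq(1, 3, length.out = 40)))
Vaccine$Day30 <- Vaccine$Day0 +
  rlnorm(40, meanlog = 2 + Vaccine$Adjuvant, sdlog = 0.6)
Vaccine$AdjuvantCat <- factor(Vaccine$Adjuvant, levels = c(0, 1),
                              labels = c("no", "yes"))
Vaccine$response <- Vaccine$Day30 - Vaccine$Day0   # assumed definition

# Hypothetical helper: mean, sd, percentiles and number of cases
summarise_response <- function(x) {
  c(mean = mean(x), sd = sd(x),
    quantile(x, c(0, 0.25, 0.5, 0.75, 1)), n = length(x))
}

# One column per group (no = placebo, yes = adjuvant)
by_group <- sapply(split(Vaccine$response, Vaccine$AdjuvantCat),
                   summarise_response)
round(by_group, 2)
```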
Doing calculations for subset group
• Before making the "Placebo" subset, remember to select the main dataset (Vaccine)
  – You can now easily switch between the datasets
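In commands, both subsets are taken from the main data set. The data below are simulated, and the name AdjuvantGroup is chosen for this sketch only.

```r
# Simulated stand-in for the Vaccine data set
set.seed(1)
Vaccine <- data.frame(Adjuvant = rep(c(0, 1), times = 20),
                      Day0 = exp(seq(1, 3, length.out = 40)))
Vaccine$AdjuvantCat <- factor(Vaccine$Adjuvant, levels = c(0, 1),
                              labels = c("no", "yes"))

# Both subsets start from the complete Vaccine data set
Placebo       <- subset(Vaccine, AdjuvantCat == "no")
AdjuvantGroup <- subset(Vaccine, AdjuvantCat == "yes")

nrow(Placebo)
nrow(AdjuvantGroup)
```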
Make a histogram of the response
• First for the complete Vaccine dataset
• Then for the adjuvant dataset
• And for the placebo dataset
– The histograms will be printed in the R window (not inside R commander)
– Right click on the graph and you can copy it as metafile to paste it into a document, print it or save it
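A sketch of the three histograms side by side, on simulated data with response assumed to be Day30 - Day0:

```r
# Simulated stand-in for the Vaccine data set
set.seed(1)
Vaccine <- data.frame(Adjuvant = rep(c(0, 1), times = 20),
                      Day0 = exp(seq(1, 3, length.out = 40)))
Vaccine$Day30 <- Vaccine$Day0 +
  rlnorm(40, meanlog = 2 + Vaccine$Adjuvant, sdlog = 0.6)
Vaccine$response <- Vaccine$Day30 - Vaccine$Day0   # assumed definition

old_par <- par(mfrow = c(1, 3))   # three plots in one row
h_all      <- hist(Vaccine$response, main = "All", xlab = "response")
h_adjuvant <- hist(Vaccine$response[Vaccine$Adjuvant == 1],
                   main = "Adjuvant", xlab = "response")
h_placebo  <- hist(Vaccine$response[Vaccine$Adjuvant == 0],
                   main = "Placebo", xlab = "response")
par(old_par)                      # restore the plot layout
```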
Box plot
• Box plot for response
  – By AdjuvantCat
  – (for complete Vaccine dataset)
[Figure: box plot of response (y-axis, 0 to 150) by AdjuvantCat (x-axis: no, yes)]
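The box plot can be drawn with one command. The data here are simulated and response is assumed to be Day30 - Day0, so the picture will not match the slide.

```r
# Simulated stand-in for the Vaccine data set
set.seed(1)
Vaccine <- data.frame(Adjuvant = rep(c(0, 1), times = 20),
                      Day0 = exp(seq(1, 3, length.out = 40)))
Vaccine$Day30 <- Vaccine$Day0 +
  rlnorm(40, meanlog = 2 + Vaccine$Adjuvant, sdlog = 0.6)
Vaccine$AdjuvantCat <- factor(Vaccine$Adjuvant, levels = c(0, 1),
                              labels = c("no", "yes"))
Vaccine$response <- Vaccine$Day30 - Vaccine$Day0   # assumed definition

# One box per level of AdjuvantCat
bp <- boxplot(response ~ AdjuvantCat, data = Vaccine,
              xlab = "AdjuvantCat", ylab = "response")
```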
• Does it look normally distributed?
• We might need to do a logarithmic transformation of the variable
  – Compute new variable:
    • logresponse
• Does it look more normally distributed (histogram)?
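A sketch of the transformation. It uses the natural logarithm, which is an assumption (the slides do not name the base), and the simulated responses are all positive; log() of zero or a negative value would give -Inf or NaN.

```r
# Simulated stand-in for the Vaccine data set
set.seed(1)
Vaccine <- data.frame(Adjuvant = rep(c(0, 1), times = 20),
                      Day0 = exp(seq(1, 3, length.out = 40)))
Vaccine$Day30 <- Vaccine$Day0 +
  rlnorm(40, meanlog = 2 + Vaccine$Adjuvant, sdlog = 0.6)
Vaccine$response <- Vaccine$Day30 - Vaccine$Day0   # assumed definition

stopifnot(all(Vaccine$response > 0))               # log needs positive values
Vaccine$logresponse <- log(Vaccine$response)       # natural log (assumed)

hist(Vaccine$logresponse, main = "logresponse", xlab = "log(response)")
```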
Check the logresponse
• Are the means and medians similar for both groups (placebo and adjuvant)?
  – Numerical summary (as earlier, but with logresponse)
• Check with an independent samples t-test
  – Adjuvant vs logresponse
  – Is the response different among those getting adjuvant?
• Will this be confirmed with a robust test not assuming normal distribution?
• Check with a
  – non-parametric
    » two-sample Wilcoxon test (rank-sum test)
    » Use the ’Exact’ test
  – Will the test be different when using response or logresponse?
    » Why or why not?
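Both tests can be run from the script window as sketched below, on simulated data with response assumed to be Day30 - Day0. Note that t.test() does not assume equal variances unless var.equal = TRUE is given.

```r
# Simulated stand-in for the Vaccine data set
set.seed(1)
Vaccine <- data.frame(Adjuvant = rep(c(0, 1), times = 20),
                      Day0 = exp(seq(1, 3, length.out = 40)))
Vaccine$Day30 <- Vaccine$Day0 +
  rlnorm(40, meanlog = 2 + Vaccine$Adjuvant, sdlog = 0.6)
Vaccine$AdjuvantCat <- factor(Vaccine$Adjuvant, levels = c(0, 1),
                              labels = c("no", "yes"))
Vaccine$response <- Vaccine$Day30 - Vaccine$Day0   # assumed definition
Vaccine$logresponse <- log(Vaccine$response)

# Independent samples t-test on the log scale
t_res <- t.test(logresponse ~ AdjuvantCat, data = Vaccine)

# Exact two-sample Wilcoxon (rank-sum) test, on both scales
w_raw <- wilcox.test(response ~ AdjuvantCat, data = Vaccine, exact = TRUE)
w_log <- wilcox.test(logresponse ~ AdjuvantCat, data = Vaccine, exact = TRUE)

# Compare the p-values
c(t_test = t_res$p.value, wilcoxon_response = w_raw$p.value,
  wilcoxon_logresponse = w_log$p.value)
```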
Recoding and making new variables
Different types of examples on how to recode in the box:
  – Data → Manage variables
• value = "factor"
  – Factor can be either a number or a word
• value, value, value = "factor"
  – Listed with commas
• value:value = "factor"
  – From lowest to highest values
• else
  – all other values
• NA
  – missing
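The recode box belongs to R Commander. As a rough base R counterpart (not the command R Commander itself generates), single values can be recoded with factor() and ranges with cut(). The values and the cut-off of 10 below are made up.

```r
# value = "factor", with every other value becoming missing (NA)
adjuvant  <- c(0, 1, 1, 0, 9)        # 9 stands in for an invalid code
treatment <- factor(adjuvant, levels = c(0, 1),
                    labels = c("Placebo", "Adjuvant"))
treatment

# value:value = "factor", i.e. recoding ranges of a continuous variable
day0     <- c(2.5, 7.0, 12.4, 30.1)
day0_cat <- cut(day0, breaks = c(0, 10, Inf), labels = c("low", "high"))
day0_cat
```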
• How was the baseline immunological response (before vaccination)?
  – Variable: Day0
    • Numerical summary
• Let us take a look into the upper quartile specifically
  – Make subsets:
    • "UpperQuartile"
    • "OtherQuartiles"
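A sketch of the baseline summary and the two subsets, on simulated data. The cut-off is the 75th percentile of Day0; the factor UpperQuartileDay0 is created here as well because the regression slides use a variable with that name.

```r
# Simulated stand-in for the Vaccine data set
set.seed(1)
Vaccine <- data.frame(Adjuvant = rep(c(0, 1), times = 20),
                      Day0 = exp(seq(1, 3, length.out = 40)))

summary(Vaccine$Day0)                   # numerical summary of baseline
q75 <- quantile(Vaccine$Day0, 0.75)     # lower limit of the upper quartile

# Indicator for being in the upper quartile of Day0
Vaccine$UpperQuartileDay0 <- factor(Vaccine$Day0 > q75,
                                    levels = c(FALSE, TRUE),
                                    labels = c("no", "yes"))

UpperQuartile  <- subset(Vaccine, Day0 > q75)
OtherQuartiles <- subset(Vaccine, Day0 <= q75)
table(Vaccine$UpperQuartileDay0)
```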
Is baseline response associated with response after intervention?
• We can first repeat the t-tests for the subsets
  • "UpperQuartile"
  • "OtherQuartiles"
  – Independent samples t-test
    » logresponse
    » AdjuvantCat
• What are the results?
  – What are the differences between the means?
• What does it mean?
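Repeating the t-test within each subset could look like this. The data are simulated and response is assumed to be Day30 - Day0, so the numbers say nothing about the real study.

```r
# Simulated stand-in for the Vaccine data set
set.seed(1)
Vaccine <- data.frame(Adjuvant = rep(c(0, 1), times = 20),
                      Day0 = exp(seq(1, 3, length.out = 40)))
Vaccine$Day30 <- Vaccine$Day0 +
  rlnorm(40, meanlog = 2 + Vaccine$Adjuvant, sdlog = 0.6)
Vaccine$AdjuvantCat <- factor(Vaccine$Adjuvant, levels = c(0, 1),
                              labels = c("no", "yes"))
Vaccine$logresponse <- log(Vaccine$Day30 - Vaccine$Day0)  # assumed definition

q75 <- quantile(Vaccine$Day0, 0.75)
UpperQuartile  <- subset(Vaccine, Day0 > q75)
OtherQuartiles <- subset(Vaccine, Day0 <= q75)

t_upper <- t.test(logresponse ~ AdjuvantCat, data = UpperQuartile)
t_other <- t.test(logresponse ~ AdjuvantCat, data = OtherQuartiles)

# Difference between the group means (yes minus no) in each subset
diff_upper <- unname(diff(t_upper$estimate))
diff_other <- unname(diff(t_other$estimate))
c(upper = diff_upper, other = diff_other)
```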
Is intervention associated with response?
• Linear regression analysis with the full dataset
  – Statistics → Fit models → Linear model
• Dependent variable (left box):
  – logresponse
• Independent variable:
  – AdjuvantCat
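The model in the dialog corresponds to an lm() call. A sketch on simulated data, with response assumed to be Day30 - Day0:

```r
# Simulated stand-in for the Vaccine data set
set.seed(1)
Vaccine <- data.frame(Adjuvant = rep(c(0, 1), times = 20),
                      Day0 = exp(seq(1, 3, length.out = 40)))
Vaccine$Day30 <- Vaccine$Day0 +
  rlnorm(40, meanlog = 2 + Vaccine$Adjuvant, sdlog = 0.6)
Vaccine$AdjuvantCat <- factor(Vaccine$Adjuvant, levels = c(0, 1),
                              labels = c("no", "yes"))
Vaccine$logresponse <- log(Vaccine$Day30 - Vaccine$Day0)  # assumed definition

# Dependent variable on the left of ~, independent variable on the right
fit <- lm(logresponse ~ AdjuvantCat, data = Vaccine)
summary(fit)    # coefficients, standard errors and p-values
confint(fit)    # 95% confidence intervals
```

With a single two-level factor, the coefficient AdjuvantCatyes is the difference in mean logresponse between the adjuvant and placebo groups.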
Is baseline response associated with response after intervention?
• Linear regression analysis with the full dataset
  – Statistics → Fit models → Linear model
• Dependent variable (left box):
  – logresponse
• Independent variables (right box with ’+’ between each):
  – AdjuvantCat
  – UpperQuartileDay0
Is baseline response associated with response after intervention?
• Linear regression analysis with the full dataset
  – Statistics → Fit models → Linear model
• Dependent variable (left box):
  – logresponse
• Independent variables (right box with ’+’ between each):
  – AdjuvantCat
  – Day0 (continuous)
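The two models with baseline response added, once as the upper-quartile indicator and once as continuous Day0, can be sketched as follows. The data are simulated, response is assumed to be Day30 - Day0, and UpperQuartileDay0 is assumed to mark values above the 75th percentile of Day0.

```r
# Simulated stand-in for the Vaccine data set
set.seed(1)
Vaccine <- data.frame(Adjuvant = rep(c(0, 1), times = 20),
                      Day0 = exp(seq(1, 3, length.out = 40)))
Vaccine$Day30 <- Vaccine$Day0 +
  rlnorm(40, meanlog = 2 + Vaccine$Adjuvant, sdlog = 0.6)
Vaccine$AdjuvantCat <- factor(Vaccine$Adjuvant, levels = c(0, 1),
                              labels = c("no", "yes"))
Vaccine$logresponse <- log(Vaccine$Day30 - Vaccine$Day0)  # assumed definition
Vaccine$UpperQuartileDay0 <- factor(
  Vaccine$Day0 > quantile(Vaccine$Day0, 0.75),
  levels = c(FALSE, TRUE), labels = c("no", "yes"))

# Independent variables are joined with '+', as in the dialog box
fit_quartile <- lm(logresponse ~ AdjuvantCat + UpperQuartileDay0,
                   data = Vaccine)
fit_day0     <- lm(logresponse ~ AdjuvantCat + Day0, data = Vaccine)

summary(fit_quartile)$coefficients
summary(fit_day0)$coefficients
```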
What will you conclude?
• Is adjuvant/placebo associated with response?
• Is initial immune response associated with response after vaccination?
How to save?
• Save R workspace as…
  – This will save your data (in the R format)
• Save output as…
  – This will save your output
  – Another strategy is to cut and paste what you want to save
• Always save the commands
  – Essential if you want to re-run the analyses later
  – WordPad is a better option than Word etc. (it does not autocorrect, e.g. change letters to upper case)
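Data, output and commands can also be saved with commands. The file names and the temporary folder below are placeholders chosen for this sketch.

```r
# Made-up rows standing in for the Vaccine data set
Vaccine <- data.frame(IDNO = 1:4, Adjuvant = c(0, 1, 0, 1),
                      Day0 = c(5.1, 7.3, 6.0, 4.8),
                      Day30 = c(9.4, 60.2, 11.7, 45.9))
out_dir <- tempdir()   # stands in for your own folder

# Save the data in the R format
save(Vaccine, file = file.path(out_dir, "Vaccine.RData"))

# Save printed output to a text file
capture.output(summary(Vaccine),
               file = file.path(out_dir, "vaccine-output.txt"))

# Save the commands as a plain text script, so they can be re-run later
writeLines(c("# Vaccine analysis",
             "Vaccine$response <- Vaccine$Day30 - Vaccine$Day0",
             "summary(Vaccine$response)"),
           file.path(out_dir, "vaccine-analysis.R"))
```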
How to write and run a command?
• Simply write the command in the script window, select it and click ’Submit’ or press Ctrl+R
Nice to know:
• When writing comments in the syntax, start them with the sign #
• If you are uncertain about a function, use Google or help(name-of-function)
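For example, with mean() as the function being looked up:

```r
# Everything after a hash sign is a comment and is not run
x <- c(2, 4, 6)   # a comment can also follow a command
mean(x)

help("mean")      # shows the help page for mean(); ?mean does the same
```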
Are there differences in syntax between R and R commander?
• A few:
  – Commands that extend over more than one line should have the second and subsequent lines indented by one or more spaces or tabs; all lines of a multiline command must be submitted simultaneously for execution.
  – Commands that include an assignment arrow (<-) will not generate printed output, even if such output would normally appear had the command been entered in the R Console [the command print(x <- 10), for example]. On the other hand, assignments made with the equals sign (=) produce printed output even when they normally would not (e.g., x = 10).
  – Commands that produce normally invisible output will occasionally cause output to be printed in the output window. This behaviour can be modified by editing the entries of the log-exceptions.txt file in the R Commander’s etc directory.
  – Blocks of commands enclosed by braces, i.e., {}, are not handled properly unless each command is terminated with a semicolon (;). This is poor R style, and implies that the script window is of limited use as a programming editor. For serious R programming, it would be preferable to use the script editor provided by the Windows version of R itself, or, even better, a programming editor.
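A sketch of commands written so that they follow the first and last points above. It is ordinary R and also runs unchanged in the R Console; the variable names are made up.

```r
# Multi-line command: the continuation line is indented, and both lines
# are selected and submitted together
total <- sum(1, 2,
             3, 4)

# Brace block: each command inside the braces ends with a semicolon
squares <- numeric(3)
for (i in 1:3) {
  squares[i] <- i^2;
}

print(total)
print(squares)
```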
True or false quiz
• R was the program that gave its shareholders the most profit last year?
• R commander only works on MS Windows?
• R has even more functions than R commander?
• R is built by a few people with a secret source-code?
• There are a lot of enthusiastic people working with R providing help to their peers in the R forum?
Further reading:
• The R Commander: A Basic-Statistics Graphical User Interface to R. John Fox, 2005
  – http://www.jstatsoft.org/v14/i09/paper
• Getting started with the R Commander: a basic-statistics graphical user interface to R
  – http://socserv.mcmaster.ca/jfox/Getting-Started-with-the-Rcmdr.pdf
• Quick-R: magnificent guide
  – http://www.statmethods.net/
• R manuals
  – http://cran.r-project.org/manuals.html
• You have learnt some basic skills and can now experiment with the program yourself
Questions and comments