Writing a report (and doing science)

University of Toronto, butler/c32/notes/report.pdf


Writing a report

- The first part of doing research is doing research.
- The second part of doing research is communicating your findings to the world.
- That means writing a report or paper to describe what you did.
- Research should be reproducible: anyone else should be able to re-do what you did and verify that it is correct.
- So your report should contain all the relevant detail for someone to replicate (or not) your findings.
- You need to support your conclusions so that your readers will believe you.
- This is how science (or human knowledge generally) works.


Falsifiability

- To say something meaningful about how the world works, you need a statement that could be proved wrong, like

  No alien spaceships have ever landed in Roswell, NM

- How to prove this wrong? Just find an alien spaceship there.
- Some unfalsifiable statements:
  - An alien spaceship crashed in Roswell, NM.
  - A giant white gorilla lives in the Himalayas.
  - Loch Ness contains a giant reptile.
- If you disagree, you might be met with “the government hid the evidence”, “the snow covered the gorilla’s tracks”, “the monster is hiding in the mud at the bottom of Loch Ness”.
- All you ever find is an absence of evidence, so you can never disprove these. But they are not science.


Falsifiability part 2

- Science consists of a vast array of falsifiable statements that have never been falsified, despite everyone’s best efforts. This is what it means for something in science to be “true”.
- “Life arose by evolution” is also falsifiable, but has never been falsified.
- Evolution provides the best explanation we know of how life arose.
- All the many things in it that could be falsified have not been.


Clinical trials

- How do we know that drugs work?
- Randomized controlled trials.
- Have to show that new drug works significantly better than current drug (or placebo).
- New drug could be worthless (“falsifiable”), but drug trials show it’s actually effective.
- Drug trials typically use huge numbers of patients, so drug has to work many times. It might not, but usually it does.
- New drug, or new piece of science, has to prove its worth before being accepted.


A typical journal article

http://jap.physiology.org/content/100/3/839


Writing a report

- Tell the story of what you did.
- Typical structure:

  Introduction        “setting the scene”
  Literature review   “what others have done”
  Methods             “what you did”
  Results             “what happened”
  Conclusions         “what the results mean to you”
  Discussion          “what the conclusions mean to the world”

- Details of structure vary by discipline/requirements.
- Journal articles often begin with an Abstract (a summary of what the article is about).
- Writing for your readers: make it as easy as possible to read and understand.


The Introduction

- What is your study all about?
- Why are you doing it?
- What are you expecting (hoping) to see?
- What research hypotheses do you have?
- “As concise as possible without detracting from clarity”.


Literature Review

- What do other people say?
- Include references, eg. “Dribblington and Smith (1992) demonstrate that blowfish, in fact, suck.” (I made this up.)
- Summarize in a few words, as above, what each cited work says (in the context of what your report is about).
- Include each cited work in your list of References (typically at the end of the report).
- Sometimes combined with the Introduction.


Methods

- What did you do?
- Sometimes has a more descriptive title.
- What experiment did you conduct?
- How did you design it?
- What results did you collect?
- How did you collect them (eg. technology)?
- Not the place for justifying what you did (that belongs in Introduction).
- Matter-of-fact, clear.


Results

- For a simple analysis: one block of results, report and move on.
- More complicated analysis: might need to split up into several sections, like this (only at greater length):
  - First I did this and got these results,
  - so I did that and got those results,
  - and that suggested the other which produced some other results.
- Take the time to tell your story clearly.
- You might need a mini-Methods section before each Results piece. If that makes sense, write it that way.


Purely statistical analyses

- Such an analysis is mainly Results.
- “Methods” is just “where I got the data from”, which can be moved into Introduction (no Methods).
- Analysis can be long and multi-faceted, so can have separate sections for the various parts, eg.
  - Exploratory analysis (histograms, tables of means, scatterplots).
  - First-stage analysis (eg. t-tests, regression analysis).
  - Second-stage analysis (eg. variations on the above suggested by the results, eg. sign test, transformations in regression).
- The Conclusions (see below) is where you talk about why you did what you did. Results is for showing what analysis you did, and what numbers came out the other end.
- Use graphs to illustrate results where you can, rather than large tables of numbers.
- Point: display results so that the Conclusions (coming up) are clearest.

“How many significant digits should I quote?”

Only quote figures that you are pretty sure of, perhaps plus one more.

- Know how accurate your data values are (generally, accuracy given in data set).
- Derived quantities, like mean, SD might deserve one extra decimal.
- Regression slopes trickier:
  - How many significant digits in y?
  - How many significant digits in x?
  - Take smaller of those. Perhaps plus one more.
- Just because SAS gives you six decimals doesn’t mean you should give six decimals!
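A minimal R sketch of this rule of thumb, assuming made-up x and y values recorded to one decimal place and two significant digits (the data and variable names are illustrative only, not from the course). round() sets the number of decimal places and signif() the number of significant digits:

```r
# Hypothetical data, recorded to one decimal place (two significant digits)
x <- c(1.2, 2.4, 3.1, 4.8, 5.5)
y <- c(2.3, 4.1, 6.2, 8.7, 9.9)

# Derived quantities: one more decimal than the data
mean_y <- round(mean(y), 2)
sd_y   <- round(sd(y), 2)

# Regression slope: x and y both have 2 significant digits,
# so quote 2, perhaps plus one more (3 here)
fit   <- lm(y ~ x)
slope <- signif(unname(coef(fit)["x"]), 3)

print(c(mean = mean_y, sd = sd_y, slope = slope))
```

The unrounded values stay available in fit for further calculation; only the figures quoted in the report are rounded.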


Conclusions: “what do my results mean to me?”

- Get to sell your results and their consequences.
- Definitely do want to add your opinions here!
- Go back to your Introduction for guidance:
  - did you find out what you were hoping to?
  - how do your results lead to that conclusion?
  - if you didn’t find out what you were hoping to, why not?


Discussion: “what do my results mean to the world?”

- Place results in broader context.
- Why are these results important? What implications do they have?
- What are the limitations of my study that should make me hesitant about generalizing my results? (Eg. small sample sizes, something omitted in the study that could affect results.)
- How do your results stack up against the ones in the literature (compare the Literature Review)?
- Do your results suggest future lines of research? (They almost always do.)


Final bits

- Acknowledge any people or organizations that helped you.
- List of references (of the works you cited in the Literature Review and elsewhere).
- Done!

Journal articles often have an Abstract, a short summary of what the article is about.

- Sometimes Abstract has (headline) results.
- Doesn’t usually have references.
- Adhere to norms of journal.


The example report

- I know nothing about physiology!
- Abstract: really Introduction and Literature Review. Gives away some of the results. (I think Abstract should stand on its own, telling you whether paper is worth reading.)
- Key Words: some journals like this. Can search for Key Words that interest you.
- Here, citations are numbered. Online citations often link to References list.
- “What you did” here called Materials and Methods (common in sciences).
- Note technology detail.
- Results section has not just significant results, but non-significant ones as well.
- Just enough detail to allow for replication. (Clashes with limited space available in journals.)
- Conclusions section called Discussion.
- Plain-English last sentence.


Reproducible Research and R

- Old way: submit your paper to journal, edit down, be prepared for requests from people who wanted to know more.
- New way: put paper online, including all code and data.
- Anyone can then reproduce exactly what you did, and discuss their findings with you.
- Problem: how do you know your analysis and conclusions are in sync?
- Usual procedure: copy code and output into your report. But might copy old version. How do you know you got the most recent version?
- Solution (in R): construct document with specially formatted code. When you process the document, that code gets run (and results pasted in). So code you show is guaranteed to produce results claimed.


R Markdown

- Markup language (like HTML) containing instructions about how document should appear.
- In R Studio, File, New, R Markdown. Might need to install.packages("knitr") first.
- Fill in title, author. Keep default format at HTML for now. Click OK.
- R Studio opens example document. See how it works. Change to suit yourself.
- To process an R Markdown document, save. Find “knit HTML”. Click it. After some processing, see output (HTML) in previewer window.


R Markdown part 2

- Some R Markdown “markup” elements:
  - A row of ========== under a line of text, with blank line below, makes that line a title (section header). ------------ makes subsection header. Or use #, ##, ### at start of line for section, subsection, subsubsection headers.
  - **text** produces bold text; *text* makes italic text.
  - (most important) The line ```{r} starts an R code chunk. All the lines until closing ``` are interpreted as R code. R Studio colours code chunk grey so you can see it.
  - To include a plot, simply include the plot command(s) among your R code.
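To see these elements together, here is a small sketch in R that writes a minimal R Markdown file you could then open and knit. The file name example.Rmd, the document text and the simulated data are all made up for illustration; the commented-out rmarkdown::render() call assumes the rmarkdown package and pandoc are installed.

```r
# Lines of a minimal R Markdown document (contents are illustrative only)
rmd <- c(
  "My report",
  "=========",
  "",
  "Some **bold** and *italic* text.",
  "",
  "## A subsection",
  "",
  "```{r}",
  "x <- rnorm(50)   # made-up data",
  "summary(x)",
  "hist(x)          # the plot appears in the output",
  "```"
)

# Save it; in R Studio you would then open the file and click Knit HTML
path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd, path)

# From the console instead, if rmarkdown and pandoc are available:
# rmarkdown::render(path)
```

Writing the file from R is only a convenience for this example; normally you type the document directly in the R Studio editor.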


Part of the output


Some tips

- To add a new code chunk, look for Chunks button and select Insert Chunk. Or, Control-Alt-I.
- Play with the template document! Make a histogram, do a regression, etc., etc.
- You can also run the code chunks in R Studio, to check that they are working properly. These are the other options in Chunks.
- If you see an error in the HTML, or something you want to change, go back to the R Markdown, change it there, and then Knit HTML again. The R Markdown document is your “base”; the HTML is produced from it.
- The HTML document can always be produced again, so it doesn’t matter if you lose it. The R Markdown document, on the other hand, needs to be kept safe.
- You can save the HTML document and view it with a web browser.
- Click the MD button (left of Knit HTML) for more things you can do with R Markdown.


Other kinds of document

- Click arrow next to Knit HTML for other choices:
  - Knit Word: produces a Word document (if you have Word or eg. LibreOffice installed).
  - Knit PDF: produces PDF document (if you have LaTeX installed).

  Your choice is “sticky”: if you knit again, you’ll get same document type.
- Reminder: use R Markdown as “base”: make any changes in R Markdown, and produce document as final step.
- With SAS too: do R stuff first, produce Word doc; paste in SAS stuff from SAS Studio.
- Or, with SAS (better) use Statrep, which requires LaTeX.


Making presentation using R Studio and R Markdown

- File, New File, R Presentation (near bottom).
- A sample presentation pops up. This is R Markdown code; edit it the same way as any other.
- You can copy-paste in any other R Markdown code.
- The =========== “headings” make new slides with the stuff above the line as the slide title.
- Click on Preview to see the presentation (top right).


Fixing problems

- In the likely event that you don’t like what you see, go back to the .Rpres file and change it.
- To start a new page in a presentation, put in a heading with ======. This makes the title of a new slide.
- Click the More button (top right, above the preview) for more options. View in Browser displays the presentation in your default web browser (mine is Chrome).
- In a web browser, or the previewer, get to next page using space bar, clicking the arrow, or using the arrow keys. Previous page: shift-space, click arrow, arrow keys.


More Markdown tips

- Bullet points like this:

  * Here is the first point
  * and this is the second
  * Here, finally, is the third point.

  Or start the lines with -. Use a blank line to end.
- Inserting images from current folder or Web:

  ![caption text](image.png)
  ![text](http://example.com/logo.png)

  Latter needs you to be online while processing, though not (I think) while giving presentation.
- Can give graphs captions by changing code chunk line to this:

  ```{r, fig.cap="Scatterplot of x against y"}


Even more Markdown tips

- Using number from R output in text:

  We had `r length(x)` simulated values altogether.

- Get “typewriter font” like this:

  This presentation was made using `R`.

- Code to display but not run:

  ```
  for (i in 1:10)
  {
    print(i)
  }
  ```

  Insert code chunk and take out the {r}.
- Code to run but not display (show output only):

  ```{r, echo=FALSE}
  rnorm(100, 50, 10)
  ```
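A hedged sketch of how echo=FALSE and inline `r ...` code behave together, using knitr from the console instead of the Knit button. The variable x and its values are made up, and the knitr call is skipped if the package is not installed; knitr::knit(text = ...) processes the lines held in a character vector and returns the resulting Markdown.

```r
# A tiny R Markdown source held in a character vector (illustration only)
rmd <- c(
  "```{r, echo=FALSE}",
  "x <- c(3, 1, 4, 1, 5)   # made-up values",
  "mean(x)",
  "```",
  "",
  "We had `r length(x)` values altogether."
)

# Process the chunk and the inline code; needs the knitr package
if (requireNamespace("knitr", quietly = TRUE)) {
  out <- knitr::knit(text = rmd, quiet = TRUE)
  cat(out, "\n")
}
```

In the printed result the chunk's code should be absent, with only its output shown, and the inline expression should be replaced by its value.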


Even even more

- Presentation, or Knit to PDF: can use LaTeX formulas, like

  $$x = { -b \pm \sqrt{b^2-4ac} \over 2a }$$

  which comes out as the quadratic formula, x = (−b ± √(b² − 4ac)) / (2a).
- I encourage you to learn LaTeX! Much better than Word for mathematical, scientific (and for that matter academic) documents.
- A more sophisticated R/LaTeX connection via Sweave: can embed R code chunks in LaTeX document. In R Studio, File-New-R Sweave. This produces a skeleton LaTeX document. Insert code chunk in usual way (looks different).
- Include SAS in LaTeX using statrep. More on the next slide.


Statrep

- SAS’s version of Sweave: include SAS code in LaTeX document, run when document produced.
- Uses SAS Studio run on virtual machine on your computer (not on online SAS Studio unless you want extra difficulty).
- Setup:
  - SAS Studio running on virtual machine on your computer, or full-version SAS on your computer.
  - Folder on your computer linked to virtual machine (the folder you refer to as /folders/myfolders on virtual machine), if using virtual-machine SAS.
  - Construct document on your computer in that linked folder.
- Link: http://support.sas.com/rnd/app/papers/statrep.html.


Workflow (1/3)

- Get the posted template LaTeX document with the right headers (see the next slide).
- LaTeX as usual. Code chunks and output separated (since SAS output usually rather long).
- Insert data step (reading in data) by enclosing SAS code in \begin{Datastep} and \end{Datastep}.
- Insert proc step (running a procedure to do analysis) by enclosing SAS code in Sascode environment like this. Choose a label name for store= by which to grab output (use any label name you like):

  \begin{Sascode}[store=labelx]
  proc means;
  \end{Sascode}


Extra LaTeX headers

Put these above your \begin{document}:

\usepackage{statrep}

\usepackage{parskip,xspace}

\newcommand*{\Statrep}{\mbox{\textsf{StatRep}}\xspace}

\newcommand*{\Code}[1]{\texttt{\textbf{#1}}}

\newcommand*{\cs}[1]{\texttt{\textbf{\textbackslash#1}}}

\def\SRrootdir{/folders/myfolders}

\def\SRmacropath{/folders/myfolders/statrep_macros.sas}

You may not need all of these, but try them first. The last two tell Statrep where to put/find SAS code generated.


Workflow (2/3)

- Output of two kinds: text output (tables of numbers) and graphics output (graphs/plots).
- Specify which output you want by using the label you created thus, plus create an extra label as shown:

  \Listing[store=labelx]{labelxx}
  \Graphic[store=labelx]{labelxy}

- The store= is the same label as in the Sascode (how Statrep knows which output to grab). Label at the end must be distinct (names of files created in background).


Workflow (3/3)

- To produce .pdf, run LaTeX twice, with a run of SAS in between:
  - First time: if you look at .pdf, you see placeholders for SAS output (when it is produced). Also a file ending in SR.sas, which is SAS code to produce output.
  - Go to SAS Studio (on virtual machine); see this file in list of files. Open this file.
  - “Submit” it. This won’t produce any visible output, but its output will be saved in files. If there are errors, fix them in the LaTeX document and run LaTeX again.
  - Second time: run LaTeX again. In the .pdf, the SAS output placeholders are replaced by actual output. Done. (If you don’t like anything, change the LaTeX and repeat.)


Grabbing only some of the output

- SAS output tends to be long. It is divided into sections (each containing one table or graph). Each one has a name.
- To find out what the names are, after you have run the SR.sas file in SAS Studio, look in the Log tab. Look for the code whose output you want only part of, like this:


. . . Continued

- Scroll down in Log tab until you see something like this:
- Note the store= at the top, and the table of “objects” at the bottom.
- In the “objects” table, each object you can select has a 3-part name (left column). You need only the last part.


Grabbing what you want

- The “type” column says whether it’s a table of numbers (use Listing) or a graph (use Graphic).
- Note that I got this via \begin{Sascode}[store=id].
- To get just the confidence intervals, do this:

  \Listing[store=id,objects=conflimits]{idd}

- To get just the normal quantile plot, this:

  \Graphic[store=id,objects=qqplot]{ide}

- To get more than one object (of same type), separate by space:

  \Listing[store=id,objects=statistics ttests]{idf}

  which gets both summary statistics and t-tests, but nothing else.


Processing R and SAS in same document

- You might have a document that contains both R and SAS code to be run (and output inserted). How to handle both?
- Process (most easily with Makefile):
  - Create Sweave document with name like test.Rnw (with .Rnw extension). Add R code chunks with <<>>= format and SAS code chunks in Statrep style.
  - Use knitr to process the R. This makes file test.tex with R code run and output inserted.
  - Run LaTeX once to produce test_SR.sas.
  - Run test_SR.sas in SAS Studio (on virtual machine).
  - Run LaTeX again to grab SAS output.
  - Check final test.pdf: should have both R and SAS output.
  - Make any changes to test.Rnw and repeat whole process.
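The slides suggest a Makefile; as an alternative, the same steps can be sketched as a small R driver. This is only a sketch under assumptions: the function names first_pass() and second_pass() are hypothetical, pdflatex is assumed to be the LaTeX command on your system, the generated SAS program is assumed to be named test_SR.sas, and the SAS step is still done by hand in SAS Studio.

```r
# Hypothetical driver for the steps above; "test" is the document stem
stem <- "test"

# Files that each step reads or writes
files <- c(
  rnw = paste0(stem, ".Rnw"),     # Sweave source with R and SAS chunks
  tex = paste0(stem, ".tex"),     # produced by knitr
  sas = paste0(stem, "_SR.sas"),  # produced by the first LaTeX run (assumed name)
  pdf = paste0(stem, ".pdf")      # final document
)

# Run the R chunks, then LaTeX once to generate the SAS program
first_pass <- function() {
  knitr::knit(files[["rnw"]], output = files[["tex"]])
  system2("pdflatex", files[["tex"]])
  message("Now submit ", files[["sas"]], " in SAS Studio, then call second_pass()")
}

# After SAS has run: LaTeX again to pull in the saved SAS output
second_pass <- function() {
  system2("pdflatex", files[["tex"]])
}
```

Call first_pass(), submit the SAS file in SAS Studio, then call second_pass() and check the .pdf. After any change to test.Rnw, repeat all three steps.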
