Using Excel
Biostatistics 212
Lecture 4
Housekeeping
• Questions about Lab 3?– replace vs. recode
• Final Project Dataset!– “Housekeeping” commands vs. data cleaning
(don’t show data cleaning)
• A little short-handed today…
Today...
• Why are we talking about spreadsheets?• Pro’s and Con’s of using a spreadsheet for:
– Data management, Statistics, Calculating, Modeling, Tables, Figures
• Cells• Formulas• Cutting and pasting formulas• Spreadsheet style• Examples
Why spreadsheets?
• Excel is widely used, and for good reason– Store numbers and text– Calculations– Desktop graphics – Tables and Figures– Flexible creation of ledgers, models, other
complex programs
Why spreadsheets?
• How is a spreadsheet different than Stata’s data editor?– Less structured– Formulas– Formatting
Why spreadsheets?
• How is a spreadsheet different than a database program like Access?– Less structured– Formula chains– Formatting
Pro’s and Con’s of spreadsheets
• For data management– Pro’s
• Easy start – just name columns and start typing
– Con’s• No structure
• Can’t sort, filter or query data
• “Flat” file – no relational table structure allowed
Pro’s and Con’s of spreadsheets
• For statistical analysis– Pro’s
• Easy start, if you know how to do formulas
– Con’s• Extremely limited range of options
• Difficult to document
Pro’s and Con’s of spreadsheets
• For calculating, or “modeling”– Pro’s
• Repetitive calculations easy• Complex calculations easy
– Con’s• Simple, 1-time calculations not as fast as a
calculator• Sometimes hard to decipher in retrospect
Pro’s and Con’s of spreadsheets
• Tables and Figures – will discuss in Sessions 6 and 7
Cells
• The basic building block of a spreadsheet
• Can contain:– Numbers
– Text
– Dates, times, other special formats
– “blanks”• Start with 46 million blank cells!
(230 cols x 66536 rows x 3 worksheets)
Cells, cont
• Enter anything you like into each cell (numbers, text, symbols, etc) using keyboard
• Contents displayed on spreadsheet
• Organized and named by column/row
Formulas
• Use when you want the contents of one cell to depend on the contents of other cells
•ALWAYS starts with: =
(an “equals sign”)
Formulas
• Can contain:– Numbers– Text– References to cells– The usual math operators (+ - * / ^ )– Built-in functions
Formulas
• Cell contents update automatically when a referenced cell content changes
• “Chains” of formulas make for flexible calculating
Formulas
• Contents of a cell displayed on spreadsheet
• The formula determining that content is displayed in the “formula box”
• Example
Formulas
• Types of formulas– Arithmetic
• +, -, *, /, ^
– Logic• IF(boolean, value 1, value 2)
– Returns value 1 if TRUE, value2 if FALSE
• AND(boolean, boolean, boolean…)– Returns TRUE if all booleans are true, otherwise FALSE
• OR(boolean, boolean, boolean…)– Returns TRUE if any booleans are true, otherwise FALSE
Formulas
• Types of formulas, cont– Functions, for example:
• SUM(range of cells)– Returns the sum of the values in the range
– SUM(A5:A10)
• AVERAGE(range of cells)– Returns the average of the values in the range
• STDEV(range of cells)– Returns the standard deviation
• NORMINV(probability, mean of dist, SD of dist)– Returns the z-value associated with a given probability…
Formulas
• Types of formulas, cont– Functions, for example:
• LN(number)– Returns the natural log of a number
• ABS(number)– Returns the absolute value of a number
• LEFT(text, number of characters=x)– Returns x number of characters from the text in the cell, starting
at the left side…
• NOW()– Returns the current date, time
Formulas
• Tips– Use parentheses
• IF(SUM(A5:A10)>5,1,IF(C9=“y”,2,3))
– Or do in multiple steps
Cutting/Copying and Pasting
• Cutting and Copying treat formulas differently!
Cutting and pasting formulas
• Excel assumes the cell references are ABSOLUTE, and you’re just moving the location of the formula cell
• Example
Copying and pasting formulas
• Excel assumes the cell references are RELATIVE
• Example
• Shortcut: drag little square in the corner…
Copying and pasting formulas
• If you want to FIX the position of a referenced cell, use $’s= A5 + $B$6
• Example
Examples
• Repetitive calculations– Back-transforming linear regression coefficients
• Complex calculations– 2 x 2 template
• Modeling– Mortgage calculator– Risk integrator– Figure 2 for LDL-lowering paper
Spreadsheet style
• Formatting– Text– Column width– Borders– Placement of stuff on the page
Spreadsheet style
• For models:– Inputs on the left, in red– Outputs on the right, in blue, boxed, bolded, etc– Calculations on other sheets– “Protect” all cells besides inputs
• Format/Cells…/Protection
• Tools/Protect
Take home points
• Understand cells and formulas
• Use copy/paste with and without fixed cells ($A$45)
• Good formatting adds significant value to your spreadsheet
Lab 4
• Practice with:– A repetitive calculation spreadsheet– A complex calculation spreadsheet– Introduction to making a figure with Excel
• Due before lecture next week
• Extra credit puzzle challenge – 2x2 excel template– Due Sept 18th – email to [email protected]
To come…
• Next lecture– Epidemiologic analysis with Stata
• 2 x 2 tables, confounding and interaction
• Epitab commands
• Logistic regression introduction