Computing at Stanford and Introduction to SAS
HRP223 – Topic 0
Sept 26th, 2011
Copyright © 1999-2011 Leland Stanford Junior University. All rights reserved.
Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.
Objectives
• Administrivia
• Software tools at Stanford
– Security at Stanford
• Software tools not endorsed by Stanford
• Data
• SAS
General
• The course website has critical details: www.stanford.edu/class/hrp223/
• If you can, please print the slides just before the start of class.
Administrivia
Goals
• This course will provide practical solutions to problems that arise before doing analyses as well as the final push toward getting the results.
• I will talk about issues like finding unruly data, massaging data into a useful format, building datasets of valid data and choosing statistics.
Administrivia
Getting Help
• Mike Hurley [email protected] is the TA for the course.
• His office hours will be announced weekly. I will be available for online Q&A at [email protected] or, preferably, on the class newsgroup. I will answer questions every morning around dawn. If you post to the newsgroup and do not hear back quickly, please email me.
• Things labeled “Assignment”, but not “Homework”, can be done with the help of classmates.
• You are strongly encouraged to discuss your problems up until you start writing your answers to the homework problems.
Administrivia
Preliminaries
• I assume you know how to use Windows or Mac OS.
• For this class you need access to a machine with:
– Windows XP Pro or Vista Business/Ultimate
– Windows 7 Professional/Business/Ultimate
• XP Home Edition, Vista Home Edition, and Windows 7 Home Premium will not work with the software in this class.
• I use: XP Pro, 7 Pro, and XP Pro running in Parallels on the Mac.
Administrivia
Getting a Computer
• If you want to get a new computer, you can get one at a very good price through Stanford. You can get ideas on what is an acceptable computer here:
itservices.stanford.edu/service/helpdesk/recommended
• You want to have XP Pro or the Business or Ultimate version of Vista or Windows 7.
Administrivia
Free Stanford Tools
• You can get access to free software from Stanford by going here: https://itservices.stanford.edu/service/ess
• You must use antivirus software.
• You will fail the course if you send me a document that contains a virus or other malicious code. There is no forgiveness for this offense and this is not open to debate.
Stanford Software
Get the Sophos Scanner
Stanford Software
Virus and Worm Issues (3)
• Virus scan before you email me anything!
• Right click on the file you want to scan and then pick Scan with Sophos Anti-Virus
• Sophos keeps itself updated constantly.
Stanford Software
– Sophos Anti-Virus (for both Windows & Mac OS)
• Watches for suspicious things and stops them until you authorize the software
If your quarantine contains a file, get help.
You can submit suspicious files
Stanford Software
Stanford Desktop Tools
• This allows you to install and update BigFix, Security Self-Help, Open AFS and other tools.
– BigFix automatically checks for important software updates.
– Security Self-Help checks and allows you to fix security weaknesses on your machine.
– Open AFS lets you have access to your UNIX account like it is just another Windows hard drive.
Stanford Software
Stanford Desktop Tools
Stanford Software
Your UNIX Account
• You have a website made for you already:
– www.stanford.edu/~YOUR_SUNET_ID
• UNIX stuff: https://itservices.stanford.edu/service/afs
– You can use Stanford Desktop Tools to mount your UNIX drive just like another hard drive. I get stuff on the web quickly with Open AFS.
– If you do not want AFS, you can also use SecureFX, which you can get from ESS, or just go to afs.stanford.edu.
– Do NOT put confidential/HIPAA sensitive stuff out there.
Stanford Software
My UNIX Space
Stanford Software
After AFS is Installed
Stanford Software
SecureFX
Stanford Software
Stanford Software
afs.stanford.edu is the easy way to move files to your UNIX space.
Stanford Software
Passwords
• The Leland system places restrictions on passwords. You should set your passwords on other machines to be just as hard to crack. https://itservices.stanford.edu/service/unixcomputing/unix/passwords
• You can use Stanford’s Security Self-Help Tool, which comes with Stanford Desktop Tools, to check your passwords.
• If you do not know how to set or change your password, look here: www.stanford.edu/group/security/securecomputing/setpass.html
Security
General Security
• The biggest weaknesses in computer security are the legal users of the system.
– Walking away from a terminal
– Using passwords that are easy to crack
– Taking data off of restricted machines
– Viruses and Trojan horses will kill you if you let them!
Security
• Email provides all the confidentiality of a postcard.
• If you are sending HIPAA sensitive information you can secure your email:
https://itservices.stanford.edu/service/secureemail
Security
Unsolicited Email
• Spam™, Spam™, Spam™, wonderful Spam™, yes wonderful Spam™
• You may get unsolicited commercial solicitations, advertisements, chain letters, or pornography through your Stanford email account.
– NEVER respond to these messages, and never use the REMOVE option provided in the email.
– NEVER put your email address on a web page.
Security
• At webmail.stanford.edu you can choose the Preferences tab and Filters from the left to automatically sack repeat offenders.
Security
Back up your work!
• Each year, on average, one student in five loses all their work. Plan on your computer being destroyed at the worst possible time this year.
– Coffee, computer worm or virus, small child with refrigerator magnet, physical hard drive failure, theft, bicycle crash, etc.
• Every day back up your work to more than one location.
Security
Where to Backup
• PLEASE use removable media if you have no network access:
– Floppy disk, CD, DVD, flash media
• NEVER back up or share confidential data (HIPAA sensitive protected health information) on mobile media without talking to security experts first.
• At home I use www.crashplan.com. Ask your tech support person for recommendations.
Security
Encrypted USB drives
• USB drives (also called thumb drives) are a very convenient way to keep backups and allow you to move your data around.
• However, they are very easy to lose! NEVER store unencrypted, restricted data on a USB drive.
• You can encrypt at the file level (Excel, WinZip) – OK.
• You can encrypt the whole drive (PGP Disk, TrueCrypt) – Better.
• You can have a hardware encrypted USB drive – BEST!
– There are many manufacturers; however, most are Windows only.
– IronKey supports both Windows and Mac and is highly recommended.
• 1 Gig for $50 up to 32 Gig for $250 on Amazon
Security
Data Management and Analysis
Tools of the Trade
• Containers to hold data
– Microsoft Excel
– REDCap
• Analysis tools
– SAS with Enterprise Guide
– R with Rcmdr
Other Software
Excel
• is not a good place for HIPAA sensitive (PHI) material
• makes it easy to enter bad data
• can be a huge headache to import
Other Software
REDCap
• is a good place for HIPAA sensitive (PHI) material
• makes it hard to enter bad data
• is mostly painless to import for analysis
Other Software
SAS 9.2 TS2
SAS is an old programming language where you type commands and run a bunch of things at once.
Other Software
Enterprise Guide 4.305
EG is a newish programming environment where you type commands or point and click.
Other Software
R 2.13.1 http://cran.cnr.Berkeley.edu
R is a modern programming language with user-hostile help files.
Other Software
RStudio http://www.rstudio.org/
RStudio is an integrated development environment (IDE) for R.
R Commander
Rcmdr is a friendly, but incomplete, graphical user interface (GUI) for R.
Other Software
Getting SAS
• If you have a machine with XP, Vista or Windows 7 Pro, Business or Ultimate and more than 30 Gig of extra hard drive space, you can get SAS for $65 per year. Place the order here: https://itservices.stanford.edu/service/softwarelic/sas
– There is a digital download that is HUGE (11+ Gig, not Meg). If you have a wired connection on campus, consider it. Otherwise ask me for the DVDs.
• The instructions for installing it can be found here: www.stanford.edu/class/hrp223/2011/InstallingSAS93_20110904a.pptx
Other Software
SAS for Free on Campus
• If you don’t mind working in a public place, SAS is in the Lane library and M202 lounge.
med.stanford.edu/irt/classrooms/features/computer_labs.html
Other Software
Other Tools I Regularly Use
• File manipulation
– UltraEdit
– UltraCompare
• Info Management
– FileLocator Pro
– Google Sites
Other Software
UltraEdit
• If you work with text files, get UltraEdit and buy the perpetual license.
www.ultraedit.com
Other Software
UltraCompare
• A tool to track changes in code or other text files
www.ultraedit.com/products/ultracompare.html
Other Software
FileLocator Pro
• If you can’t find files on your machine, consider FileLocator Pro.
www.mythicsoft.com/default.aspx
Other Software
Google Sites
• If you need to keep track of tons of random facts (like code snippets) consider using Google Sites.
https://sites.google.com/
Other Software
What is Data?
• Stuff that …
– will make you famous or cry
– you want to pull from the electronic medical record
– you will need to store if it is not in the medical record
Data
Structured vs. Unstructured
• Unstructured data
– Text like dictations, operation notes, data entry comments
– Difficult to process
• Structured data
– Affords the ability to build ontologies
– Dates
– Pick lists (multiple choice)
– Relatively easy to process
Data
Structuring Biomedical Data
• RxNorm for drug ingredients / brand names
• ICD-9 for billing diagnostic and procedure codes
– Fairly coarse but nicely hierarchical
• ICD-O for detailed cancer pathology
• CPT for procedures
– No hierarchical structure, difficult to search
• SNOMED-CT for general purpose clinical terms
– Hierarchical, detailed and vast but with some gaps
Data
What is structured data?
• All pieces of information that you collect and calculate as part of a study are data. Every person’s response to a questionnaire is called a data point.
• There are two fundamentally different types of data: numeric and character.
– Numeric data is always … numeric. Information that you could want to do math on is numeric data.
– Character data is alphanumeric. It includes the obvious things like names and addresses, but it also includes numbers that you should not do math on.
• Some systems, like R, make finer distinctions and let you declare data to be factors (categorical values).
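The numeric versus character distinction can be sketched in a short SAS program. This is only an illustration: the dataset name work.people and the variables name, zip, age and ageNextYear are made up for the example and do not come from the course materials.

```sas
/* Hypothetical example: all names and values are illustrative only */
data work.people;
    length name $ 20 zip $ 5;   /* character variables are given a length */
    name = "Pat Smith";
    zip  = "02138";             /* a "number" you should not do math on: keep it as character */
    age  = 42;                  /* numeric: math makes sense here */
    ageNextYear = age + 1;
run;

/* PROC CONTENTS lists each variable with its type (Num or Char) */
proc contents data=work.people;
run;
```

Storing the ZIP code as character keeps its leading zero, which would be lost if it were stored as a number.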
Data
What is data coding?
• A question such as, “What is your current age in years?” is going to generate numeric data.
• A question such as, “At what age did you first contract a sexually transmitted disease?” is going to generate numeric data ….
But you are going to need to allow for the possibility that somebody has never contracted a sexually transmitted disease.
… and you always need to allow for people who never knew or do not remember information or who may be dishonest in their answers.
Data
What is data coding? (2)
• When you have a question that generates numeric data and your subject’s response is not a “real number” you can code a bogus value.
– “Not applicable” can be coded as age –1000000.
– “Do not know” can be coded as –2000000.
• The better way to deal with this problem is to use the value “NULL.”
– SAS allows you to code 27 special types of NULL (._ and .A through .Z) in addition to the ordinary missing value.
– Null values make your job easier when you try to do math on the values.
Data
Missing Data
• SAS represents missing character data as a pair of quotes with nothing between them, and missing numbers are shown as a period (.).
• You can also use .A, .B, etc. to code for missing numbers but you can’t enter them directly.
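In a typed program, one way to read special missing codes from raw data is the MISSING statement. The sketch below assumes, purely for illustration, that A stands for “not applicable” and B for “do not know”; the dataset and variable names are hypothetical.

```sas
/* Hypothetical coding scheme: A = not applicable, B = do not know */
data work.stdAge;
    missing A B;              /* lets INPUT read the letters A and B as .A and .B */
    input id ageFirstStd;
    datalines;
1 19
2 A
3 B
4 .
;
run;

/* Report each kind of missing value separately in the log */
data _null_;
    set work.stdAge;
    if ageFirstStd = .A then putlog id= "not applicable";
    else if ageFirstStd = .B then putlog id= "does not know";
    else if missing(ageFirstStd) then putlog id= "missing, no reason recorded";
    else putlog id= ageFirstStd=;
run;
```

The MISSING() function is true for every kind of missing value, so the specific codes are tested first.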
Data
What is data coding? (3)
• Questions that generate alphanumeric data are always more complex than questions that generate numeric data.
• “Where were you born?” can be coded as a string of letters from a fill-in-the-blank question or coded as letters or numbers from a multiple choice format question.
– Do not use null in fill-in-the-blanks.
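When a multiple choice answer is stored as a number, a SAS format can attach readable labels to the codes. This is a sketch under assumptions: the code list, the format name birthfmt and the dataset work.birth are invented for the example.

```sas
/* Hypothetical code list for a multiple choice "Where were you born?" question */
proc format;
    value birthfmt
        1 = "California"
        2 = "Other US state"
        3 = "Outside the US";
run;

data work.birth;
    input id birthCode;
    format birthCode birthfmt.;   /* show the labels instead of the raw codes */
    datalines;
1 1
2 3
3 .
;
run;

/* The MISSING option reports missing responses as their own row */
proc freq data=work.birth;
    tables birthCode / missing;
run;
```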
Data
Typical Tasks
• Importing data
• Cleaning
• Making a subset
• Numeric and graphical summaries
• Analyses with graphics
• Summary reports
or
• Doing simple math
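A few of these tasks can be sketched in typed code using sashelp.class, a small sample dataset that ships with SAS. The subset name work.teens and the age cutoff are arbitrary choices for illustration, not part of the course materials.

```sas
/* Making a subset: keep students aged 13 and older */
data work.teens;
    set sashelp.class;
    where age >= 13;
run;

/* Numeric summary of two variables */
proc means data=work.teens n mean std min max;
    var height weight;
run;

/* Graphical summary: weight against height */
proc sgplot data=work.teens;
    scatter x=height y=weight;
run;
```

In Enterprise Guide the same steps can also be done by pointing and clicking; the code form is shown only because it is compact.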
Data
Basics
• While most people use SAS for processing complex collections of data, it can be used for simple math. The techniques that you use for simple math are also used to make complex changes to any size data sets.
• I hope this stuff will make your lives easier in statistics classes…
SAS
Using EG for Math
SAS
SAS
A data set is shown in the flowchart. Its contents are displayed in the programming window pane. You can see it stored in the temporary “work library” by browsing the Server List.
SAS
Make a temporary dataset to hold the answer.
The Log tab gives you feedback on what SAS did.
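A minimal sketch of what such a program might look like when typed as code. The dataset name answer and the variable name theAnswer are assumptions, since the slide's screenshot is not reproduced in this transcript.

```sas
/* Store the result of a calculation in a temporary dataset in the WORK library */
data work.answer;
    theAnswer = 1 + 1;
run;

/* Display the contents of the dataset */
proc print data=work.answer;
run;
```

Datasets in the WORK library are temporary and are deleted when the SAS session ends.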
SAS
No Need for a Data Set
For a simple calculation you do not need to make a dataset to hold a single number. You can have the number show up in the log window.
1. Give SAS a formula: 1 + 1
2. Tell it what to call the results: theAnswer
3. Print the results out: putlog theAnswer =
4. Tell it you are done giving it instructions.
Use short meaningful names that do not include spaces, punctuation characters, or leading numbers.
SAS
Basic Math• You put the instructions together by typing a
program into the code window, like this:data _null_;theAnswer = 1 + 1;putlog theAnswer =;run;• Run it.
Don’t bother to store the results in a dataset.
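The same pattern extends to slightly longer calculations. The sketch below is an illustration only: the variables and values are made up, and it uses the exponent operator (**) and the SQRT function.

```sas
/* Hypothetical example: several calculations written to the log, no dataset created */
data _null_;
    weightKg = 70;
    heightM  = 1.75;
    bmi      = weightKg / heightM**2;   /* ** raises to a power */
    rootTwo  = sqrt(2);                 /* built-in square root function */
    putlog bmi= 6.2 rootTwo= 8.5;       /* the trailing formats control the decimals shown */
run;
```

Each name followed by an equals sign in the PUTLOG statement is written to the log as name=value.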
SAS
The count of how many lines have been submitted
The Answer
SAS
Don’t panic….
• The help that ships with SAS is good.
• It is a separate program, found in the documentation subfolder inside the SAS folder on the Windows Start menu.
Search for functions and call routines by category
Click the Favorites tab.
Final Administrivia
• Please save a table for the people who are officially enrolled (or are taking the class for deferred credit).
• Bring a laptop with SAS if possible.
• Grades (pass/fail only)
– Pass 4 of 4 homework assignments for 3 units
– Pass 3 of 4 homework assignments for 2 units