SI 544 Introductory Statistics and Data Analysis
(Preliminaries)
Lada Adamic
School of Information, University of Michigan
Jan. 3rd, 2007
motivation
Dick De Veaux: "We haven’t evolved to be statisticians. Our students who think statistics is an unnatural subject are right. This isn’t how humans think naturally. But it is how humans think rationally. And it is how scientists think. This is the way we must think if we are to make progress in understanding how the world works and, for that matter, how we ourselves work."
motivation
Gary King (Department of Government, Harvard University):
"Statisticians will rule the world."
(When discussing the opportunities that availability of massive data sets will present for addressing questions in social science.)
how I see things
There is lots of interesting data
We need to describe what is going on
For this we need descriptive statistics
  need to summarize
  need to visualize
We need to tell whether what we are seeing is an actual trend, or part of the random “noise”
To this end we need to
  Understand probability
  Understand probability distributions (what is the likelihood that this would occur by chance?)
example (from Dick De Veaux)
A town has two hospitals
  Large hospital, about 100 babies a day
  Smaller hospital, about 15 babies a day
Over the course of the year, which hospital (if either) would probably have more days in which more than 60% of the babies born are male?
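One way to check the intuition here is to compute the probabilities exactly. The sketch below is in Python rather than R, and it rests on assumptions the slide does not state: every birth is independently male with probability 0.5, and the hospitals have exactly 100 and 15 births each day. The function name is made up for illustration.

```python
from math import comb

def prob_more_than_60_percent_male(n_births, p_male=0.5):
    """Exact binomial probability that more than 60% of n_births are male.

    Assumes independent births, each male with probability p_male.
    """
    total = 0.0
    for k in range(n_births + 1):
        # k / n_births > 0.6, written with integers to avoid rounding issues
        if 5 * k > 3 * n_births:
            total += comb(n_births, k) * p_male**k * (1 - p_male)**(n_births - k)
    return total

large = prob_more_than_60_percent_male(100)  # assumed: exactly 100 births a day
small = prob_more_than_60_percent_male(15)   # assumed: exactly 15 births a day

print(f"large hospital: P = {large:.4f}, expected days per year = {365 * large:.1f}")
print(f"small hospital: P = {small:.4f}, expected days per year = {365 * small:.1f}")
```

Under these assumptions the smaller hospital passes the 60% mark on about 15% of days, far more often than the large one, because proportions from small samples vary more.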
how I see things (continued)
In order to understand what is going on, we need a model
For example, we want to figure out whether watching violent movies is correlated with violent behavior
It is standard to attempt to reject the null hypothesis that there is no correlation between the variables
Considerations
  data sample
  experimental design
  confounding variables
  choice of method
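The idea of trying to reject a null hypothesis of no correlation can be sketched with a permutation test. This is only one possible choice of method, not necessarily the one used in the course, and the sketch is in Python rather than R. The numbers in `toy_x` and `toy_y` are invented purely for illustration and are not real measurements; the function names are hypothetical.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def permutation_p_value(xs, ys, n_shuffles=10_000, seed=0):
    """Two-sided permutation p-value for the null hypothesis of no correlation.

    Shuffling ys breaks any link to xs, so the shuffled correlations show
    what chance alone would produce.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly 0
    return (at_least_as_extreme + 1) / (n_shuffles + 1)

# Invented toy numbers, NOT real data
toy_x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
toy_y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

r = pearson_r(toy_x, toy_y)
p = permutation_p_value(toy_x, toy_y)
print(f"r = {r:.3f}, permutation p-value = {p:.4f}")
```

A small p-value says only that the observed correlation would be unusual under chance. The considerations listed above (sample, design, confounding variables) still decide what the result means.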
what we’ll cover
Probability
Descriptive statistics
Inferential statistics
  Sampling distributions: confidence intervals, hypothesis tests and p-values
  Estimating population mean
  Comparing population means
  Analysis of variance
  Univariate and multivariate OLS regression
  Analysis of categorical data
  Data collection
  Experimental design
basics
This is SI 544
I’m Lada Adamic ([email protected])
Please respond to survey on office hours: http://doodle.ch/participation.html?pollId=29v979qxx4tdvmr5
Make sure you have access to the cTools site
Some materials are available at http://www-personal.umich.edu/~ladamic/courses/si544w08/
meetings
every Thursday in West Hall 409
every Tuesday in the DIAD (4th floor of Shapiro library)
  work on your own laptop
  work on lab PC or Mac
textbooks
Introductory Statistics for the Behavioral Sciences (5th or 6th Edition) by Welkowitz, Ewen, and Cohen
or
John Verzani: Using R for Introductory Statistics
R
We will be using R
open source http://www.r-project.org/
  many additional modules available: cran.r-project.org/
on cTools under Resources you can find several nice online tutorials
steeper learning curve than most other statistical packages, but ...
  free -- so you can use it whatever your job may be
  programmable -- you can create your own modules
  you will be able to switch to other software relatively easily
grading scheme
20% midterm (in class)
25% final (take home)
25% problem sets
  I will drop your lowest problem set score.
20% group project (small)
5% news evaluation
5% participation
  attendance
  speaking up in class
  posting to cTools
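The weights above can be turned into a small grade calculator. This is a rough sketch in Python rather than R, and it assumes details the slide does not give: every component is scored from 0 to 100, and all problem sets count equally once the lowest is dropped. The names `WEIGHTS`, `problem_set_average`, and `course_grade` are made up for illustration.

```python
# Weights taken from the grading scheme; they sum to 100%
WEIGHTS = {
    "midterm": 0.20,
    "final": 0.25,
    "problem_sets": 0.25,
    "group_project": 0.20,
    "news_evaluation": 0.05,
    "participation": 0.05,
}

def problem_set_average(scores):
    """Average of problem set scores after dropping the single lowest one.

    Assumes equally weighted problem sets; with one score, nothing is dropped.
    """
    if len(scores) < 2:
        return sum(scores) / len(scores)
    kept = sorted(scores)[1:]  # drop the lowest score
    return sum(kept) / len(kept)

def course_grade(components):
    """Weighted course grade from a dict of component scores (0-100 each)."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

For example, `problem_set_average([50, 90, 100])` drops the 50 and returns 95.0.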
problem sets
turn in only in PDF format
turn in only on cTools
late assignments
  can turn in up to 2 days late with a 10% penalty
  email me for extensions for legit reasons (medical & family emergencies)
  include a note about granted extension on cTools when submitting
  grader Laurel Shipley: email me if there are any questions about grading
you are encouraged to collaborate with your classmates, but turn in your own work
run your own R code
cTools participation
You can earn participation points by posting content to cTools
  post clarification questions to cTools
    There is no point to being stuck on some syntax in R for hours on end. Just ask for help on the cTools forum.
  answer others’ questions
  post interesting links
group project
The project is not a major project.
At the end you will turn in 3 pages.
form groups of 3-4 people
timeline
  Jan. 24 form group and select topic (2 pts)
  March 4th project progress report (3 pts)
  April 8 project report (10 pts)
  April 10 presentations (5 pts)
examples of topics from last year
  history of violent arrests and NBA performance
  temperature and beer consumption
  number of books and ratings on LibraryThing
  pacing and rowing performance
news review
5% of your final grade
find news article and critically compare to original study
exemptions for HCI and IAR stats requirement
http://www-personal.umich.edu/~ladamic/courses/si544f06/statswaiver.html
Courses you can take instead
Stat. 350 - Introduction to statistics and data analysis (undergrad, cannot count as cognate)
Stat. 400 - Applied Statistical Methods
Stat. 500 - Applied Statistical Methods (note, has 350 as a prerequisite)
Biostatistics 510: http://www-personal.umich.edu/~kwelch/510/biostat510.htm
Biostatistics 503: http://www.sph.umich.edu/iscr/caid/display_course.cfm?CourseID=BIOSTAT503
Biostatistics 553: http://www.sph.umich.edu/iscr/caid/display_course.cfm?courseID=BIOSTAT553
Sociology 510: statistics
OMS 501: Applied Business Statistics
Why you should still take this course
neat datasets relevant to HCI & IAR
we focus on the relevant skills and critical thinking
we go easy on the math
  no calculus required
  some algebra helpful
data types
Quantitative
  Discrete
  Continuous
Qualitative
  Nominal (categorical)
  Ordinal (rank ordered categories)
exercise: name the data type
bacteria count
occupations of shoppers
USNWR ranking of university
marital status
time (in months) since last auto maintenance
handedness
Counties with highest rates of kidney cancer
Counties with lowest rates of kidney cancer
summary
class logistics (questions?)
motivation for learning statistics
starting to gather data!
Next time: R tutorial by Mick McQuaid