Data collection, checking and cleaning & Introduction to presenting statistical analyses

Zulma Rueda
Professor, Universidad Pontificia Bolivariana, Colombia
Adjunct professor, University of Manitoba, Canada
[email protected]


Post on 21-Apr-2018


Page 1: Data collection, checking and cleaning & Introduction to ... · Data collection, checking and cleaning & Introduction to presenting statistical analyses ZulmaRueda Professor Universidad

Data collection, checking and cleaning & Introduction to presenting statistical analyses

Zulma Rueda
Professor, Universidad Pontificia Bolivariana, Colombia
Adjunct professor, University of Manitoba, Canada
[email protected]

Page 2

The use of statistics in research

- Medical statistics is the tool by which numerical information can be translated into evidence
- This evidence might be for the cause of a disease or for the effectiveness of an intervention

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 3

The use of statistics in research

- The positive side: researchers can now be independent; they don’t need a friendly statistician ☺
- The downside:
  - Because statistical programs are easy to use, it is equally easy to perform the wrong analysis ☹
  - If the right analysis is performed, the programs often produce a large amount of output: some relevant and some irrelevant
- Programs, however sophisticated, do not generally tell the user whether a particular analysis is valid or not

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 4

The use of statistics in research

- The aim: to share newly gained knowledge with others so that they can benefit from the findings for future research, professional practice, or both
- Presenting statistical information is not always straightforward, yet inadequate presentation will fail to communicate the relevant information and may even communicate misleading or incorrect information

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 5

Data collection

- Data collection is tied to the study protocol: what is actually collected and the format it is in
- It is important to know in advance what we are going to do with the data, to ensure that they are collected in the right format
- It may be possible to answer a research question using existing data

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 6

Source: https://www.cdc.gov/immigrantrefugeehealth/pdf/tuberculosis-ti-2009.pdf

Page 7

Source: https://www.cdc.gov/immigrantrefugeehealth/pdf/tuberculosis-ti-2009.pdf

Page 8

Variables you collect in your daily work

- The medical history should focus on risk factors for tuberculosis disease including: previous history of tuberculosis; illness suggestive of tuberculosis (such as cough of >3 weeks’ duration, dyspnea, weight loss, fever, or hemoptysis); prior treatment suggestive of tuberculosis treatment; and prior diagnostic evaluation suggestive of tuberculosis. …
- … for children … fever, night sweats, growth delay, and weight loss.
- … inquiries regarding family or household contact with a person who has or had tuberculosis or illness, treatment, or diagnostic evaluation suggestive of tuberculosis.
- BCG vaccination

https://www.cdc.gov/immigrantrefugeehealth/pdf/tuberculosis-ti-2009.pdf

Page 9

Points concerning using existing datasets

- Can be cheaper and quicker than collecting new data
- The research question needs to be defined and researched in the same way as for a primary study
- A clear analysis plan is needed to avoid over-analysing the data
- Note that the dataset may not contain all the necessary information to answer the new question

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 10

Points concerning using existing datasets

- When data that were collected for another purpose are analysed, this is sometimes referred to as secondary analysis

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 11

When collecting original data: beware of collecting too much data

- Ask yourself:
  - Why are you collecting it?
  - Will it actually be analysed?
- Disadvantages of long questionnaires are:
  - They may discourage people from taking part or from answering all the questions, and so lower the response rate
  - Questions may be answered less carefully

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 12

Beware of collecting too much data

- Data processing time may be increased and results may be delayed
- They may lead to multiple hypothesis tests, which increase the chance of spurious significance
- Time and money may be wasted
- However, it is important to collect what you do need, as it may be difficult to get it later

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007
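
The point about multiple hypothesis tests can be made concrete with a little arithmetic. The sketch below assumes the tests are independent and that every null hypothesis is true, which is a simplification; the function name is illustrative and does not come from the source.

```python
def chance_of_spurious_result(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false-positive result among n_tests
    independent tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


# The more variables are collected and tested, the higher the chance
# that something looks "significant" by accident.
for k in (1, 5, 20):
    print(f"{k:>2} tests: {chance_of_spurious_result(k):.2f}")
```

Under these assumptions, 20 tests at the 5% level give roughly a 64% chance of at least one spuriously significant result.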

Page 13

Example of forms that you fill in:

https://www.reginfo.gov/public/do/PRAViewIC?ref_nbr=201105-1405-007&icID=38745

Page 14

Example of forms that you fill in:

https://www.reginfo.gov/public/do/PRAViewIC?ref_nbr=201105-1405-007&icID=38745

Page 15

Transferring the data to computer: coding and data entry

- Before non-numeric data from a questionnaire or data collection form are entered into a computer, the responses need to be coded
- A unique number should be assigned to each possible response to facilitate statistical analysis
- For example, a question would be coded 1: immigrant, 2: special immigrant, 3: diversity, etc., according to the single answer given

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007
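
As a rough illustration of coding a single-answer question, the sketch below maps each response to a unique number. The three labels and their codes are the ones listed on the slide; the code book structure and the `code_response` helper are hypothetical.

```python
# Code book for one single-answer question. Labels and codes follow the
# slide; any further categories would need their own unique numbers.
CODE_BOOK = {
    "immigrant": 1,
    "special immigrant": 2,
    "diversity": 3,
}


def code_response(answer: str) -> int:
    """Return the numeric code for one answer (case-insensitive)."""
    key = answer.strip().lower()
    if key not in CODE_BOOK:
        # Refuse to guess: an unknown answer must be resolved by hand.
        raise ValueError(f"No code defined for response: {answer!r}")
    return CODE_BOOK[key]


responses = ["Immigrant", "diversity", "Special immigrant"]
coded = [code_response(r) for r in responses]
print(coded)
```

Raising an error for an unlisted answer, rather than silently assigning a code, keeps the code book the single place where categories are defined.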

Page 16

Transferring the data to computer: coding and data entry

- Please tick all that apply:
- Although this is one question, each person may tick a number of options. This needs to be entered as five separate variables, each coded as ‘no’ or ‘yes’, which could be entered as 0 or 1:
  - Tuberculosis disease: 0 or 1
  - Syphilis, untreated: 0 or 1 …

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007
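
A ‘tick all that apply’ question can be turned into one 0/1 variable per option, roughly as sketched here. Only the two options named on the slide are used; the remaining options are elided in the source and are deliberately not guessed. The helper name is hypothetical.

```python
# Only two of the five options are named on the slide; the others are
# elided there, so they are left out here rather than invented.
OPTIONS = ["Tuberculosis disease", "Syphilis, untreated"]


def to_indicators(ticked: set[str]) -> dict[str, int]:
    """Return one 0/1 variable per option for a single respondent."""
    unknown = ticked - set(OPTIONS)
    if unknown:
        raise ValueError(f"Unrecognised options: {sorted(unknown)}")
    return {option: int(option in ticked) for option in OPTIONS}


print(to_indicators({"Tuberculosis disease"}))
print(to_indicators(set()))  # nothing ticked: every variable is 0
```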

Page 17

Transferring the data to computer: coding and data entry

- Missing data are undesirable in any study
- It may be important to distinguish between data that are missing because the subject failed to respond (i.e. missed the question out completely) and data where the answer was ‘don’t know’
- Make ‘don’t know’ a valid answer and assign it a separate code from answers that are truly missing (a blank, or a code that is not valid for the other answers, e.g. 9)

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007
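
The distinction between ‘don’t know’ and truly missing answers might look like this in practice. The coding scheme (0, 1 and 8 as valid codes) is an assumption made for illustration; only the use of a blank or a code such as 9 for missing values comes from the slide.

```python
from collections import Counter

# Hypothetical coding scheme: 0 = no, 1 = yes, 8 = don't know.
# A blank (None) or the code 9 marks an answer that is truly missing.
VALID = {0: "no", 1: "yes", 8: "don't know"}
MISSING_CODES = {None, 9}


def classify(value):
    """Label one entered value as a valid answer, missing, or an error."""
    if value in MISSING_CODES:
        return "missing"
    if value in VALID:
        return VALID[value]
    return "invalid code"


answers = [1, 0, 8, None, 9, 1, 7]
summary = Counter(classify(v) for v in answers)
print(summary)
```

Keeping the two categories apart means ‘don’t know’ can be reported as a result in its own right, while missing answers are counted separately.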

Page 18

Portion of a spreadsheet showing data collected from participants

Annotations on the spreadsheet:

- Unique patient ID number
- Variable names up to 8 characters long, beginning with a letter
- Year of birth
- Have you had rhinitis with a cold in the last 12 months? (0 = No, 1 = Yes)
- When did you have rhinitis? (1 = Dry season, 2 = Wet season, 3 = Anytime)
- Repeat measurements given similar but unique variable names
- Third measurements of pulse, systolic and diastolic blood pressure
- One line for each patient
- Blank fields indicate missing data

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007
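
The naming rule in the annotations (up to 8 characters, beginning with a letter, unique names for repeat measurements) can be checked mechanically. In this sketch, allowing digits and underscores after the first letter is an assumption, and all variable names other than DIAT2 are made up for illustration.

```python
import re

# Up to 8 characters, starting with a letter. Allowing digits and
# underscores after the first letter is an assumption.
NAME_RULE = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,7}")


def is_valid_name(name: str) -> bool:
    return NAME_RULE.fullmatch(name) is not None


# Repeat measurements get similar but unique names (DIAT1, DIAT2, DIAT3).
names = ["ID", "DIAT1", "DIAT2", "DIAT3", "2PULSE", "SYSTOLIC_BP_3"]

invalid = [n for n in names if not is_valid_name(n)]
has_duplicates = len(set(names)) != len(names)
print("invalid names:", invalid)
print("duplicates present:", has_duplicates)
```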

Page 19

Data checking and cleaning

- It is advisable to check the data as the study progresses rather than leave it until the end, as early checks may reveal problems which can be resolved
- Look for unlikely or impossible values or outliers, e.g. ‘DIAT2 = 182’
- Possible errors like this can be identified if summary statistics and/or a histogram of the data are examined

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 20

Data checking and cleaning

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 21

Data checking and cleaning

- In some data entry programs, the user can set acceptable limits for each variable and thus force the computer to flag or reject values outside that range
- Errors can also occur where the data have been entered incorrectly but the value entered is a possible value, and so is not flagged

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007
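
A range check of the kind described above could be sketched as follows. The limits and the PULSE2 variable name are hypothetical; in a real study the limits would be taken from the protocol.

```python
# Hypothetical acceptable limits for two variables.
LIMITS = {"DIAT2": (40, 130), "PULSE2": (30, 200)}


def flag_out_of_range(record: dict) -> list[str]:
    """Return a message for every value outside its acceptable limits."""
    flags = []
    for variable, (low, high) in LIMITS.items():
        value = record.get(variable)
        if value is None:
            continue  # blank field: missing, not out of range
        if not low <= value <= high:
            flags.append(f"{variable}={value} outside {low}-{high}")
    return flags


print(flag_out_of_range({"DIAT2": 182, "PULSE2": 72}))
print(flag_out_of_range({"DIAT2": 92, "PULSE2": 72}))
```

The second record illustrates the limitation noted on the slide: if the true diastolic reading was 82 and 92 was typed by mistake, the value is still plausible and passes unflagged.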

Page 22

Checking for errors in the data

- Data inconsistent: patient does not have rhinitis but answered the question about when rhinitis occurred
- Value outside likely range: diastolic blood pressure high, although possible. However, the measurement is also inconsistent with the first and third readings; possible transcription error?
- Value outside likely range: diastolic blood pressure and systolic blood pressure too high and pulse too low. Measurements also inconsistent with the first and third readings; likely that the machine was not working properly for this set of readings

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007
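
The two kinds of inconsistency described on this slide can also be screened for in code. This is a minimal sketch: the variable names RHINITIS, RHINWHEN, DIAT1 and DIAT3 and the 30 mmHg tolerance are assumptions chosen for illustration.

```python
def check_record(record: dict, max_gap: float = 30) -> list[str]:
    """Return consistency warnings for one patient record.

    max_gap is a hypothetical tolerance (mmHg) for how far the second
    diastolic reading may lie from the mean of the first and third.
    """
    problems = []

    # Skip-pattern check: a follow-up answer without the filter answer.
    if record.get("RHINITIS") == 0 and record.get("RHINWHEN") is not None:
        problems.append("reports no rhinitis but says when it occurred")

    # Repeat-measurement check against the neighbouring readings.
    first, second, third = (record.get(k) for k in ("DIAT1", "DIAT2", "DIAT3"))
    if None not in (first, second, third):
        if abs(second - (first + third) / 2) > max_gap:
            problems.append(f"DIAT2={second} inconsistent with DIAT1 and DIAT3")

    return problems


suspect = {"RHINITIS": 0, "RHINWHEN": 2, "DIAT1": 80, "DIAT2": 182, "DIAT3": 84}
clean = {"RHINITIS": 1, "RHINWHEN": 2, "DIAT1": 80, "DIAT2": 82, "DIAT3": 84}
print(check_record(suspect))
print(check_record(clean))
```

A warning only marks a record for review; deciding whether it is a transcription error or a faulty machine still requires going back to the source documents.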

Page 23

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 24

Sample size for Chlamydia prevalence study

- Aim:
  - To calculate the prevalence of Chlamydia infections among women attending the GP for cervical smears
- Information required:
  - Estimate of the prevalence = 7% (from previous studies)
  - Confidence level = 95% (decided by the researcher)
  - Accuracy of +/- 1.4 percentage points (decided by the researcher)
- Required sample size:
  - 1300 women (from Epi-info)

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007
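
The figures on this slide can be checked against the usual normal-approximation formula for estimating a proportion, n = z² p(1 − p) / d². This is a sketch under that assumption; the slide does not say which formula Epi-info applied, and the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_for_proportion(p: float, half_width: float, confidence: float = 0.95) -> int:
    """Sample size to estimate a proportion p to within +/- half_width,
    using the normal approximation n = z^2 * p * (1 - p) / d^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / half_width ** 2)


n = n_for_proportion(p=0.07, half_width=0.014)
print(n)
```

With the inputs from the slide this formula gives 1276, slightly below the 1300 quoted from Epi-info; the quoted figure may simply have been rounded up, but the slide does not say.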

Page 25

Sample size for sensitivity

- Aim:
  - To calculate the sensitivity and specificity of nuchal translucency screening for chromosomal abnormalities using an unselected cohort of pregnant women
- Information required:
  - Estimate of the prevalence of chromosomal abnormalities = 1% (from previous studies)
  - Estimate of sensitivity = 70% (from previous studies)
  - Confidence level = 95% (decided by the researcher)
  - Accuracy of +/- 20 percentage points (decided by the researcher)
- Required sample size:
  - 20 babies with chromosomal abnormalities (1% of population) and hence 2000 pregnant women required overall

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007
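
The two-step reasoning on this slide (first the number of affected babies, then the size of the whole cohort) can be sketched as below. It assumes the same normal-approximation formula for a proportion, applied to sensitivity; the function names are illustrative, and rounding to the nearest whole baby follows the slide’s figure rather than a stated rule.

```python
from statistics import NormalDist


def cases_needed(sensitivity: float, half_width: float, confidence: float = 0.95) -> float:
    """Affected cases needed to estimate sensitivity to +/- half_width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * sensitivity * (1 - sensitivity) / half_width ** 2


def cohort_needed(n_cases: int, prevalence: float) -> int:
    """Unselected cohort size expected to contain n_cases affected cases."""
    return round(n_cases / prevalence)


cases = round(cases_needed(sensitivity=0.70, half_width=0.20))
total = cohort_needed(cases, prevalence=0.01)
print(cases, total)
```

The formula gives just over 20 affected babies, which the slide reports as 20; dividing by the 1% prevalence then gives the 2000 women required overall.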

Page 26

Sample size for comparing two proportions

- Aim: To compare prevalence of death or chronic lung disease (CLD) in premature babies randomized to methods of ventilation
- Outcome: Death or CLD at 36 weeks postmenstrual age
- Information required:
  - Estimate of the prevalence in control group = 67% (from previous studies)
  - Significance level = 5% (decided by the researcher)
  - Risk difference to be detected = 11 percentage points (i.e. 56% in intervention group)
  - Power = 90% (decided by the researcher)
  - Babies will be randomized to two equal-sized groups
- Required sample size: 428 babies in each group

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007
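
The slide does not state which formula produced 428. The sketch below assumes a two-sided test, the pooled-variance normal-approximation formula for two proportions, and a continuity correction; those choices are assumptions, and the function name is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05,
                power: float = 0.90, continuity: bool = True) -> int:
    """Sample size per group for comparing two proportions
    (two-sided test, equal group sizes, normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    diff = abs(p1 - p2)
    n = (z_a * sqrt(2 * p_bar * (1 - p_bar))
         + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    if continuity:
        # Continuity correction applied to the uncorrected sample size.
        n = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return ceil(n)


print(n_per_group(0.67, 0.56))
print(n_per_group(0.67, 0.56, continuity=False))
```

With the continuity correction the inputs from the slide give 428 babies per group, matching the quoted figure; without it the same inputs give 410, which shows how much the choice of formula can matter.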

Page 27

Sample size for comparing two means

- Aim:
  - To compare mean birthweight of babies in different social class subgroups
- Information required:
  - Estimate of the standard deviation of the birthweight = 500 g (from previous studies)
  - Significance level = 5% (decided by the researcher)
  - Difference to be detected = 180 g (from previous studies)
  - Power = 90% (decided by the researcher)
- Required sample size:
  - 163 women in each group

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007
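
The calculation for two means can be reproduced with the standard normal-approximation formula n = 2(z_α/2 + z_β)² σ² / δ² per group. The sketch assumes a two-sided 5% significance level and equal group sizes; the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd: float, difference: float,
                alpha: float = 0.05, power: float = 0.90) -> int:
    """Sample size per group for comparing two means
    (two-sided test, equal group sizes, common standard deviation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil(2 * (z_a + z_b) ** 2 * sd ** 2 / difference ** 2)


print(n_per_group(sd=500, difference=180))
```

For a standard deviation of 500 g and a difference of 180 g, this returns the 163 per group given on the slide.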

Page 28

Presenting numerical data and common statistics

- Proportions
  - Give to two significant figures (e.g. 0.25, 0.0056)
  - Give numbers as well as the actual proportion unless obvious
  - Use percentage or rate per 1000, 10000, etc. if proportion is very small
- Percentages
  - Give percentages less than 10 or greater than 90 to one decimal place (e.g. 5.2%, 93.8%)
  - Consider giving percentages between 10% and 90% as whole numbers, unless the extra precision is needed (e.g. 27% vs 37%)

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 29

Presenting numerical data and common statistics

- Percentages
  - Give numbers as well as actual percentage unless obvious, but make clear which is which
  - Do not use percentages if sample is less than 10
- Mean, SD, SE
  - Present to one more significant figure than original data
  - Do not use +/- as this is potentially ambiguous. Use ‘mean (SD) = …’ or ‘mean (SE) = …’

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 30

Presenting numerical data and common statistics

- CIs
  - Present to one more significant figure than original data
  - Present as ‘2 to 4’ or ‘2, 4’, not ‘2-4’, since this is ambiguous if negative values are possible
- P values
  - Present actual P values wherever possible, whether significant or not
  - Give no more than two significant figures, e.g. 0.0392 → 0.039; 0.596 → 0.60
  - If package gives P = 0.0000, present as <0.0001

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007
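
The rounding rules for percentages and P values lend themselves to small formatting helpers. The functions below are an illustrative sketch of those rules, not part of the source; their names are made up.

```python
def format_percentage(pct: float) -> str:
    """One decimal place below 10% or above 90%, whole numbers otherwise."""
    if pct < 10 or pct > 90:
        return f"{pct:.1f}%"
    return f"{pct:.0f}%"


def format_p_value(p: float) -> str:
    """Two significant figures, with very small values shown as <0.0001."""
    if p < 0.0001:
        return "<0.0001"
    # '#' keeps the trailing zero, so 0.596 is shown as 0.60 rather than 0.6.
    return format(p, "#.2g")


for pct in (5.23, 27.4, 93.84):
    print(format_percentage(pct))
for p in (0.0392, 0.596, 0.00003):
    print(format_p_value(p))
```

Centralising the rules in one place keeps tables and text consistent, although a percentage between 10% and 90% may still need an extra decimal when the precision matters.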

Page 31

Beginning the results section

- Before performing the main analyses it is important to describe how the sample was obtained and the main relevant characteristics:
  - Sampling frame
  - Number of subjects originally selected
  - Number of subjects subsequently excluded because of ineligibility
  - Number of non-responders or no data
  - Comparison of responders and non-responders if possible
  - Number of subjects withdrawing before completing the study

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 32

Rueda ZV, López L, Vélez LA, Marín D, Giraldo MR, et al. (2013) High Incidence of Tuberculosis, Low Sensitivity of Current Diagnostic Scheme and Prolonged Culture Positivity in Four Colombian Prisons. A Cohort Study. PLoS ONE 8(11): e80592.

Page 33

Nuzzo JB, Golub JE, Chaulk P, Shah M. Postarrival Tuberculosis Screening of High-Risk Immigrants at a Local Health Department. Am J Public Health. 2015 Jul;105(7):1432-8.

Page 34

Guidelines for tables

- Title should explain what the table is about and what subjects or observations are included
- Give number of subjects or observations in each group
- Label rows and columns clearly
- Give confidence intervals for comparisons, not just P values
- Give SD, SE, or CI for means

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 35

Guidelines for tables

- Give percentages alongside frequencies unless group size is less than 10
- Give range or IQR for medians
- State units used
- Use consistent and appropriate decimal places
- Refer to table in the text
- Keep table simple for slides or poster and check text size for legibility

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 36

Wilson FA, Miller TL, Stimpson JP. Mycobacterium Tuberculosis Infection, Immigration Status, and Diagnostic Discordance: A Comparison of Tuberculin Skin Test and QuantiFERON-TB Gold In-Tube Test Among Immigrants to the U.S. Public Health Rep. 2016 Mar-Apr;131(2):303-10

Page 37

Guidelines for graphs

- Title should explain what the graph is about and what subjects or observations are included
- Give number of subjects or observations
- Label axes, giving units as appropriate
- Refer to graph in the text
- Does the graph show enough information to justify the space it takes?

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 38

Guidelines for graphs

- For clarity, use two-dimensional rather than three-dimensional graphs
- For a paper: is the graph necessary? Could the data be presented in another way?
- In a slide or poster: will the text be legible?

Peacock J, Kerry S. Presenting medical statistics from proposal to publication. Oxford University Press, 2007

Page 39

Liu Y, Posey DL, Cetron MS, Painter JA. Effect of a culture-based screening algorithm on tuberculosis incidence in immigrants and refugees bound for the United States: a population-based cross-sectional study. Ann Intern Med. 2015 Mar 17;162(6):420-8.

Page 40

Liu Y, Posey DL, Cetron MS, Painter JA. Effect of a culture-based screening algorithm on tuberculosis incidence in immigrants and refugees bound for the United States: a population-based cross-sectional study. Ann Intern Med. 2015 Mar 17;162(6):420-8.

Page 41

Liu Y, Posey DL, Cetron MS, Painter JA. Effect of a culture-based screening algorithm on tuberculosis incidence in immigrants and refugees bound for the United States: a population-based cross-sectional study. Ann Intern Med. 2015 Mar 17;162(6):420-8.

Page 42

Liu Y, Posey DL, Cetron MS, Painter JA. Effect of a culture-based screening algorithm on tuberculosis incidence in immigrants and refugees bound for the United States: a population-based cross-sectional study. Ann Intern Med. 2015 Mar 17;162(6):420-8.

Page 43

Comparing two or more sets of data

https://www.pinterest.com/pin/184084703492988528/

Page 44

Rueda ZV, López L, Vélez LA, Marín D, Giraldo MR, et al. (2013) High Incidence of Tuberculosis, Low Sensitivity of Current Diagnostic Scheme and Prolonged Culture Positivity in Four Colombian Prisons. A Cohort Study. PLoS ONE 8(11): e80592.

Page 45

Rueda ZV, López L, Vélez LA, Marín D, Giraldo MR, et al. (2013) High Incidence of Tuberculosis, Low Sensitivity of Current Diagnostic Scheme and Prolonged Culture Positivity in Four Colombian Prisons. A Cohort Study. PLoS ONE 8(11): e80592.

Page 46

From association to causation

Gordis L. Epidemiology. 5th ed. Elsevier; 2014

Page 47

Correlation does not imply causation

Gordis L. Epidemiology. 5th ed. Elsevier; 2014

Page 48

Guidelines for judging whether an observed association is causal

- Temporal relationship
- Strength of the association
- Dose-response relationship
- Replication of the findings
- Biologic plausibility
- Consideration of alternate explanations
- Cessation of exposure
- Consistency with other knowledge
- Specificity of the association

Gordis L. Epidemiology. 5th ed. Elsevier; 2014

Page 49

We cannot forget some key aspects of our study design

Page 50

- The extent to which a test result reflects the true value, that is, the extent to which it is valid, depends on minimizing two major classes of error: systematic error (bias) and random error

Image taken from: http://nothingnerdy.wikispaces.com/1+Physics+and+physical+measurement

Page 51

Two types of error

- Systematic error:
  - Poor accuracy
  - Reproducible
  - Due to selection and/or information bias, or confounding
- Random error:
  - Poor precision
  - Not reproducible

Image taken from: http://nothingnerdy.wikispaces.com/1+Physics+and+physical+measurement
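
A small simulation can make the contrast between the two types of error visible. Everything in this sketch is hypothetical: the true value, the size of the bias and the amount of noise are chosen only for illustration.

```python
import random
from statistics import mean, stdev


def simulate(true_value: float, bias: float, noise_sd: float,
             n: int = 1000, seed: int = 1) -> list[float]:
    """Simulate n readings with a fixed bias and Gaussian random noise."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]


TRUE_VALUE = 80.0  # hypothetical true diastolic pressure (mmHg)

biased = simulate(TRUE_VALUE, bias=5.0, noise_sd=1.0)  # systematic error
noisy = simulate(TRUE_VALUE, bias=0.0, noise_sd=8.0)   # random error

for label, readings in (("systematic", biased), ("random", noisy)):
    offset = mean(readings) - TRUE_VALUE
    print(f"{label}: mean offset {offset:.1f}, spread (SD) {stdev(readings):.1f}")
```

The biased readings cluster tightly around the wrong value (poor accuracy, reproducible), while the noisy readings scatter widely around the right one (poor precision); taking more readings shrinks the second problem but not the first.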

Page 52

Determining a study protocol

- Identify all data handling and processing steps, from specimen collection to recording data in a database
- Assess the potential for error at each step, and the error tolerance
- Determine the reliability of the selected measure across a range of values

Page 53

Internal and external validity

Gordis L. Epidemiology. 5th ed. Elsevier; 2014

Page 54

Checklist for writing up a research study

- Abstract
  - Stand-alone document
  - Report main outcome with estimates and 95% CI if possible
  - Draw valid conclusions
- Introduction
  - What is the research question?
  - What do we know already?
  - What are the gaps?
  - What does this study add?

Page 55

Checklist for writing up a research study

- Methods
  - Describe study design and conduct
  - Choice of subjects
  - Sample size
  - Data collected
  - Statistical analysis
- Results
  - Describe characteristics of the sample
  - Describe findings
  - Don’t just give P values: present estimates and CIs

Page 56

Checklist for writing up a research study

- Discussion
  - Summarise findings
  - Describe how they fit with existing knowledge
  - Discuss any limitations
  - Draw conclusions and make suggestions for future research
- Check these webpages for reporting guidelines:
  - http://www.equator-network.org
  - http://collections.plos.org/reporting-guidelines

Page 57

Thanks!!!