A strategy to promote access to HDSS data for researchers and scientists: the 1:10 Sample Dataset

Post on 27-Sep-2020


Page 1

A strategy to promote access to HDSS data for researchers and scientists

the 1:10 Sample Dataset

Agincourt HDSS-Wits SPH
Nairobi Urban HDSS, APHRC, Nairobi
Institute of Behavioral Science, University of Colorado, Boulder

Page 2

Outline

• Why develop a sample database?
• What is the 1:10?
• Using the 1:10 for teaching
• What has a sample database achieved?
• Challenges

Page 3

Access to HDSS data: challenges for sites

• Extracting datasets takes time and resources
• Project-specific datasets require clear, knowledgeable data requests
• User-friendly documentation needed, e.g. comprehensive data dictionary
• Quality of research based on HDSS data needs to be ensured
• Protecting confidentiality of participants and small area communities

Page 4

Training issues

• Making data available is not enough: training and support needed to use complex longitudinal information
• Universities need datasets for grad student research (masters and doctoral)
• Faculty may not have skills to analyze longitudinal data (or supervise students)

Page 5

Challenges

• To increase data access without increasing demand for individual, tailored datasets
• To provide training on longitudinal data management and analysis so that students, their supervisors, faculty can use HDSS data

A response

• 1:10 Sample Database
– conceptualized and developed through collaboration between Agincourt and Nairobi HDSS, Wits U, & University of Colorado at Boulder

Page 6

Goals of 1:10 Sample Database

• Public access database for researchers and students to explore HDSS data
• Improve quality of data requests to site
• Provide experiential training courses on use of longitudinal data

thereby
• Enhancing research & training through increasing access to HDSS data

Page 7

What is the 1:10 Sample Database?

• Agincourt / Nairobi Sample Database: version of full database stripped of identifiers

Page 8

What is the 1:10?

• Includes 10% of geographic locations in each village, and full information on all individuals in each location over the full period of data collection
– thus retains relational, temporal and data integrity of full database

• Maintains structure of full database but simplified
– Observations limited to one per year; single date
– More complex variables removed
– Some adjustment ('normalization') to obtain representativeness of the full dataset
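The sampling rule on this slide (10% of locations within each village, then every individual in the chosen locations) can be sketched in Python as below. This is an illustrative sketch only, not the procedure the sites actually used: the function names, the `(village, location)` key, the `location` field on resident records, and the rule of keeping at least one location per village are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_locations(locations, fraction=0.10, seed=0):
    """Pick `fraction` of the locations within each village.

    `locations` is an iterable of (village, location_id) pairs
    (a hypothetical layout). Returns a set of sampled pairs.
    """
    by_village = defaultdict(list)
    for village, location_id in locations:
        by_village[village].append((village, location_id))

    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sampled = set()
    for village in sorted(by_village):
        candidates = sorted(by_village[village])
        # Assumption: keep at least one location even in tiny villages.
        k = max(1, round(len(candidates) * fraction))
        sampled.update(rng.sample(candidates, k))
    return sampled


def extract_sample(residents, sampled_locations):
    """Keep every record of every individual in a sampled location."""
    return [r for r in residents if r["location"] in sampled_locations]
```

Sampling whole locations rather than individuals is what lets household and residence links stay intact inside the sample.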

Page 9

Validated sample

• Created and compared counts of births, deaths, in- and out-migrations, and household size at population and sample level to assure representativeness of sample

• Sample means of these counts fall within one standard deviation of full database means:
– Rates using event counts reasonably comparable between the sample and full databases
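A check of the kind described above might look like the following Python sketch. The slide does not say over which units the means and standard deviations were computed, so the example assumes per-unit values (for instance, household sizes) and uses the population standard deviation of the full database; the function name is hypothetical.

```python
from statistics import mean, pstdev


def within_one_sd(full_values, sample_values):
    """True if the sample mean lies within one standard deviation
    of the full-database mean (SD taken over the full database)."""
    full_mean = mean(full_values)
    full_sd = pstdev(full_values)
    return abs(mean(sample_values) - full_mean) <= full_sd


# Illustrative numbers only, not HDSS data.
full_household_sizes = [3, 4, 5, 6, 7]
sample_household_sizes = [4, 6]
print(within_one_sd(full_household_sizes, sample_household_sizes))
```

The same function could be applied separately to births, deaths, and in- and out-migrations once they are expressed per comparable unit.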

Page 10

Advantages 1:10 Sample Database

• Subsamples can be easily extracted
• Anonymized version can be updated regularly
• User-friendly documentation:
– study setting and publications
– database structure
– data dictionary
– standard agreement on use

Page 11

Documentation on website

• ADSS 1in10 Dataset Presentation.ppt
• AHPU Data Dictionary v2(Draft).pdf
• AHPU.1_10.dictionary.pdf
• AHPUDataUseAgreeme.pdf
• DM-DataRequest-v7.doc
• DM-SampleRequest-v2.doc
• Display forms
• OneInTenDataset20070416.zip
• SD-TUTORIAL2-V1(ODBC).pdf
• SD_TUTORIAL1-V1(CREATE DATASET).pdf

Page 12

Controls on database

• Signed data agreement in order to use datasets from sample database
• Request to publish on work developed using sample database:
– apply for a full tailored dataset
– sign a confidentiality agreement
– re-run analyses on tailored dataset

Page 13

Agincourt Data Request form

• Project Name: <title>
• Sample Name: <title>
• Author: <authors>
• Date: <date>
• Version: <version>
• Purpose (condensed to protocol):
– <Condensed protocol describing project>
• Analytical Plan:
– <analysis plan that justifies the data requested>
• Sample requirements
• Sample Population: <define population with bulleted specific criteria for each strata>
– Criteria 1
– Criteria 2
– Criteria 3
– Criteria 4
• Other Sample Considerations:
– <describe other considerations for drawing the sample, i.e. village clustering or study logistic concerns, exclusion from other studies>
• Method for Drawing Sample:
– <describe the procedure for drawing the sample>
• Unit of analysis:
– <Individual, Household, Village, Site, Mixed>
• Variables: <list variables from data dictionary that need to be included with the sample>
– Village
– ExternalID
– Name
– Surname
– Strata
– Guardian Name
– Etc…
– http://www.npongo.com/agincourt/AHPUDataDic.zip
• <sign and return data use agreement>
– www.npongo.com\agincourt\AHPUDataAgreeme.pdf
• Data Cleaning:
– <Description of any dirty data found>
– Case: <description of specific case of dirty data>
– Logic: <logic used to identify dirty data>
– Code: <programmatic code used to identify dirty data>
– Code Type: <ie SQL, Stata, SAS etc.>

Page 14

AGINCOURT STUDENT DATA AGREEMENT

Page 15

Using 1:10 for training

• Focus: longitudinal data management & analysis

• Intensive, experiential training
– 2 weeks residential, 1 week self-study
– Provide context of site: exposure to range of research conducted using diverse study designs
– Use of Agincourt / Nairobi database, data management, statistical analysis
• hands-on experience using longitudinal methods

Page 16

Guided exercises using HDSS data

• Managing longitudinal data using STATA and Microsoft Access

• Introduce students to statistical analysis using STATA
– Descriptive analysis of fertility and mortality trends
– Event history analysis
– Hazard modeling
– etc
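The course exercises were run in STATA and Microsoft Access. As a rough illustration of what a descriptive mortality-trend exercise involves, the Python sketch below computes crude death rates per 1,000 person-years by calendar year from residence episodes. The episode layout `(start, end, died)` and both function names are assumptions made for this example, not the structure of the HDSS database.

```python
from datetime import date


def person_years_in_year(start, end, year):
    """Exposure (in years) that an episode [start, end) contributes
    to one calendar year."""
    lo = max(start, date(year, 1, 1))
    hi = min(end, date(year + 1, 1, 1))
    return max((hi - lo).days, 0) / 365.25


def crude_death_rates(episodes, years):
    """Deaths per 1,000 person-years for each year in `years`.

    `episodes` is a list of (start, end, died) tuples, where `died`
    is True if the episode ended in death on `end` (hypothetical layout).
    Years with no exposure map to None.
    """
    rates = {}
    for year in years:
        exposure = sum(person_years_in_year(s, e, year) for s, e, _ in episodes)
        deaths = sum(1 for _, e, died in episodes if died and e.year == year)
        rates[year] = 1000 * deaths / exposure if exposure > 0 else None
    return rates
```

Splitting exposure by calendar year is the same bookkeeping that event history analysis and hazard models build on, which is why it is a common first exercise with longitudinal data.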

Page 17

Experiential learning: Group projects

• Academic skills training
– Developing research questions
– Constructing data sets
– Preliminary analysis
– Literature reviews
– Presentation of work
– Write-up of research paper
• Research question/hypotheses/objective
• Analysis
• Results
• Discussion
• Interpretation

Photo caption: Student groups at work in computer laboratory

Page 18

Examples of student group projects

• Correlates of out-migration in Agincourt HDSS

• Mortality and Food Security: A Multi-Pathway Association

• Communicable and non-communicable causes of death in Agincourt HDSS: Patterns and trends from 2000-2005

• Parity progression in the context of fertility decline in rural South Africa

• Fertility in Agincourt: Does the education level of female participants matter?

Photo caption: Project presentations

Page 19

What has the 1:10 achieved?

Before

• researcher or student supervisor submits proposal & data request
• Agincourt team review & approve
• Agincourt data specialist writes unique extraction script to create tailored dataset

Bottleneck: time consuming
Costly: time, resources
Underspecified data requests due to inadequate knowledge
Feelings of frustration & pressure (HDSS); neglect (researchers, students)

After

• 1:10 sample database and full documentation available on a password-protected website
• Researchers/students can freely explore data
• Researchers/students thus able to better specify full data request

1:10 reduces work for Agincourt data team
Better specified data requests
Faster approval of outside work
Faster production of tailored datasets
More student research projects

Page 20

Increase in masters students corresponds with 1:10 sample database

Figure: Masters Enrollment (1996 - 2007). Vertical axis: Number (0 to 8). Horizontal axis: Year Registered (1997 to 2007). Annotation: 1:10 database available in 2006.

Page 21

Challenges (1)

• 1:10 useful to explore database… but not popular for final student research report
– can request full dataset if results promising

• Availability does not ensure researchers, students, supervisors able to use data
– courses on longitudinal data management and analysis needed
– Colorado-Wits-Nairobi experience useful

Page 22

Challenges (2)

• Usability of a relational database - differences in disciplinary expertise; transforming to flat files

• Expanding the 1:10 model to other HDSS sites in INDEPTH
– requires investment in preparation of 1:10 and documentation
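To illustrate what "transforming to flat files" means in practice, the Python sketch below joins three small relational tables into one row per individual per observation year. The table and column names (`individuals`, `locations`, `observations`, and their fields) are invented for the example and are not the actual HDSS schema; the demo data is made up.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory relational database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE individuals (individual_id INTEGER PRIMARY KEY, sex TEXT);
        CREATE TABLE locations (location_id INTEGER PRIMARY KEY, village TEXT);
        CREATE TABLE observations (
            individual_id INTEGER REFERENCES individuals(individual_id),
            location_id INTEGER REFERENCES locations(location_id),
            obs_year INTEGER
        );
        INSERT INTO individuals VALUES (1, 'F'), (2, 'M');
        INSERT INTO locations VALUES (10, 'Village A'), (11, 'Village B');
        INSERT INTO observations VALUES (1, 10, 2004), (1, 11, 2005), (2, 10, 2004);
    """)
    return conn


def flatten(conn):
    """Return one flat row per individual per observation year."""
    query = """
        SELECT i.individual_id, i.sex, o.obs_year, l.village
        FROM observations AS o
        JOIN individuals AS i ON i.individual_id = o.individual_id
        JOIN locations AS l ON l.location_id = o.location_id
        ORDER BY i.individual_id, o.obs_year
    """
    return conn.execute(query).fetchall()
```

A flat layout like this repeats the individual's fixed attributes on every row, which is the form most statistical packages expect, at the cost of the normalization that the relational structure provides.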

Page 23

What have we learned?

• Training needed for students and researchers to use HDSS database
• Experience needed to produce good data request

1:10 sample database useful in meeting these needs

Experience can be transferred to other HDSS sites