ICPSR Data Exploration Tools

ICPSR AT 50: Facilitating Research and Data Sharing, Part I: Data Exploration (IASSIST, Vancouver, BC, May 31, 2011)

Uploaded by: icpsr

Posted on 11-Nov-2014


DESCRIPTION

Part I of a workshop conducted by ICPSR. This deck describes data exploration tools.

TRANSCRIPT

Page 1: ICPSR Data Exploration Tools

ICPSR AT 50: Facilitating Research and Data Sharing

Part I: Data Exploration

IASSIST, Vancouver, BC, May 31, 2011

Page 2

Welcome to Vancouver!

Our Agenda

• Data Exploration
  – A Continuing Quest to Ease Your Search
  – Social Science Variables Database
  – Bibliography of Data-related Literature

• Data Sharing
  – 2010 US Census Data
  – Public Data Collections

• Data Management
  – Data Management Plans
  – Computing & Data Sharing in Secure Environments
  – Managing Restricted Contracts

Page 3

Managing the Clock

• Intro and Data Exploration (9:30–10:30)
  – Break

• Data Sharing (10:45–11:30)
  – Break

• Data Management (11:45–12:30)
  – Escape!

Disclaimer: Times are approximate!

Page 4

What Is ICPSR? Then and Now

• One of the world’s oldest and largest social science data archives, established in 1962

• Data distributed on punch cards, then reel-to-reel tape; now:
  – Data available on demand
  – Over 7,000 studies with over 65,000 datasets

• Began as a membership organization of 21 universities; now:
  – About 700 members worldwide
  – Federal funding of public collections

Page 5

What We Do – It’s About Data!

• Seek research data and pertinent documents from researchers (PIs, research agencies, government)

• Process and preserve the data and documents

• Disseminate data

• Provide education, training, & instructional resources

Page 6

Why People Use ICPSR

• Write articles, papers, or theses using real research data

• Conduct secondary research to support findings of current research or to generate new findings

• Use as introductory material in grant proposals

• Preserve/disseminate primary research data
  – Fulfill data management plan (grant) requirements

• Study or teach quantitative methods

Page 7

Data Exploration

Page 8

The Challenge – Hoards of Data & Metadata

How does one make sense of:

• 7,000 studies
• 65,000 datasets
• 550,000 files
• Millions of variables
• 60,000 bibliographic citations

Page 9

Data Exploration: Integrated Search

Better Search for Better Results

[Diagram: search results drawn from study documentation (docs, subjects, PIs, etc.), the SSVD (the variables), and the data-related bibliography]

Page 10

Integrating ICPSR’s Search: “Sponsored by Solr/Lucene”

• In 2009, an improved search engine
• Later, construction of full-text search
• Faceted search to narrow large result sets
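Full-text search engines such as Lucene, on which Solr is built, answer queries from an inverted index that maps each term to the documents containing it. The toy sketch below illustrates that idea in plain Python, assuming a very simple tokenizer and made-up study descriptions; it is not ICPSR's or Lucene's implementation, and real engines add analyzers, relevance scoring, and on-disk index structures.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase text and split it into alphanumeric terms (a deliberately simple analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: each term maps to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all_terms(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the first term's postings and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical study descriptions, for illustration only.
docs = {
    "study-A": "National survey of adolescent mental health",
    "study-B": "Survey of consumer finances",
    "study-C": "Panel study of income dynamics and health",
}
index = build_index(docs)
print(sorted(search_all_terms(index, "survey health")))  # ['study-A']
```

Intersecting posting sets is what makes multi-term lookups fast: the engine never rescans the documents themselves.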

Page 12

Reviewing the Study Home Page

Page 13

The Search Continues: Automatic Search Updates

• Receive automatic updates on the study or series

• And updates on your query

Page 14

Data Exploration: The Social Science Variables Database

[Diagram: search results drawn from study documentation (docs, subjects, PIs, etc.), the SSVD (the variables), and the data-related bibliography]

Page 15

The Social Science Variables Database (SSVD)

Sanda Ionescu, Documentation Specialist

[email protected]

Page 16

The Social Science Variables Database at ICPSR

• Enables ICPSR users to search variables across datasets
• Assists in:
  – Data discovery
  – Comparison/harmonization projects
  – Data harvesting
  – Data analysis
  – Question mining for designing new research

Page 17

The Social Science Variables Database at ICPSR

Tool for teaching:
• Research methods:
  – Concept operationalization
  – Effect of question wording, context, and answer categories on variable distributions
• Substantive classes:
  – Cultural/social changes reflected in different question wordings or elicited answers (longitudinal or time-series data)

Page 18

The Social Science Variables Database at ICPSR

• Officially launched in spring 2009
• Pre-launch: a preparation period of two to three years
  – Gather variable-level documentation; apply/refine selection criteria and quality checks
  – Build a database to host variable descriptions
  – Initial upload: 3,500 files describing data from about 1,300 studies

Page 19

The Social Science Variables Database at ICPSR

• Variables documented using the Data Documentation Initiative (DDI) specification
• DDI: a standard for documenting social science data, written in XML
  – Easy to parse/process
  – Allows fine-grained searches
  – Flexible display in a variety of formats
  – Highly shareable; promotes interoperability
  – Ideal archival format (ASCII, not software-dependent)
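Because DDI is XML, a variable description can be read with any standard XML parser. The sketch below uses Python's built-in `xml.etree.ElementTree` on a hand-written fragment modeled on DDI Codebook (version 2) `var` elements (`labl`, `qstn`/`qstnLit`, `catgry`/`catValu`). The fragment and the `parse_variable` helper are illustrative, not taken from an ICPSR file, and the fragment omits the XML namespace that real DDI 2.5 documents declare.

```python
import xml.etree.ElementTree as ET

# Illustrative DDI-Codebook-style variable description (namespace omitted for brevity).
DDI_VAR = """
<var name="V12">
  <labl>Respondent's sex</labl>
  <qstn><qstnLit>Is the respondent male or female?</qstnLit></qstn>
  <catgry><catValu>1</catValu><labl>Male</labl></catgry>
  <catgry><catValu>2</catValu><labl>Female</labl></catgry>
</var>
"""


def parse_variable(xml_text: str) -> dict:
    """Extract the name, label, question text, and value labels from one <var> element."""
    var = ET.fromstring(xml_text.strip())
    question = var.find("qstn/qstnLit")
    return {
        "name": var.get("name"),
        # "labl" matches only the direct child, not the labels nested in <catgry>.
        "label": var.findtext("labl", default="").strip(),
        # Question text is optional, so it may be missing.
        "question": question.text.strip() if question is not None and question.text else None,
        "categories": {
            c.findtext("catValu"): c.findtext("labl")
            for c in var.findall("catgry")
        },
    }


print(parse_variable(DDI_VAR))
```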

Page 20

The Social Science Variables Database at ICPSR

DDI variable descriptions:
• Are generated through an automated process used archive-wide to produce ICPSR’s archival and distribution information packages
• Include question text if available in the source documentation

Page 21

The Social Science Variables Database at ICPSR

Relational database:
• Built in Oracle as a separate entity, with links to study and series descriptions (also stored in Oracle)
• Compatible with both DDI 2 and DDI 3 (input and output)
• Oracle Text searches used in the beta-testing phase
  – Slow retrieval
  – Limited to 500 results

Page 22

The Social Science Variables Database at ICPSR

• Search: switched to Solr/Lucene in autumn 2009
  – Easy indexing
  – Faster searches, unlimited hits
  – Facets/filters imported from study descriptions (also DDI-compatible): series, study, time period, geography
• Storage: XML files are now indexed and searched directly, no longer uploaded into the database
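Faceted search returns a count for each facet value alongside the results, and choosing a value narrows the result set. The sketch below mimics that behavior in plain Python over a few hypothetical variable hits, using facet names modeled on the slide (series, time period, geography). It does not call Solr, and the record layout and the `facet_counts`/`apply_filter` helpers are assumptions for illustration.

```python
from collections import Counter

# Hypothetical variable hits carrying study-level facet values, for illustration only.
HITS = [
    {"var": "V1", "series": "Series A", "time_period": "1990s", "geography": "United States"},
    {"var": "V2", "series": "Series A", "time_period": "2000s", "geography": "United States"},
    {"var": "V3", "series": "Series B", "time_period": "2000s", "geography": "Canada"},
]


def facet_counts(hits: list[dict], field: str) -> Counter:
    """Count how many hits fall under each value of one facet field."""
    return Counter(hit[field] for hit in hits)


def apply_filter(hits: list[dict], field: str, value: str) -> list[dict]:
    """Keep only hits whose facet field equals the selected value."""
    return [hit for hit in hits if hit[field] == value]


print(dict(facet_counts(HITS, "series")))  # {'Series A': 2, 'Series B': 1}
narrowed = apply_filter(HITS, "time_period", "2000s")
print([h["var"] for h in narrowed])  # ['V2', 'V3']
# Counts are recomputed on the narrowed set, as a faceted interface would show them.
print(dict(facet_counts(narrowed, "geography")))  # {'United States': 1, 'Canada': 1}
```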

Page 23

The Social Science Variables Database at ICPSR

• Current content:
  – 2,602 studies (48 percent of ICPSR holdings with data and setups)
  – 6,493 datasets
  – Approx. 1.7 million variables
• Continues to grow by including:
  – All new releases, if suitable
  – Retrofits, as made available by small-scale projects

Page 24

The Social Science Variables Database at ICPSR

• DDI fields searched:
  – Variable name
  – Variable label
  – Question text sequence
  – Descriptive text
  – Category label
• Variable notes: displayed, but not indexed or searched
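The distinction between searched fields and display-only notes can be expressed when a variable record is prepared for indexing. In the sketch below, the searched fields are concatenated into one indexed text field, while notes are stored alongside it but never consulted by the matcher. The record layout, field names, and helper functions are hypothetical and do not reflect ICPSR's actual Solr schema.

```python
import re

# Fields the slide lists as searched; variable notes are deliberately left out.
SEARCHED_FIELDS = ("name", "label", "question", "description", "categories")


def to_search_document(var: dict) -> dict:
    """Split a variable record into searchable text and display-only notes."""
    parts = []
    for field in SEARCHED_FIELDS:
        value = var.get(field)
        if isinstance(value, (list, tuple)):
            parts.extend(value)
        elif value:
            parts.append(value)
    return {
        "id": var["name"],
        "indexed_text": " ".join(parts).lower(),
        "notes": var.get("notes", ""),  # kept for display, never searched
    }


def matches(doc: dict, word: str) -> bool:
    """Check a single word against the indexed text only, ignoring the notes."""
    return word.lower() in re.findall(r"[a-z0-9]+", doc["indexed_text"])


# Hypothetical variable record, for illustration only.
record = {
    "name": "V12",
    "label": "Respondent's sex",
    "question": "Is the respondent male or female?",
    "categories": ["Male", "Female"],
    "notes": "Recoded from interviewer observation.",
}
doc = to_search_document(record)
print(matches(doc, "female"), matches(doc, "interviewer"))  # True False
```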

Page 25

The Social Science Variables Database at ICPSR

The Public Search Features:
• Stemming
• “Phrase searches”
• Fielded searches, combined with a default Boolean “and” (the Boolean operators “or” and “not” are ignored):
  – Variable label
  – Question text
  – Value labels

http://www.icpsr.umich.edu/icpsrweb/ICPSR/
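The search behavior described on this slide (quoted phrases, an implicit Boolean “and”, and “or”/“not” ignored) can be modeled with a small parser and matcher. The sketch below is illustrative only and is not ICPSR's code: it searches one field at a time, uses made-up variable labels, leaves stemming out, and matches phrases by simple substring, which is cruder than a real phrase query.

```python
import re

# Words treated as Boolean operators and therefore ignored, per the slide.
IGNORED_OPERATORS = {"and", "or", "not"}


def words(text: str) -> list[str]:
    """Lowercase text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into single terms and quoted phrases, dropping Boolean operators."""
    terms, phrases = [], []
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            phrases.append(phrase.lower())
        elif word.lower() not in IGNORED_OPERATORS:
            terms.extend(words(word))
    return terms, phrases


def field_matches(field_text: str, query: str) -> bool:
    """True if every term and every phrase occurs in the field (implicit AND)."""
    terms, phrases = parse_query(query)
    field_words = set(words(field_text))
    text = field_text.lower()
    return all(t in field_words for t in terms) and all(p in text for p in phrases)


# Hypothetical variable labels, searched as a single field.
labels = {
    "V101": "Ever diagnosed with a mental health condition",
    "V102": "Self-rated physical health",
}
hits = [v for v, label in labels.items() if field_matches(label, 'health or "mental health"')]
print(hits)  # ['V101']
```

Because “or” is dropped rather than honored, V102 is excluded even though it contains “health”: both the term and the phrase are required.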

Page 26

The Social Science Variables Database at ICPSR

Projected improvements/additional features:
• Enable selection of multiple filters
• Enable users to toggle stemming on/off
• Enable searching “within” results (adding a new query to a result set)
• Show/hide response categories on the result page
• Create an interface for selecting results and exporting the selection in a particular format
• From the individual variable display, enable navigation to the previous or next variable (to show context)

Page 27

The Social Science Variables Database at ICPSR

Usage data (source: Google Analytics)

Page 28

Data Exploration: The Bibliography of Data-related Literature

[Diagram: search results drawn from study documentation (docs, subjects, PIs, etc.), the SSVD (the variables), and the data-related bibliography]

Page 29

ICPSR Bibliography of Data-related Literature

Elizabeth Moss, Assistant Librarian, ICPSR

[email protected]

Page 30

ICPSR Bibliography of Data-related Literature

What we will cover:

• What it is and how to access it

• How and why we developed it

• Main features

• How instructors find it useful

• You are a good source

Page 31

What it is and how to access it

Page 32

What it is and how to access it

It’s really a searchable database . . . containing 60,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

. . . that can generate study bibliographies associating each study with the literature about it

. . . now included in the integrated search on the ICPSR Web site
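Associating each study with the literature about it amounts to a many-to-many relationship between citations and studies, grouped by study. The sketch below shows that grouping with made-up IDs; the slides do not describe how ICPSR stores these links, so the pair-list layout and the `study_bibliographies` helper are assumptions.

```python
from collections import defaultdict

# Hypothetical many-to-many links: one citation can analyze several studies.
CITATION_STUDY_LINKS = [
    ("cite-1", "study-100"),
    ("cite-2", "study-100"),
    ("cite-2", "study-200"),
    ("cite-3", "study-200"),
]


def study_bibliographies(links: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group citation IDs by study ID to produce a per-study bibliography."""
    by_study: dict[str, list[str]] = defaultdict(list)
    for citation_id, study_id in links:
        by_study[study_id].append(citation_id)
    return {study: sorted(cites) for study, cites in by_study.items()}


print(study_bibliographies(CITATION_STUDY_LINKS))
# {'study-100': ['cite-1', 'cite-2'], 'study-200': ['cite-2', 'cite-3']}
```

Since the grouping is computed from the links on demand, adding one new link updates every affected study bibliography without editing them individually.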

Page 33

How and why we developed it

• Brainchild of Richard Rockwell, former ICPSR director

• Funded by a grant from the National Science Foundation in 2000 to build the collection and create a way to access it

• ICPSR membership and federally funded archives continue to support it

Page 34

How and why we developed it

What’s in the collection?

• Resources using data in the ICPSR holdings as the primary data source

• Resources using ICPSR data in a comparison with the primary dataset investigated

• Resources "about" an ICPSR dataset or study series

Page 35

How and why we developed it

Page 36

How and why we developed it

http://www.icpsr.umich.edu/icpsrweb/ICPSR/citations/methodology.jsp

Page 37

How and why we developed it

Page 38

How and why we developed it

Demonstrate impact of data for funding

Page 39

Main features

http://www.icpsr.umich.edu/icpsrweb/ICPSR/citations/index.jsp

Page 40

Main features

Search features:
• Searches the full text of citation elements, e.g., title, author, journal
• Boolean “and” is assumed, and phrases are searched in quotation marks:
  adolescents and “mental health” (this works)
• No Boolean “or” or “not”:
  Havens or “Havens, Jennifer” (this doesn’t work; the “or” is treated as “and”)
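In Lucene's classic query syntax, a leading `+` marks a clause as required, so a search that assumes “and” behaves like a query in which every word and quoted phrase carries a `+`. The sketch below rewrites a user query into that form, dropping “and”, “or”, and “not” as the slide describes, which shows why `Havens or "Havens, Jennifer"` ends up requiring both pieces. Whether ICPSR's interface builds queries this way is an assumption; `to_required_clauses` is a hypothetical helper, and its escaping handles Lucene's special characters in bare terms only.

```python
import re

OPERATORS = {"and", "or", "not"}
# Characters with special meaning in Lucene's classic query syntax.
LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def to_required_clauses(query: str) -> str:
    """Rewrite a user query so every word and quoted phrase is a required (+) clause."""
    clauses = []
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            # Inside a quoted phrase only backslashes need escaping (quotes cannot occur here).
            clauses.append('+"' + phrase.replace("\\", "\\\\") + '"')
        elif word.lower() not in OPERATORS:
            # Backslash-escape special characters in bare terms.
            clauses.append("+" + LUCENE_SPECIAL.sub(r"\\\1", word))
    return " ".join(clauses)


print(to_required_clauses('adolescents and "mental health"'))
# +adolescents +"mental health"
print(to_required_clauses('Havens or "Havens, Jennifer"'))
# +Havens +"Havens, Jennifer"
```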

Page 41

Main features

Linking from the search results:

• To full text for journals:
  – Directly via DOI
  – Using OpenURL via Google Scholar and WorldCat

• To full text of reports and other resources via PDF or HTML links

• To the detailed, fielded publication record
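Journal links via DOI and OpenURL, as listed above, are straightforward to construct: a DOI resolves through the doi.org proxy, and an OpenURL packages citation metadata as key/value pairs for a link resolver. The sketch below builds both, using the OpenURL 1.0 (Z39.88-2004) key/encoded-value format for a journal article. The resolver address and the citation are hypothetical, and the slides do not describe the exact link formats ICPSR uses for Google Scholar and WorldCat.

```python
from urllib.parse import quote, urlencode


def doi_link(doi: str) -> str:
    """Build a resolvable link for a DOI via the doi.org proxy."""
    return "https://doi.org/" + quote(doi, safe="/")


def openurl_link(resolver_base: str, citation: dict) -> str:
    """Build an OpenURL 1.0 (Z39.88-2004, key/encoded-value) link for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": citation["title"],
        "rft.jtitle": citation["journal"],
        "rft.date": citation["year"],
        "rft.aulast": citation["author_last"],
    }
    if citation.get("doi"):
        params["rft_id"] = "info:doi/" + citation["doi"]
    return resolver_base + "?" + urlencode(params)


# Hypothetical citation and resolver address, for illustration only.
cite = {
    "title": "An example article",
    "journal": "Journal of Example Studies",
    "year": "2010",
    "author_last": "Doe",
    "doi": "10.1234/example.5678",
}
print(doi_link(cite["doi"]))  # https://doi.org/10.1234/example.5678
print(openurl_link("https://resolver.example.edu/openurl", cite))
```

The DOI link is preferred when a DOI exists because it resolves directly; the OpenURL route lets an institution's own resolver decide which licensed copy to offer.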

Page 42

Main features

Internal and external linking from the detailed citation record:

• To the related study (or studies)

• To other citation records of publications by the same author

• To other articles in the same journal (but outside the search)

• To full text options

Page 43

Main features

Exporting citations:

• From search results: up to 500 records in RIS format, which exports directly to EndNote

• From individual detailed record: Export the citation in RIS format
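RIS is a plain-text, tagged format: each line holds a two-character tag, two spaces, a hyphen, a space, and a value, and each record starts with `TY` and ends with `ER`. The sketch below writes citations in that shape and caps an export at 500 records to mirror the limit on the slide. The citation fields and helper names are hypothetical, and using `T2` for the journal title is an assumption; some tools expect `JO` or `JF` instead.

```python
MAX_EXPORT = 500  # per-export limit mentioned on the slide


def ris_record(cite: dict) -> str:
    """Serialize one citation as an RIS record: TAG, two spaces, hyphen, space, value."""
    lines = ["TY  - " + cite.get("type", "JOUR")]
    lines += ["AU  - " + author for author in cite.get("authors", [])]
    for tag, key in (("TI", "title"), ("T2", "journal"), ("PY", "year"), ("DO", "doi")):
        if cite.get(key):
            lines.append(f"{tag}  - {cite[key]}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines)


def export_ris(citations: list[dict]) -> str:
    """Export at most MAX_EXPORT citations as one RIS document."""
    return "\n\n".join(ris_record(c) for c in citations[:MAX_EXPORT]) + "\n"


# Hypothetical citation, for illustration only.
cite = {
    "authors": ["Doe, Jane", "Roe, Richard"],
    "title": "An example article",
    "journal": "Journal of Example Studies",
    "year": "2010",
    "doi": "10.1234/example.5678",
}
print(export_ris([cite]))
```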

Page 44

Main features

Filtering and sorting features:

• Filter search results by author, publication type, journal, publication year

• Coming soon: a publication-year range filter (similar to that in study search)

• Sort search results by relevance, publication date (oldest or newest), title, recency
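The filter-then-sort behavior described above can be sketched over an in-memory list of records, including the publication-year range filter mentioned as coming soon. The records, field names, and relevance scores below are made up; in a real search engine the relevance score comes from the engine itself, and the "recency" sort is not modeled here.

```python
# Hypothetical citation records; field names and scores are made up for illustration.
CITATIONS = [
    {"title": "Beta study of panel attrition", "type": "journal article", "year": 2004, "score": 0.62},
    {"title": "Alpha report on survey weights", "type": "report", "year": 1998, "score": 0.91},
    {"title": "Gamma thesis on voting", "type": "thesis", "year": 2010, "score": 0.75},
]

# Sort options: (key function, descending?)
SORT_OPTIONS = {
    "relevance": (lambda r: r["score"], True),
    "newest": (lambda r: r["year"], True),
    "oldest": (lambda r: r["year"], False),
    "title": (lambda r: r["title"].lower(), False),
}


def filter_and_sort(records, pub_type=None, year_range=None, sort_by="relevance"):
    """Apply optional publication-type and inclusive year-range filters, then sort."""
    results = [
        r for r in records
        if (pub_type is None or r["type"] == pub_type)
        and (year_range is None or year_range[0] <= r["year"] <= year_range[1])
    ]
    key, descending = SORT_OPTIONS[sort_by]
    return sorted(results, key=key, reverse=descending)


recent = filter_and_sort(CITATIONS, year_range=(2000, 2010), sort_by="oldest")
print([r["title"] for r in recent])  # ['Beta study of panel attrition', 'Gamma thesis on voting']
```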

Page 45

Main features

Browse from the main Bibliography page:

• By author name (no authority control), e.g.:
  Juster, F. (2)
  Juster, F. Thomas (22)
  Juster, F.T. (1)

• By journal title (authority control)

Page 46

Main features

Link from individual study pages:
• To the dynamically generated study bibliography
• To series collections, when applicable

Link from series description pages:
• To series bibliographies

Page 47

How instructors find it useful

Senior seminar classes

• Professors choose a dataset and ask students to think of a research question

• Bibliography allows students to see the wide variety of topics available for a single dataset

Page 48

How instructors find it useful

Research proposal design

• Good for finding studies that examine what a student wants to propose

• Does the data they would want already exist?

• If so, are there survey questions they could replicate?

• Authors’ suggestions for future research

Page 49

How instructors find it useful

Undergraduate introduction

• Research papers: a good starting point for finding literature on a particular topic

• Finding data: starting with the Bibliography can be more intuitive

Page 50

How instructors find it useful

From the ICPSR blog:

“I can't say enough about how much I like the Bibliography of Data-related Literature. I find that students prefer to use this to identify key writings about data obtained from ICPSR. Students are sometimes really overwhelmed by trying to do literature searches in the many article databases subscribed to by the Library and they don't find what they need by using Google Scholar. So, I direct them to the Bibliography first to identify authors and subject terms. They can then use these to carry out successful searches in article databases.”

Page 51

How instructors find it useful

From the ICPSR blog:

“As a companion to the Bibliography I also use the instructional tool: Exploring Data Through Research Literature (EDRL). I think Rachel Barlow did a fantastic job on this. I have adapted pieces of EDRL for use in class presentations with great success. If you are in a library and you are involved in information literacy activities, this is a great tool.”

Page 52

How instructors find it useful

The EDRL: An Online Module

Page 53

You are a good source

Get credit for your work AND let us know about that of others:

• Send a citation via the Web form

• Or send them in an email to [email protected]

• If you have a large library, we can take EndNote XML imports, or even RIS-format imports

Page 54

You are a good source

Page 55

You are a good source

A final request:

• When you write articles, reports, papers, and presentations that analyze or significantly discuss data, CITE the data

• Encourage others to do it, too

• Here’s how and why

Page 56

Let’s Take a Break

Return at 10:45