Managing the Metadata Lifecycle
The Future of DDI at GESIS and ICPSR

Peter Granda, ICPSR; Meinhard Moschner, GESIS; Mary Vardigan, ICPSR; Joachim Wackerow, GESIS; Wolfgang Zenk-Möltgen, GESIS

Post on 21-Dec-2015


Page 1

Managing the Metadata Lifecycle

The Future of DDI at GESIS and ICPSR

Peter Granda, ICPSR

Meinhard Moschner, GESIS

Mary Vardigan, ICPSR

Joachim Wackerow, GESIS

Wolfgang Zenk-Möltgen, GESIS

Page 2

Research Data Life Cycle

Diagram stages: Concept, Collection, Processing, Distribution, Discovery, Analysis, Archiving, Repurposing

Page 3

Current Uses of DDI

• DDI 2 used for many different purposes by many different archival institutions, e.g., metadata records for data catalogs, export to Web-based information systems such as Nesstar, long-term preservation, and PDF codebooks

• GESIS and ICPSR are developing procedures and systems to extend use of DDI in their institutions

Page 4

DDI 3 Expands in Scope

• To date use mainly limited to Distribution and Archiving stages of data life cycle

• DDI 3 enables use of new elements and structures to extend markup to other stages of the life cycle - both earlier and later

• Emphasis is on projects and tasks already in process at each institution

Page 5

DDI 3 Use at GESIS

• Structured Comments – Processing

• Translation of EVS Questionnaire – Collection

• Supporting Enhanced Publications – Analysis

• Continuity Guides: Trends by Concepts – Concept, Discovery, Repurposing

Page 6

Extracting structured information in current workflow

• Example: building derived variables in SPSS

• SPSS setups contain commands and comments

• Necessary steps for using SPSS setups as an information source for DDI

– Improving comments for automated extraction

  • formalize layout

  • add keywords from a list

– Extraction of structured comments and related commands by a custom tool

– Transformation of this information into DDI 3 fragments
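The extraction step could look roughly like the following Python sketch. It assumes a comment layout in which an item starts with a header line such as "***v* DerivedVariables/w101_new", section keywords stand alone on comment lines, and every uncommented line is an SPSS command belonging to the most recent item. The keyword list, the function name extract_items and the sample setup are illustrative assumptions, not the actual GESIS tool.

```python
import re

# Assumed keyword list; the slides only say "add keywords from a list".
KEYWORDS = {"NAME", "DESCRIPTION", "USED VARIABLES", "SOURCE",
            "VERSION", "AUTHOR", "EMAIL"}

# Header line of a structured comment, e.g. "***v* DerivedVariables/w101_new".
HEADER = re.compile(r"^\*{3}(\w)\*\s+(\S+)")


def extract_items(setup_text):
    """Collect structured comments and the commands that follow them."""
    items, current, section = [], None, None
    for raw in setup_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            current = {"path": header.group(2), "fields": {}, "commands": []}
            items.append(current)
            section = None
        elif current is None:
            continue  # text before the first structured comment is ignored
        elif line.startswith("*"):
            text = line.strip("*. ")  # drop comment markers and terminators
            if text in KEYWORDS:
                section = text
                current["fields"].setdefault(section, [])
            elif text and section:
                current["fields"][section].append(text)
        else:
            current["commands"].append(line)  # uncommented line = command
    for item in items:
        item["fields"] = {k: " ".join(v) for k, v in item["fields"].items()}
    return items


SAMPLE = """\
***v* DerivedVariables/w101_new
* NAME
*   w101_new
* DESCRIPTION
*   w101_new is a derived variable from w101;
* USED VARIABLES
*   w101, w102
* SOURCE
**.
compute w101_new = 5 .
if ( w102 = 1 ) w101_new = w101 .
**
* VERSION
*   2009-04-18
***.
"""

for item in extract_items(SAMPLE):
    print(item["path"], item["fields"], item["commands"])
```

Formalizing the layout is what makes this kind of line-by-line parsing possible: the parser only needs to recognise header lines, keyword lines and commands.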

Page 7

Extracting structured information in current workflow

***v* Variables/DerivedVariables
* DESCRIPTION
*   This section is on derived variables;
***.

***v* DerivedVariables/w101_new
* NAME
*   w101_new
* DESCRIPTION
*   w101_new is a derived variable from w101;
*   It has the original value from w101
*   when w102 is equal 1
*   otherwise it has the value 5;
* USED VARIABLES
*   w101, w102
* SOURCE
**.

compute w101_new = 5 .
if ( w102 = 1 ) w101_new = w101 .

**
* VERSION
*   2009-04-18
* AUTHOR
*   Achim Wackerow
* EMAIL
*   [email protected]
***.

Diagram: SPSS → Extractor → Result: Report (HTML) and DDI 3 fragments (GenerationInstruction with Description and Command)
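As a rough illustration of the last step, the sketch below wraps a description and the related SPSS commands in a GenerationInstruction element with Description and Command children, the three names shown on the slide. It is deliberately simplified: the elements carry no DDI namespaces, the "language" attribute is a hypothetical name chosen for this example, and the output is not claimed to be schema-valid DDI 3.

```python
import xml.etree.ElementTree as ET


def generation_instruction(description, commands, language="SPSS"):
    """Build a simplified GenerationInstruction fragment as an XML string."""
    root = ET.Element("GenerationInstruction")
    ET.SubElement(root, "Description").text = description
    # "language" is a hypothetical attribute name used only in this sketch.
    command = ET.SubElement(root, "Command", {"language": language})
    command.text = "\n".join(commands)
    return ET.tostring(root, encoding="unicode")


fragment = generation_instruction(
    "w101_new is a derived variable from w101; it has the original value "
    "from w101 when w102 is equal 1, otherwise it has the value 5.",
    ["compute w101_new = 5 .", "if ( w102 = 1 ) w101_new = w101 ."],
)
print(fragment)
```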

Page 8

Translation of EVS Questionnaire

DSDM

http://zacat.gesis.org

Page 9

Supporting Enhanced Publications

Publications with References to Data: DDI 3.1 URN contains: Agency, Object, Version

Publication with References (URNs), for example:

<urn:ddi:3_1:VariableScheme.Variable=gesis.de.ddi:ZA3811_VarSch(1_0).V8(1_0)>

Resolution steps shown in the diagram:

• DDI Alliance: find agency gesis.de.ddi, return resolver address

• http://resolve.gesis.org: find object, return URL of Documentation and/or Data

• http://www.gesis.org/doc/docxyz: request document, return document
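The URN structure and the two-step lookup can be sketched in Python as follows. The regular expressions are derived only from the single example URN on this slide and may not cover the full DDI 3.1 URN syntax. The two lookup tables are in-memory stand-ins for the DDI Alliance agency registry and the GESIS resolver (no network access takes place), and the function names are made up for this illustration.

```python
import re

# Shape inferred from the one example URN on the slide (an assumption).
URN_PATTERN = re.compile(
    r"^urn:ddi:(?P<ddi_version>[^:]+):(?P<object_class>[^=]+)"
    r"=(?P<agency>[^:]+):(?P<path>.+)$"
)
# One "identifier(version)" pair, e.g. "V8(1_0)".
PART_PATTERN = re.compile(r"([^.()]+)\(([^()]+)\)")


def parse_urn(urn):
    """Split a DDI URN into DDI version, object class, agency and id/version pairs."""
    match = URN_PATTERN.match(urn.strip().strip("<>"))
    if match is None:
        raise ValueError(f"not a DDI URN of the expected shape: {urn!r}")
    parsed = match.groupdict()
    parsed["parts"] = PART_PATTERN.findall(parsed.pop("path"))
    return parsed


def resolve(urn, agency_registry, object_index):
    """Two-step lookup: agency -> resolver address, then object -> URL."""
    parsed = parse_urn(urn)
    resolver = agency_registry[parsed["agency"]]
    return resolver, object_index[resolver][tuple(parsed["parts"])]


# Stand-in lookup tables; the addresses are the ones printed on the slide.
AGENCIES = {"gesis.de.ddi": "http://resolve.gesis.org"}
OBJECTS = {
    "http://resolve.gesis.org": {
        (("ZA3811_VarSch", "1_0"), ("V8", "1_0")): "http://www.gesis.org/doc/docxyz",
    }
}

URN = "<urn:ddi:3_1:VariableScheme.Variable=gesis.de.ddi:ZA3811_VarSch(1_0).V8(1_0)>"
print(parse_urn(URN))
print(resolve(URN, AGENCIES, OBJECTS))
```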

Page 10

Supporting Enhanced Publications

DSDM DDI 3 EPE Simple Export Wizard 1.2.0

Page 11

Grouping Trends

• Continuity guides in different contexts

– Synoptical question / variable lists

– Documentation of changes in question wording / answer scales

• Systematic organization by conceptual categories

– CodebookExplorer tool (relational DB)

– Publication as HTML links on variable level in ZACAT

• Taking advantage of DDI 3 in the future

– Defining the standard and comparison

– Qualifying relations (e.g. q-text modified, scale modified, …)

Page 12

Continuity guides

Literal question text over time

Conceptual categories

Deviations in answer categories

Page 13

Trends by concepts

Conceptual categories

Trend variables by study

Country 1 Country 2

Page 14

STUDY UNIT 1 … n

DataCollection

<dc:QuestionScheme id="QS">
  <dc:QuestionItem id="Qn">
    …
    <dc:Text>Have you …?</dc:Text>

…

LogicalProduct

<l:CategoryScheme id="CATS1">
  <l:Category id="Cat1">
    <r:Label>often</r:Label>
…
<l:CodeScheme id="CODS1">
  …
  <l:Code isDiscrete="true">
    <l:CategoryReference>
      <r:ID>Cat1</r:ID>
    </l:CategoryReference>
    <l:Value>4</l:Value>
  </l:Code>
  …

GROUP

STUDY UNIT 8-14

DataCollection…

LogicalProduct…

Comparison map

Equivalency Relationship Description

DDI3 RESOURCE „Ex-post Standard“

Universe Concept

Data Collection

<dc:QuestionScheme id="QS">
  <dc:QuestionItem id="Q">
    <dc:QuestionText>
      <dc:LiteralText>
        <dc:Text>Do you …?</dc:Text>
      </dc:LiteralText>
    …
    <dc:CodeDomain>
      <r:CodeSchemeReference>
        <r:ID>CODS1</r:ID>
      </r:CodeSchemeReference>

Logical Product

<l:CategoryScheme id="CATS1">
  <l:Category id="Cat1">
    <r:Label>often</r:Label>
…

<l:CodeScheme id="CODS1">
  <l:CategorySchemeReference>
    <r:ID>CATS1</r:ID>
  </l:CategorySchemeReference>
  <l:Code isDiscrete="true">
    <l:CategoryReference>
      <r:ID>Cat1</r:ID>
    </l:CategoryReference>
    <l:Value>1</l:Value>
  </l:Code>
  …

Relationships shown between the study unit and the standard:

• Question text: modified

• Values: different

• Generation instruction: scale reversed

• Label: identical

GROUP

STUDY UNIT 15-x

DataCollection…

LogicalProduct…
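The kind of qualified relation a comparison map records can be illustrated with a small Python sketch that compares a study's question text and codes against the ex-post standard. Only the question texts and the codes for "often" (4 in the study, 1 in the standard) come from the slide; the remaining categories, the function names and the result layout are assumptions made for illustration, not part of DDI 3.

```python
def compare_text(study_text, standard_text):
    """Qualify the question text relation, ignoring case and spacing."""
    def norm(text):
        return " ".join(text.lower().split())
    return "identical" if norm(study_text) == norm(standard_text) else "modified"


def compare_code_schemes(study, standard):
    """Compare two {label: value} code schemes and derive a recode map."""
    shared = [label for label in study if label in standard]
    labels = "identical" if set(study) == set(standard) else "different"
    same_values = all(study[label] == standard[label] for label in shared)
    # The scale is reversed when ordering the labels by value flips exactly.
    by_study = sorted(shared, key=study.get)
    by_standard = sorted(shared, key=standard.get)
    return {
        "labels": labels,
        "values": "identical" if same_values else "different",
        "scale_reversed": len(shared) > 1 and by_study == by_standard[::-1],
        # Recode map: study value -> standard value (generation instruction).
        "recode": {study[label]: standard[label] for label in shared},
    }


# "often" = 4 / 1 is taken from the slide; the other categories are invented.
STUDY_CODES = {"often": 4, "sometimes": 3, "rarely": 2, "never": 1}
STANDARD_CODES = {"often": 1, "sometimes": 2, "rarely": 3, "never": 4}

print(compare_text("Have you …?", "Do you …?"))
print(compare_code_schemes(STUDY_CODES, STANDARD_CODES))
```

The recode map plays the role of the generation instruction: it states how study values would be transformed to match the standard.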

Page 15

DDI 3 Use at ICPSR

• Information collected from data producers in pre-collection phase – Concept

• Metadata output from CAI applications – Data Collection

• Processor’s dashboard – Data Processing

• Metadata mining: New faceted search tool to facilitate discovery through more precise searching – Data Discovery

• Relational database for comparison and harmonization across studies – Repurposing

Page 16

SMDS Metadata Modules

Page 17
Page 18
Page 19
Page 20
Page 21
Page 22

DDI as backbone for structured metadata

Life cycle stages: Concept, Collection, Processing, Distribution, Discovery, Analysis

SIP

AIP

DIP

CAI Tools, MQDS etc.

Information extracted from SPSS etc.

Archive

Custom Tools (e.g. Forms-based)

Statistical packages

Online Analysis

Search engines

Distribution Packages

Web information system

Combined, this information forms a traditional SIP. Information sent to the archive from each life cycle stage can be understood as a dynamic SIP. Self-archiving through web forms can be offered for the different stages.

Structured metadata combined with data forms the core of the archive. The core would be organised so that metadata can be reused and information can be ingested and distributed dynamically.

Data / Documents outside of DDI

An AIP must be specially built, because the metadata in the core structure may contain only references to other, reused metadata. An AIP should include everything belonging to one study; DDI can also be the main structure of the AIP, and data can be inline in DDI. The AIP would exist alongside the core structure in the archive, and an easy round trip between the two should be possible. The purpose of the AIP is comparable to PDF/A, where all fonts are embedded. The core structure, by contrast, is geared towards efficient processing and reuse of metadata.
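One way to picture the difference between the core structure and an AIP is the Python sketch below. The core store keeps items by identifier and lets them point at each other by reference; building the AIP means inlining every reference so that the package for one study is self-contained. The store layout, the identifiers and the {"ref": ...} convention are hypothetical and serve only to illustrate the idea.

```python
def inline_references(node, store, _seen=()):
    """Return a copy of node in which every {"ref": id} is replaced by its target."""
    if isinstance(node, dict):
        if set(node) == {"ref"}:
            target = node["ref"]
            if target in _seen:
                raise ValueError(f"circular reference: {target}")
            return inline_references(store[target], store, _seen + (target,))
        return {key: inline_references(value, store, _seen)
                for key, value in node.items()}
    if isinstance(node, list):
        return [inline_references(value, store, _seen) for value in node]
    return node  # strings, numbers etc. are copied as they are


def build_aip(study_id, store):
    """Build a self-contained package for one study from the core store."""
    return inline_references(store[study_id], store)


# Hypothetical core store: the study only points at reusable items.
CORE = {
    "codes:CODS1": {"often": 1, "never": 4},
    "question:Q": {"text": "Do you ...?", "codes": {"ref": "codes:CODS1"}},
    "study:S1": {"title": "Study 1", "questions": [{"ref": "question:Q"}]},
}

print(build_aip("study:S1", CORE))
```

The core store stays untouched, so the efficient, reference-based structure and the self-contained package can exist side by side.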

Page 23

DDI-based archive as collection of reusable components

• Metadata in DDI is structured in small items which can be identified and maintained by one or more institutions

• These parts can be

– the basis for comparison and metadata mining (discovery of new relationships)

– a candidate for reuse in other studies or new studies (like standard questions or variables)

Study 1

Study-specific information

Items for reuse

Study 1

Study-specific information

Items for reuse

New study

Repository of reusable components

• Standard concepts

• Standard questions

• Standard variables

• Harmonized information

• Controlled vocabularies
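A very simple form of metadata mining for reuse candidates is sketched below in Python: questions from several studies are grouped by their normalised text, and any question that occurs in more than one study is reported as a candidate for the repository. The study contents, the normalisation rule and the function names are assumptions for illustration; real matching would likely need fuzzier comparison.

```python
from collections import defaultdict


def normalise(text):
    """Lower-case the text and collapse whitespace for a tolerant comparison."""
    return " ".join(text.lower().split())


def reuse_candidates(studies):
    """Map each question text found in two or more studies to those studies."""
    seen = defaultdict(set)
    for study, questions in studies.items():
        for text in questions:
            seen[normalise(text)].add(study)
    return {text: sorted(ids) for text, ids in seen.items() if len(ids) > 1}


# Hypothetical input: question texts per study.
STUDIES = {
    "Study 1": ["How often do you vote?", "What is your age?"],
    "Study 2": ["how often do  you vote?", "Where were you born?"],
}

print(reuse_candidates(STUDIES))
```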

Page 24

Issues for Discussion

• Advantages and disadvantages of seeking to capture additional metadata throughout the data life cycle

• How much information to make available to funding agencies, data producers, and secondary users?

• Rules for structured documentation and delivery of items to archives for preservation

• An overall DDI tool to capture and curate all metadata and data – the Holy Grail???