Post on 21-Dec-2015
Managing the Metadata Lifecycle
The Future of DDI at GESIS and ICPSR
Peter Granda, ICPSR
Meinhard Moschner, GESIS
Mary Vardigan, ICPSR
Joachim Wackerow, GESIS
Wolfgang Zenk-Möltgen, GESIS
Research Data Life Cycle
Concept
Collection
Processing
Distribution
Discovery
Analysis
Archiving
Repurposing
Current Uses of DDI
• DDI 2 used for many different purposes by many different archival institutions, e.g., metadata records for data catalogs, export to Web-based information systems such as Nesstar, long-term preservation, and PDF codebooks
• GESIS and ICPSR are developing procedures and systems to extend use of DDI in their institutions
DDI 3 Expands in Scope
• To date use mainly limited to Distribution and Archiving stages of data life cycle
• DDI 3 enables use of new elements and structures to extend markup to other stages of the life cycle - both earlier and later
• Emphasis is on projects and tasks already in process at each institution
DDI 3 Use at GESIS
• Structured Comments – Processing
• Translation of EVS Questionnaire – Collection
• Supporting Enhanced Publications – Analysis
• Continuity Guides: Trends by Concepts – Concept, Discovery, Repurposing
Extracting structured information in current workflow
• Example: building derived variables by SPSS
• SPSS setups contain commands and comments
• Necessary steps for using SPSS setups as information source for DDI
– Improving comments for automated extraction
• formalize layout
• add keywords from a list
– Extraction of structured comments and related commands by custom tool.
– Transformation of this information into DDI 3 fragments
***v* Variables/DerivedVariables
 * DESCRIPTION
 * This section is on derived variables;
***.
***v* DerivedVariables/w101_new
 * NAME
 * w101_new
 * DESCRIPTION
 * w101_new is a derived variable from w101;
 * It has the original value from w101
 * when w102 is equal 1
 * otherwise it has the value 5;
 * USED VARIABLES
 * w101, w102
 * SOURCE
**.
compute w101_new = 5 .
if ( w102 = 1 ) w101_new = w101 .
**
 * VERSION
 * 2009-04-18
 * AUTHOR
 * Achim Wackerow
 * EMAIL
 * [email protected]
***.
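The extraction step can be illustrated with a minimal Python sketch. It is not the custom tool mentioned on the slide; it only assumes the comment layout shown in the example (a `***v*` header, keyword lines, `**.` and `**` around the commands, `***.` at the end, one item per line). The names `KEYWORDS`, `HEADER`, `SAMPLE`, and `extract_blocks` are hypothetical, and the sample is a shortened copy of the example.

```python
import re

# Keywords that may open a field; taken from the example comments above.
KEYWORDS = {"NAME", "DESCRIPTION", "USED VARIABLES", "SOURCE",
            "VERSION", "AUTHOR", "EMAIL"}

# A block starts with a marker such as "***v* DerivedVariables/w101_new".
HEADER = re.compile(r"^\*\*\*v\*\s+(\S+)")

# Shortened copy of the example setup (assumed layout: one item per line).
SAMPLE = """\
***v* Variables/DerivedVariables
 * DESCRIPTION
 * This section is on derived variables;
***.
***v* DerivedVariables/w101_new
 * NAME
 * w101_new
 * DESCRIPTION
 * w101_new is a derived variable from w101;
 * USED VARIABLES
 * w101, w102
 * SOURCE
**.
compute w101_new = 5 .
if ( w102 = 1 ) w101_new = w101 .
**
 * VERSION
 * 2009-04-18
***.
"""


def extract_blocks(setup_text):
    """Return one dict per structured comment block in an SPSS setup."""
    blocks, current, field = [], None, None
    for raw in setup_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # Start a new block; its path is the text after the marker.
            current, field = {"path": header.group(1)}, None
        elif current is None or not line:
            continue
        elif line == "***.":
            # End of block.
            blocks.append(current)
            current, field = None, None
        elif line in ("**.", "**"):
            # Delimiters around the SPSS commands inside a block.
            continue
        elif line.startswith("*"):
            text = line.lstrip("*").strip()
            if text in KEYWORDS:
                field = text
                current[field] = []
            elif field is not None:
                current[field].append(text)
        elif field == "SOURCE":
            # Non-comment lines after SOURCE are the related commands.
            current["SOURCE"].append(line)
    return blocks


blocks = extract_blocks(SAMPLE)
```

Each resulting dict holds the description and the related commands side by side, which is the information a later step would need to write a DDI 3 GenerationInstruction (Description and Command).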
Diagram: SPSS → Extractor → Result: Report (HTML) and DDI 3 fragments (GenerationInstruction: Description, Command)
Extracting structured information in current workflow
Translation of EVS Questionnaire
DSDM
http://zacat.gesis.org
Publications with References to Data
DDI 3.1 URN contains: Agency, Object, Version
URL of Documentation and/or Data
DDI Alliance: find agency gesis.de.ddi, return resolver address
http://resolve.gesis.org: find object, return URL
http://www.gesis.org/doc/docxyz: request document, return document
Publication with References (URNs)
<urn:ddi:3_1:VariableScheme.Variable=gesis.de.ddi:ZA3811_VarSch(1_0).V8(1_0)>
Supporting Enhanced Publications
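The URN-based lookup on this slide can be sketched in Python. The regular expression is derived only from the one example URN shown, so it is an assumption about the general DDI 3.1 URN layout, not a validated grammar. The two lookup tables are in-memory stand-ins for the DDI Alliance agency registry and the GESIS resolver; pairing this particular URN with the docxyz URL is hypothetical.

```python
import re

# Pattern inferred from the single example URN on the slide (assumption).
URN = re.compile(
    r"^urn:ddi:(?P<ddi_version>[^:]+):"
    r"(?P<maintainable_class>\w+)\.(?P<object_class>\w+)="
    r"(?P<agency>[^:]+):"
    r"(?P<maintainable_id>\w+)\((?P<maintainable_version>[^)]+)\)\."
    r"(?P<object_id>\w+)\((?P<object_version>[^)]+)\)$"
)

# Hypothetical stand-ins for the two lookup services; no network access.
AGENCY_REGISTRY = {"gesis.de.ddi": "http://resolve.gesis.org"}
OBJECT_TABLE = {("ZA3811_VarSch", "V8"): "http://www.gesis.org/doc/docxyz"}


def parse_ddi_urn(urn):
    """Split a DDI URN into agency, object and version parts."""
    match = URN.match(urn.strip().strip("<>"))
    if match is None:
        raise ValueError("not a recognised DDI URN: " + urn)
    return match.groupdict()


def resolve(urn):
    """Step 1: find the agency's resolver. Step 2: find the object's URL."""
    parts = parse_ddi_urn(urn)
    resolver = AGENCY_REGISTRY[parts["agency"]]
    url = OBJECT_TABLE[(parts["maintainable_id"], parts["object_id"])]
    return resolver, url


example = "<urn:ddi:3_1:VariableScheme.Variable=gesis.de.ddi:ZA3811_VarSch(1_0).V8(1_0)>"
parts = parse_ddi_urn(example)
resolver, url = resolve(example)
```

Keeping the agency lookup separate from the object lookup mirrors the diagram: a publication stores only the URN, and each agency stays free to change where its documents live.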
DSDM DDI 3 EPE Simple Export Wizard 1.2.0
Grouping Trends
• Continuity guides in different contexts
– Synoptical question / variable lists
– Documentation of changes in question wording / answer scales
• Systematic organization by conceptual categories
– CodebookExplorer tool (relational DB)
– Publication as HTML links on variable level in ZACAT
• Taking advantage of DDI 3 in the future
– Defining the standard and comparison
– Qualifying relations (e.g. q-text modified, scale modified, …)
Continuity guides
Literal question text over time
Conceptual categories
Deviations in answer categories
Trends by concepts
Conceptual categories
Trend variables by study
Country 1 Country 2
STUDY UNIT 1 … n
DataCollection
<dc:QuestionScheme id="QS">
  <dc:QuestionItem id="Qn">
    …
    <dc:Text>Have you …?</dc:Text>
…
LogicalProduct
<l:CategoryScheme id="CATS1">
  <l:Category id="Cat1">
    <r:Label>often</r:Label>
    …
<l:CodeScheme id="CODS1">
  …
  <l:Code isDiscrete="true">
    <l:CategoryReference>
      <r:ID>Cat1</r:ID>
    </l:CategoryReference>
    <l:Value>4</l:Value>
  </l:Code>
  …
GROUP
STUDY UNIT 8-14
DataCollection …
LogicalProduct …
Comparison map
Equivalency Relationship Description
DDI 3 RESOURCE "Ex-post Standard"
Universe Concept
Data Collection
<dc:QuestionScheme id="QS">
  <dc:QuestionItem id="Q">
    <dc:QuestionText>
      <dc:LiteralText>
        <dc:Text>Do you …?</dc:Text>
      </dc:LiteralText>
      …
    <dc:CodeDomain>
      <r:CodeSchemeReference>
        <r:ID>CODS1</r:ID>
      </r:CodeSchemeReference>
Logical Product
<l:CategoryScheme id="CATS1">
  <l:Category id="Cat1">
    <r:Label>often</r:Label>
    …
<l:CodeScheme id="CODS1">
  <l:CategorySchemeReference>
    <r:ID>CATS1</r:ID>
  </l:CategorySchemeReference>
  <l:Code isDiscrete="true">
    <l:CategoryReference>
      <r:ID>Cat1</r:ID>
    </l:CategoryReference>
    <l:Value>1</l:Value>
  </l:Code>
  …
Question text: modified
Values: different
Generation instruction: scale reversed
Label: identical
GROUP
STUDY UNIT 15-x
DataCollection …
LogicalProduct …
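How a comparison map might qualify the relation between a study's code scheme and the ex-post standard can be sketched in Python. This is an illustration, not the GESIS procedure. Only the category "often" with the values 4 (study) and 1 (standard) comes from the slide; the other categories, the function name `qualify_relation`, and the fallback label "recode needed" are hypothetical.

```python
def qualify_relation(study_codes, standard_codes):
    """Compare two {label: value} code schemes and qualify their relation."""
    if set(study_codes) != set(standard_codes):
        # Different category labels: values cannot be compared one to one.
        return {"label": "different", "values": "not comparable"}
    if all(study_codes[k] == standard_codes[k] for k in study_codes):
        return {"label": "identical", "values": "identical"}
    # A reversed scale maps each value v to (lowest + highest - v).
    low = min(standard_codes.values())
    high = max(standard_codes.values())
    is_reversed = all(
        study_codes[k] == low + high - standard_codes[k] for k in study_codes
    )
    return {
        "label": "identical",
        "values": "different",
        "generation_instruction": "scale reversed" if is_reversed else "recode needed",
    }


# "often" = 4 vs. 1 is from the slide; the remaining categories are made up.
study = {"often": 4, "sometimes": 3, "rarely": 2, "never": 1}
standard = {"often": 1, "sometimes": 2, "rarely": 3, "never": 4}
relation = qualify_relation(study, standard)
```

For the example schemes the result matches the qualifiers on the slide: labels identical, values different, generation instruction "scale reversed".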
DDI 3 Use at ICPSR
• Information collected from data producers in pre-collection phase – Concept
• Metadata output from CAI applications – Data Collection
• Processor’s dashboard – Data Processing
• Metadata mining: New faceted search tool to facilitate discovery through more precise searching – Data Discovery
• Relational database for comparison and harmonization across studies – Repurposing
SMDS Metadata Modules
DDI as backbone for structured metadata
Concept
Collection
Processing
Distribution
Discovery
Analysis
SIP
AIP
DIP
CAI Tools
MQDS etc.
Information extracted from SPSS etc.
Archive
Custom Tools (e.g. Forms-based)
Statistical packages
Online Analysis
Search engines
Distribution Packages
Web information system
A combination of this information forms a traditional SIP. Information from each life cycle stage, sent to the archive, can be understood as a dynamic SIP. Self-archiving through web forms can be offered for the different stages.
The structured metadata combined with the data forms the core of the archive. It would be organised so that metadata can be reused and information can be ingested and distributed dynamically.
Data / Documents outside of DDI
An AIP must be specially built, because the metadata in the core structure may contain only references to reused metadata held elsewhere. An AIP should include everything belonging to one study. DDI can also serve as the main structure of the AIP, and data can be held inline in DDI. The AIP would exist alongside the core structure in the archive, and an easy round trip should be possible between the two. The purpose of the AIP is comparable to that of PDF/A, where all fonts are embedded. The core structure, by contrast, is geared toward efficient processing and reuse of metadata.
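The idea of building an AIP by resolving references can be sketched in Python. The dictionary layout (`shared`, `studies`, `items`, `ref`) and the function `build_aip` are hypothetical and stand in for DDI structures; the point is only that every reference is replaced by a copy of the referenced item, so the package no longer depends on the core structure.

```python
import copy

# Hypothetical core structure: shared items are stored once and referenced.
core = {
    "shared": {
        "q_std_1": {"id": "q_std_1", "type": "question", "text": "Do you …?"},
    },
    "studies": {
        "study1": {
            "title": "Study 1",
            "items": [
                {"id": "v1", "type": "variable", "label": "study-specific variable"},
                {"ref": "q_std_1"},  # reference to a reused item
            ],
        },
    },
}


def build_aip(study_id, core):
    """Return a self-contained copy of one study with all references inlined."""
    study = copy.deepcopy(core["studies"][study_id])
    study["items"] = [
        copy.deepcopy(core["shared"][item["ref"]]) if "ref" in item else item
        for item in study["items"]
    ]
    return study


aip = build_aip("study1", core)
```

Because the function copies instead of modifying, the core structure keeps its references and stays suited to reuse, while the AIP carries everything for the one study.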
DDI-based archive as collection of reusable components
• Metadata in DDI is structured in small items which can be identified and maintained by one or more institutions
• These parts can be
– the basis for comparison and metadata mining (discovery of new relationships)
– a candidate for reuse in other studies or new studies (like standard questions or variables)
Study 1
Study-specific information
Items for reuse
Study 1
Study-specific information
Items for reuse
New study
Repository of reusable components
• Standard concepts
• Standard questions
• Standard variables
• Harmonized information
• Controlled vocabularies
Issues for Discussion
• Advantages and disadvantages of seeking to capture additional metadata throughout the data life cycle
• How much information to make available to funding agencies, data producers, and secondary users?
• Rules for structured documentation and delivery of items to archives for preservation
• An overall DDI tool to capture and curate all metadata and data – the Holy Grail???