![Page 1: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/1.jpg)
Crowdsourcing DDI Development: New Features from the CED2AR Project
Benjamin Perry, Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, William Block
![Page 2: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/2.jpg)
What is CED2AR?
• Part of the NSF Census Research Network (NCRN) (Grant #1131848)
• Lightweight, DDI-driven web application
• Enables search, browsing, and editing across codebooks
• Provides an open API for developers
• Live example at demo.ncrn.cornell.edu
![Page 3: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/3.jpg)
EDDI 2014 “Collaborative editing…”
• Emphasis on collaborative editing (small set of users)
  – Online editor
  – Versioned and tracked metadata through Git
  – Tied into external authentication frameworks
![Page 4: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/4.jpg)
Now
• Support crowdsourced DDI curation through CED2AR
  – Accommodate more users
  – Allow for application-specific customization
  – Create incentives and guidance for users
  – Abstract away technical barriers
![Page 5: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/5.jpg)
Starting point here
• Initial metadata (DDI) has been created and ingested into a CED²AR instance
• Metadata may be
  – Incomplete (valid DDI but empty or non-informative fields)
  – Lacking user feedback (on value or constraints of variables)
• Assumption:
  – Archivist is not the only specialist on a particular dataset
  – Users collectively have information that is not initially included in metadata
![Page 6: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/6.jpg)
User Workflow
1. User searches through CED2AR or an external search engine
2. User discovers data relevant to their query
3. User can choose to contribute structured or unstructured documentation for datasets
  – No DDI knowledge required: the user documents individual fields without needing to know how they fit into a particular metadata structure
  – May involve creating links (provenance) to other datasets
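One way to picture "no DDI knowledge required" is a thin mapping layer between the labels a contributor sees and the underlying XML elements. The sketch below is illustrative only: the mapping table and the `apply_user_edit` helper are hypothetical, and the element names `labl` and `txt` are assumed here to stand for a variable's label and description in a simplified, namespace-free codebook fragment.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from user-facing field names to codebook element names.
FIELD_TO_ELEMENT = {"label": "labl", "description": "txt"}


def apply_user_edit(var, field, value):
    """Write a user-supplied value into the element that backs `field`."""
    tag = FIELD_TO_ELEMENT[field]
    child = var.find(tag)
    if child is None:
        # Create the element on first use so the user never has to.
        child = ET.SubElement(var, tag)
    child.text = value
    return var


var = ET.fromstring('<var name="age"/>')
apply_user_edit(var, "label", "Age in years")
apply_user_edit(var, "description", "Age of respondent at interview.")
print(ET.tostring(var, encoding="unicode"))
```

The contributor only ever deals with "label" and "description"; where those land in the metadata structure is decided by the mapping.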
![Page 7: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/7.jpg)
Attracting Users
1. Search engine optimization enhancements to DDI
2. Exposing community contributions
Retaining Users
1. Flexible authentication
2. Easy-to-use editor
3. Metadata scoring
4. Tracking and identifying community contributions
![Page 8: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/8.jpg)
Search Engine Optimization
• Expanding the interoperability of DDI
![Page 9: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/9.jpg)
Authentication
• Support OpenID and OAuth2
  – Currently using Google with OAuth2
  – Developing connectors to work with additional providers
• CED2AR handles identity management
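As a rough illustration of the first step in an OAuth2 authorization-code flow with Google, the sketch below builds the URL a user would be redirected to. It is not CED2AR's code: the function name, client ID, and redirect URI are placeholders, and only the standard OAuth2 query parameters are used.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Google's OAuth2 authorization endpoint.
GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_authorization_url(client_id, redirect_uri, state=None):
    """Return (url, state) for the redirect that starts the login flow."""
    # `state` is an opaque value echoed back by the provider; checking it
    # on return protects the callback against cross-site request forgery.
    state = state or secrets.token_urlsafe(16)
    params = {
        "client_id": client_id,        # placeholder, issued by the provider
        "redirect_uri": redirect_uri,  # placeholder callback URL
        "response_type": "code",       # authorization-code flow
        "scope": "openid email",       # enough to identify the user by email
        "state": state,
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}", state


url, state = build_authorization_url("demo-client-id", "https://example.org/callback")
print(url)
```

Supporting another provider would, under this design, mostly mean swapping the endpoint and scope values.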
![Page 10: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/10.jpg)
Editing
• Automatic validation and an editor for rich content
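A minimal sketch of what automatic validation of rich content could look like, assuming the editor produces XML-like markup. This checks well-formedness only, not validity against the DDI schema, and the `validate_fragment` helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def validate_fragment(text):
    """Return (ok, message) for a rich-text fragment typed into the editor."""
    try:
        # Wrap the fragment in a single root element so mixed text and
        # inline markup can be parsed as one document.
        ET.fromstring(f"<txt>{text}</txt>")
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


print(validate_fragment("Age in <b>years</b>"))   # well-formed
print(validate_fragment("Age in <b>years")[0])    # unclosed tag is rejected
```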
![Page 11: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/11.jpg)
Editing
• Allows for ASCII Math
![Page 12: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/12.jpg)
Editing
• Growing support for additional DDI fields, exposed or not
![Page 13: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/13.jpg)
Metadata Scoring
• Exposing sparse documentation
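One simple way to expose sparse documentation is to score each variable by the share of descriptive fields that are filled in. The sketch below assumes a simplified, namespace-free codebook fragment and an arbitrary choice of two fields (`labl`, `txt`); the scoring rule is an illustration, not the formula used by CED2AR.

```python
import xml.etree.ElementTree as ET

# Simplified sample codebook (hypothetical data, no XML namespace).
SAMPLE = """
<codeBook>
  <dataDscr>
    <var name="age"><labl>Age in years</labl><txt>Age at interview.</txt></var>
    <var name="inc"><labl>Income</labl><txt></txt></var>
    <var name="x17"><labl></labl></var>
  </dataDscr>
</codeBook>
"""

# Fields counted toward the score (an assumption for this sketch).
FIELDS = ("labl", "txt")


def variable_score(var):
    """Fraction of FIELDS that are present and non-empty for one variable."""
    filled = sum(1 for field in FIELDS if (var.findtext(field) or "").strip())
    return filled / len(FIELDS)


def codebook_scores(xml_text):
    """Map each variable name to its completeness score."""
    root = ET.fromstring(xml_text.strip())
    return {var.get("name"): variable_score(var) for var in root.iter("var")}


for name, score in codebook_scores(SAMPLE).items():
    print(f"{name}: {score:.0%}")
```

Low-scoring variables are the natural candidates to surface to contributors first.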
![Page 14: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/14.jpg)
User Contributions
![Page 15: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/15.jpg)
Versioning
• Uses Git, a distributed version control system
• Every aspect of the system is configurable
  – Scheduled tasks check for changes
  – Once changes exceed a threshold, they are pushed
  – Pending changes are pushed after a time limit or on demand
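The push rules above can be sketched as a small decision function that a scheduled task might call. The class name, parameter names, and default values are assumptions made for illustration; the slides do not give the actual configuration keys.

```python
from dataclasses import dataclass


@dataclass
class PushPolicy:
    """Hypothetical, configurable rules for pushing pending changes."""

    change_threshold: int = 10        # push once pending changes exceed this
    max_wait_seconds: float = 3600.0  # push pending changes after this long

    def should_push(self, pending_changes, seconds_since_last_push, on_demand=False):
        if pending_changes <= 0:
            return False  # nothing to push
        if on_demand:
            return True   # explicit request
        if pending_changes > self.change_threshold:
            return True   # threshold exceeded
        # Otherwise push only once the time limit has passed.
        return seconds_since_last_push >= self.max_wait_seconds


policy = PushPolicy(change_threshold=5, max_wait_seconds=600)
print(policy.should_push(pending_changes=6, seconds_since_last_push=30))
print(policy.should_push(pending_changes=2, seconds_since_last_push=30))
```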
![Page 16: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/16.jpg)
Architecture
[Diagram: Master Branch (Official version) with Codebook 1.0; User Contributed Branch with Codebook 1.0]
1. User gets copy of DDI to edit
![Page 17: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/17.jpg)
Architecture
[Diagram: Master Branch (Official version) with Codebook 1.0; User Contributed Branch with Codebook 1.0, Codebook 1.0 rev 1, Codebook 1.0 rev 2, …, Codebook 1.0 rev N]
1. User gets copy of DDI to edit
2. Each edit is versioned
![Page 18: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/18.jpg)
Architecture
[Diagram: Master Branch (Official version) with Codebook 1.0 and Codebook 1.1; User Contributed Branch with Codebook 1.0, Codebook 1.0 rev 1, Codebook 1.0 rev 2, …, Codebook 1.0 rev N]
1. User gets copy of DDI to edit
2. Each edit is versioned
3. Data provider merges user’s edits back into official DDI
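The three steps in this diagram can be mimicked with a toy in-memory model of branches. This is only a sketch of the workflow, not of Git itself or of CED2AR's implementation: the `CodebookRepo` class and its methods are hypothetical, and the merge simply lets the contributed head overwrite matching fields.

```python
class CodebookRepo:
    """Toy model: each branch is a list of (label, fields) revisions."""

    def __init__(self, label, fields):
        self.branches = {"master": [(label, dict(fields))]}

    def branch(self, name, source="master"):
        # Step 1: the user gets a copy of the DDI to edit.
        self.branches[name] = list(self.branches[source])

    def commit(self, branch, label, **changes):
        # Step 2: each edit becomes a new revision on the user's branch.
        _, head = self.branches[branch][-1]
        self.branches[branch].append((label, {**head, **changes}))

    def merge(self, source, target, label):
        # Step 3: the data provider merges the edits into the official DDI.
        _, source_head = self.branches[source][-1]
        _, target_head = self.branches[target][-1]
        self.branches[target].append((label, {**target_head, **source_head}))

    def labels(self, branch):
        return [label for label, _ in self.branches[branch]]


repo = CodebookRepo("Codebook 1.0", {"age": ""})
repo.branch("user")
repo.commit("user", "Codebook 1.0 rev 1", age="Age")
repo.commit("user", "Codebook 1.0 rev 2", age="Age in years")
repo.merge("user", "master", "Codebook 1.1")
print(repo.labels("master"))
print(repo.labels("user"))
```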
![Page 19: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/19.jpg)
Architecture
![Page 20: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/20.jpg)
Architecture
[Diagram components: Web Application, Server, Local Repository, Database, Remote Repository]
![Page 21: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/21.jpg)
Architecture
[Diagram components: Server, CED2AR Instance, Remote Repository]
![Page 22: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/22.jpg)
Architecture
[Diagram components: Remote Repository, four CED2AR Instances]
![Page 23: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/23.jpg)
Architecture
[Diagram components: Remote Repository, four CED2AR Instances, one CED2AR Instance (Official)]
![Page 24: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/24.jpg)
Remote Location
• Our implementation uses Bitbucket
• Commit messages describe changes
• Users linked by email address
• Commit hashes are stored on CED2AR
• Remote synchronization is optional
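To illustrate how commits might be linked to users by email address, the sketch below parses output in the shape produced by `git log --pretty=format:"%h|%ae|%s"` (abbreviated hash, author email, subject). The sample log, hashes, and addresses are invented, and how CED2AR actually stores its commit hashes is not shown in the slides.

```python
from collections import defaultdict
from typing import NamedTuple


class Commit(NamedTuple):
    sha: str
    email: str
    message: str


# Invented sample in the "%h|%ae|%s" format.
SAMPLE_LOG = """\
a1b2c3d|alice@example.org|Add label for variable age
d4e5f6a|bob@example.org|Clarify universe for income | wages
0f9e8d7|alice@example.org|Fix typo in description of inc
"""


def parse_log(log_text):
    """Turn one-line-per-commit log text into Commit records."""
    commits = []
    for line in log_text.strip().splitlines():
        # Split on the first two separators only, so a "|" inside the
        # commit message is preserved.
        sha, email, message = line.split("|", 2)
        commits.append(Commit(sha, email, message))
    return commits


def commits_by_user(commits):
    """Index commit hashes by the author's email address."""
    index = defaultdict(list)
    for commit in commits:
        index[commit.email].append(commit.sha)
    return dict(index)


print(commits_by_user(parse_log(SAMPLE_LOG)))
```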
![Page 25: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/25.jpg)
Remote Location
![Page 26: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/26.jpg)
Tracking Changes
![Page 27: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/27.jpg)
Continued Work: Improving merge control
[Diagram: Master Branch (Official version) with Codebook 1.0 and Codebook 1.1; User Contributed Branch with Codebook 1.0, Codebook 1.0 rev 1, Codebook 1.0 rev 2, …, Codebook 1.0 rev N]
1. User gets copy of DDI to edit
2. Each edit is versioned
3. Data provider merges user’s edits back into official DDI
![Page 28: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/28.jpg)
Continued Work: The uncontrolled merge
• Workflow as described assumes a metadata curator merges information
• Within the limits of a 24-hour day: how likely is it that this process scales?
• Alternate: “wiki” methodology
![Page 29: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/29.jpg)
Architecture (alternate)
[Diagram: Master Branch (Official version) with Codebook 1.0; Wiki Branch (Community version) with Codebook 1.0]
![Page 30: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/30.jpg)
Architecture (alternate)
[Diagram: Master Branch (Official version) with Codebook 1.0; Wiki Branch (Community version) with Codebook 1.0; User Branches; step markers 1 and 2]
Users pull from wiki branch into any instance of CED2AR
![Page 31: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/31.jpg)
Architecture (alternate)
[Diagram: Master Branch (Official version) with Codebook 1.0; Wiki Branch (Community version) with Codebook 1.0; User Branches; Codebook rev 1; step markers 1 and 2]
Users push back to branch manually
![Page 32: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/32.jpg)
Architecture (alternate)
[Diagram: Master Branch (Official version) with Codebook 1.0; Wiki Branch (Community version) with Codebook 1.0; User Branches; Codebook rev 1; step markers 1 to 3]
New users work off most recent revision by default
![Page 33: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/33.jpg)
Architecture (alternate)
[Diagram: Master Branch (Official version) with Codebook 1.0; Wiki Branch (Community version) with Codebook 1.0; User Branches; Codebook rev 1, …, Codebook rev X; step markers 1 to 3]
![Page 34: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/34.jpg)
Architecture (alternate)
[Diagram: Master Branch (Official version) with Codebook 1.0; Wiki Branch (Community version) with Codebook 1.0; User Branches; Codebook rev 1, …, Codebook rev X; step markers 1 to 3]
![Page 35: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/35.jpg)
Architecture (alternate)
[Diagram: Master Branch (Official version) with Codebook 1.0; Wiki Branch (Community version) with Codebook 1.0; User Branches; Codebook rev 1, …, Codebook rev X; step markers 1 to 3]
User is responsible for merging
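When the user is responsible for merging, a field-level three-way merge is one plausible way to combine their edits with the latest wiki revision. The sketch below is a generic technique, not the project's merge logic: codebooks are reduced to flat dictionaries of field values, and the function name is hypothetical.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited copies of a codebook against their common base.

    Returns (merged, conflicts), where conflicts lists the fields that
    both sides changed to different values; "ours" wins provisionally.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o          # both sides agree
        elif o == b:
            value = t          # only "theirs" changed the field
        elif t == b:
            value = o          # only "ours" changed the field
        else:
            conflicts.append(key)
            value = o          # needs manual resolution
        if value is not None:  # None means the field was removed
            merged[key] = value
    return merged, conflicts


base = {"age": "Age", "inc": ""}
ours = {"age": "Age in years", "inc": ""}
theirs = {"age": "Age", "inc": "Household income"}
print(three_way_merge(base, ours, theirs))
```

Fields listed in `conflicts` are the ones a user would have to resolve by hand before pushing back to the wiki branch.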
![Page 36: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/36.jpg)
Architecture (alternate)
[Diagram: Master Branch (Official version) with Codebook 1.0; Wiki Branch (Crowdsource version) with Codebook rev X]
CED²AR User Interface exposes both versions (with attribution)
![Page 37: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/37.jpg)
Continued Work: Improving merge control
• Merging crowd-sourced content back into official documentation
![Page 39: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/39.jpg)
Extra slides
![Page 40: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/40.jpg)
Continued Work: Facilitating Editing
• Tagging variables with a controlled vocabulary and a folksonomy
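A small sketch of how tags on a variable could be split between a controlled vocabulary and free-form folksonomy tags. The vocabulary terms and the `classify_tags` helper are hypothetical; the slides do not specify how tagging will be implemented.

```python
# Hypothetical controlled vocabulary maintained by the data provider.
CONTROLLED_VOCABULARY = {"demographics", "income", "employment"}


def normalize(tag):
    """Lowercase a tag and collapse internal whitespace."""
    return " ".join(tag.lower().split())


def classify_tags(tags):
    """Split user-supplied tags into controlled terms and folksonomy tags."""
    controlled, folksonomy = [], []
    for tag in map(normalize, tags):
        if not tag:
            continue  # ignore empty input
        target = controlled if tag in CONTROLLED_VOCABULARY else folksonomy
        if tag not in target:  # drop duplicates, keep first-seen order
            target.append(tag)
    return {"controlled": controlled, "folksonomy": folksonomy}


print(classify_tags(["Income", " income ", "side  hustle", ""]))
```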
![Page 41: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to](https://reader034.vdocuments.mx/reader034/viewer/2022042318/5f07f7ff7e708231d41fab0b/html5/thumbnails/41.jpg)
Ingest Workflow