crowdsourcing ddi development: new features from the ced...

41
1 Benjamin Perry, Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, William Block Crowdsourcing DDI Development: New Features from the CED 2 AR Project

Upload: others

Post on 17-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

1

Benjamin Perry, Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, William Block

Crowdsourcing DDI Development: New Features from the CED2AR Project

Page 2: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

• Part of the NSF Census Research Network (NCRN) (Grant #1131848)

• Lightweight, DDI driven web application • Enables search, browsing and editing across codebooks • Provides an open API for developers • Live example at demo.ncrn.cornell.edu

What is CED2AR?

2

Page 3: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

• Emphasis on collaborative editing (small set of users) – Online editor – Versioned and tracked metadata through Git – Tied into external authentication frameworks

EDDI 2014 “Collaborative editing…”

3

Page 4: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

• Support crowdsourced DDI curation through CED2AR – Accommodating more users – Allow for application specific customization – Create incentives and guidance for users – Abstract technical barriers

Now

4

Page 5: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

• Initial metadata (DDI) has been created and ingested into a CED²AR instance

• Metadata may be – Incomplete (valid DDI but empty or non-informative fields) – Lacking user feedback (on value or constraints of variables)

• Assumption: – Archivist is not the only specialist on a particular dataset – Users collectively have information that is not initially

included in metadata

Starting point here

5

Page 6: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

1. User searches through CED2AR or external search engine

2. User discovers data relevant to their query 3. User can choose to contribute structured or

unstructured documentation for datasets – No DDI knowledge required – user documents on fields,

without needing to know how that fits into a particular metadata structure

– May involve creating links (provenance) to other datasets

User Workflow

6

Page 7: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

1. Search engine optimization enhancements to DDI 2. Exposing community contributions Retaining Users 1. Flexible authentication 2. Easy to use editor 3. Metadata scoring 4. Tracking and identifying community contributions

Attracting Users

7

Page 8: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

8

Search Engine Optimization • Expanding the interoperability of DDI

Page 9: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

• Support OpenID and OAuth2 – Currently using Google with OAuth2 – Developing connectors to work with additional providers

• CED2AR handles identity management

Authentication

9

Page 10: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

10

Editing • Automatic validation, and editor for rich content

Page 11: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

11

Editing

• Allows for ASCII Math

Page 12: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

12

Editing • Growing support for additional DDI fields, exposed or

not

Page 13: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

13

Metadata Scoring • Exposing sparse documentation

Page 14: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

14

User Contributions

Page 15: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

• Uses Git, a distributed version control system • Every aspect of the system is configurable

– Scheduled tasks check for changes – Once changes exceed threshold, they are pushed – Pending changes are pushed after a time limit or on demand

Versioning

15

Page 16: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture

16

Master Branch (Official version)

User Contributed Branch

Codebook 1.0

Codebook 1.0

1. User gets copy of DDI to edit

Page 17: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture

17

Master Branch (Official version)

User Contributed Branch

Codebook 1.0

Codebook 1.0 rev 1

Codebook 1.0 rev 2

Codebook 1.0 rev N

Codebook 1.0

1. User gets copy of DDI to edit

2. Each edit is versioned

Page 18: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture

18

Master Branch (Official version)

User Contributed Branch

Codebook 1.0

Codebook 1.0 rev 1

Codebook 1.0 rev 2

Codebook 1.0 rev N

Codebook 1.0

Codebook 1.1

1. User gets copy of DDI to edit 3. Data provider

merges user’s edits back into official DDI

2. Each edit is versioned

Page 19: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture

Page 20: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture

Web Application

Server

Local Repository

20

Database

Remote Repository

Page 21: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture

Server

21

Remote Repository

CED2AR Instance

Page 22: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture

22

Remote Repository

CED2AR Instance

CED2AR Instance

CED2AR Instance

CED2AR Instance

Page 23: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture

23

Remote Repository

CED2AR Instance

CED2AR Instance

CED2AR Instance

CED2AR Instance

CED2AR Instance (Official)

Page 24: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

• Our implementation uses Bitbucket • Commit messages describe changes • Users linked by email address • Commit hashes are stored on CED2AR • Remote synchronization is optional

Remote Location

24

Page 25: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Remote Location

25

Page 26: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

26

Tracking Changes

Page 27: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Continued Work: Improving merge control

27

Master Branch (Official version)

User Contributed Branch

Codebook 1.0

Codebook 1.0 rev 1

Codebook 1.0 rev 2

Codebook 1.0 rev N

Codebook 1.0

Codebook 1.1

1. User gets copy of DDI to edit 3. Data provider

merges user’s edits back into official DDI

2. Each edit is versioned

Page 28: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

28

• Workflow as described assumes metadata curator merges information

• Within the limits of a 24-hour day: what’s the likelihood that that process scales?

• Alternate: “wiki” methodology

Continued Work: The uncontrolled merge

Page 29: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture (alternate) Master Branch

(Official version)

Codebook 1.0

Codebook 1.0

Wiki Branch (Community version)

Page 30: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture (alternate) Master Branch

(Official version)

Codebook 1.0

Codebook 1.0

Wiki Branch (Community version)

1

User Branches 2

Users pull from wiki branch into any instance of CED2AR

Page 31: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture (alternate) Master Branch

(Official version)

Codebook 1.0

Codebook 1.0

Wiki Branch (Community version)

1

User Branches 2

Codebook rev 1

Users push back to branch manually

Page 32: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture (alternate) Master Branch

(Official version)

Codebook 1.0

Codebook 1.0

Wiki Branch (Community version)

1

User Branches 2

3

Codebook rev 1

New users work off most recent revision by default

Page 33: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture (alternate) Master Branch

(Official version)

Codebook 1.0

Codebook 1.0

Wiki Branch (Community version)

1

User Branches 2

3

Codebook rev 1

Codebook rev X

Page 34: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture (alternate) Master Branch

(Official version)

Codebook 1.0

Codebook 1.0

Wiki Branch (Community version)

1

User Branches 2

3

Codebook rev 1

Codebook rev X

Page 35: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture (alternate) Master Branch

(Official version)

Codebook 1.0

Codebook 1.0

Wiki Branch (Community version)

1

User Branches 2

3

Codebook rev 1

Codebook rev X

User is responsible for merging

Page 36: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Architecture (alternate)

36

Master Branch (Official version)

Codebook 1.0

Codebook rev X

Wiki Branch (Crowdsource version)

CED²AR User Interface exposes both

versions (with attribution)

Page 37: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

• Merging crowd-sourced content back into official documentation

Continued Work: Improving merge control

37

Page 38: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Thank you! Questions?

[email protected] ncrn.cornell.edu github.com/ncrncornell 38

Page 39: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

39

Extra slides

Page 40: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

• Tagging variables with a controlled vocabulary and a folksonomy

Continued Work: Facilitating Editing

40

Page 41: Crowdsourcing DDI Development: New Features from the CED ...ssc.wisc.edu/naddi2015/webready/NADDI2015_Perry.pdf · AR or external search engine 2. User discovers data relevant to

Ingest Workflow

41