TRANSCRIPT
NISO Webinar: Authority Control:
Are You Who We Say You Are?
Wednesday, February 11, 2015
Speakers:
Simeon Warner, Director of Repository Development, Cornell University Library
Laura Dawson, Product Manager, ProQuest
Thomas Hickey, Chief Scientist, OCLC
http://www.niso.org/news/events/2015/webinars/authority_control/
ORCID identifiers in research workflows
Simeon Warner, Cornell University Library
with thanks to
Laurel Haak, ORCID Executive Director and
Josh Brown, ORCID Regional Director, Europe
for slides and comments
“Use ORCID iDs in research workflows to solve name ambiguity and save everyone a bunch of effort!”
ORCID background
• open - anyone can register, any organization with interest in
research and scholarly communications can join, iDs intended
for reuse, software open source
• non-profit - incorporated in USA, also ORCID EU
• community-driven - where community includes all sectors of
research process including publishers, funders, universities,
and the researchers themselves
Two core functions:
1. a registry that issues unique identifiers and manages a record of
activities
2. APIs that support system-to-system communication and
authentication
see: http://orcid.org/content/initiative
ORCID status and adoption
A little over 2 years since launch: over 1.1M iDs created and
over 190 members from all sectors and around the world.
[Charts: ORCID iD registrations by creation method (Creator, Website, Trusted Party), October 2012 to August 2014; members by sector: Universities & Research Orgs 45%, Publishing 25%, Associations 12%, Repositories & Profile Systems 11%, Funders 7%; members by region: Americas 50%, EMEA 35%, AsiaPac 15%]
National integrations and membership
http://openaccess.blogg.kb.se/2013/01/30/slutrapport-fran-projekt-forfattarindentifikatorer/
http://www.jisc.ac.uk/whatwedo/programmes/di_researchmanagement/researchinformation/orcid.aspx
http://orcid.org/blog/2014/09/03/denmark-adopts-orcid-consortium-approach-orcid-implementation
http://orcidpilot.jiscinvolve.org/wp/
ORCID Scope
ORCID = Open RESEARCHER AND CONTRIBUTOR Identifier
o Research activities
o Living people
o Researchers are a narrower population than the people and
personas covered by ISNI or VIAF
CONTRIBUTOR -- ORCID is intended to be used for the full spectrum of
actors in the research process, not just authors, and it records their roles.
o Already supports roles like translator, principal investigator
o 2012 Harvard Workshop http://projects.iq.harvard.edu/attribution_workshop/home
o 2014 Project CRediT Workshop http://www.eventbrite.ca/e/project-credit-workshop-tickets-10314211083
Researcher driven
Creation methods:
• integrations dominate
• website second
• institutional creation
Researcher must be involved to create or activate the ORCID iD,
and can control the privacy settings and/or add information.
Recommend that institutions use the trusted party creation method
rather than direct record creation. They need to connect with and
educate users anyway, and can pre-populate registration fields.
Leveraging ISNI Organization IDs
ORCID uses the organization list of Ringgold (an ISNI registrar) to
connect individuals with their education and employment
affiliations.
Leveraging FundRef identifiers
Funding agency list coordinated with FundRef
Auto-complete based on FundRef data
Integration of ORCID iDs in research workflows
Publication round trip
ORCID iDs are intended to be integrated into research and
publication workflows, and become embedded in the
metadata. ORCID iDs will thus be associated with new
works at the time of publication.
[Diagram: publication round trip connecting the ORCID record, Manuscript Submission (verified ORCID iD, update permission), Review, CrossRef DOI assignment, Publication with DOI & ORCID iD(s), and Readers]
Round trip process and implications
Publisher captures ORCID iD during manuscript submission
o Authenticated process, no mistyping, accurate
o User may grant permission to add works later
Publisher includes ORCID iD in metadata when minting DOI
o Will be available to support discovery
o Available in CrossRef search
Publisher/CrossRef writes metadata back to ORCID record
o Holder notified, can control visibility
o Saves effort updating record
o Information flow to other systems such as local profile (e.g.
I've linked my ORCID record with my VIVO profile)
Similar process for datasets, mediated by DataCite
ref: http://orcid.org/blog/2014/11/21/new-functionality-friday-auto-update-your-orcid-record
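The round trip described above can be sketched as a toy model in Python. This is an illustration under stated assumptions: `OrcidRecord`, `Submission`, `mint_doi_metadata`, and `write_back` are hypothetical names, not the ORCID or CrossRef APIs, and the DOI `10.1234/example` is made up. It captures the two properties the slides stress: the iD travels in the DOI metadata, and the write-back happens only with the holder's permission.

```python
from dataclasses import dataclass, field


@dataclass
class OrcidRecord:
    """Hypothetical, simplified stand-in for an ORCID record."""
    orcid_id: str
    works: list = field(default_factory=list)
    notifications: list = field(default_factory=list)


@dataclass
class Submission:
    """Manuscript data the publisher holds after ORCID authentication."""
    title: str
    orcid_id: str            # captured by authenticating, so no mistyping
    update_permission: bool  # holder allowed works to be added later


def mint_doi_metadata(sub: Submission, doi: str) -> dict:
    """Publisher includes the ORCID iD in the metadata deposited with the DOI."""
    return {"doi": doi, "title": sub.title, "contributors": [sub.orcid_id]}


def write_back(record: OrcidRecord, metadata: dict, sub: Submission) -> bool:
    """Add the work to the ORCID record only with the holder's permission."""
    if not sub.update_permission or record.orcid_id not in metadata["contributors"]:
        return False
    record.works.append(metadata["doi"])
    # The holder is notified and keeps control of the record afterwards.
    record.notifications.append(f"Work {metadata['doi']} added by publisher")
    return True


rec = OrcidRecord("0000-0002-7970-7855")
sub = Submission("Example paper", "0000-0002-7970-7855", update_permission=True)
meta = mint_doi_metadata(sub, "10.1234/example")  # hypothetical DOI
print(write_back(rec, meta, sub), rec.works)  # True ['10.1234/example']
```

The same shape would apply to datasets, with DataCite in place of CrossRef.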
Funder workflow
• Use for applicants and reviewers
• Profile data reduces the form-filling burden for applicants and grantees
• Improve reporting accuracy
• Pull publications, datasets and other works based on ORCID iD
ref: http://support.orcid.org/knowledgebase/articles/426596-orcid-funder-workflow
An ounce of ambiguity avoidance is worth a
pound of disambiguation
-- with apologies to Benjamin Franklin
• Workflow integration avoids name ambiguity at source
• Resulting data good for disambiguation of older data
• Resulting data good for compilation of authority records
“How much information should my ORCID record have?”
Minimal record
Registration is really quick and
easy, 30 seconds perhaps
1. name
2. email
3. password
4. agree to privacy policy and
conditions
A minimal ORCID record that is
enough to get an iD and use it in
research workflows
Helpful ORCID record
Reasons to add a little more information:
1. Provide enough information so that someone who follows a
link to your record, or searches for you, can understand which
"John Smith" you are
o alternate names
o education and employment information
o a few works. Everyone likes to show off their best work …
o opens the door for disambiguation of existing data
2. Provide other identifiers so that ORCID can act as a
switchboard to connect your identities in different systems.
o local profile id (e.g. my VIVO id at Cornell)
o Scopus Author ID, Researcher ID, ISNI
o (Using the search and link wizards that connect to these
other systems is also the easiest way to add works.)
Expansive ORCID record
There are many import wizards, which allow not only
o connection of an ORCID record to other identifiers
o but also import of works, grants, etc.
o source is recorded and provides a way to assess trust
The ORCID registry lets users enter works themselves,
specify their roles, etc.
ORCID UI groups information about the same work from multiple
sources
o user may select preferred one to display
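Grouping the same work reported by several sources, with the user choosing which version to display, can be sketched by keying entries on a normalized DOI. The entry layout, the `group_works` name, and the `preferred` mapping are hypothetical; ORCID's actual grouping rules are not described here. DOIs are compared case-insensitively because DOI names are case-insensitive.

```python
from collections import defaultdict


def group_works(entries, preferred=None):
    """Group work entries from several sources by DOI and choose one to display.

    entries:   dicts with at least "doi" and "source" keys (hypothetical layout)
    preferred: optional {doi: source} choices made by the record holder
    """
    # DOI names are case-insensitive, so normalize before comparing.
    chosen = {doi.lower(): src for doi, src in (preferred or {}).items()}
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["doi"].lower()].append(entry)

    display = {}
    for doi, items in groups.items():
        picks = [e for e in items if e["source"] == chosen.get(doi)]
        # Fall back to the first source seen when no preference is set.
        display[doi] = picks[0] if picks else items[0]
    return dict(groups), display


works = [
    {"doi": "10.1234/ABC", "source": "CrossRef", "title": "A study"},
    {"doi": "10.1234/abc", "source": "Scopus", "title": "A Study"},
    {"doi": "10.1234/xyz", "source": "self-entered", "title": "Another work"},
]
groups, display = group_works(works, preferred={"10.1234/abc": "Scopus"})
print(len(groups), display["10.1234/abc"]["source"])  # 2 Scopus
```

Keeping every source entry in `groups` mirrors the point that the source is recorded, so a reader can still judge which version to trust.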
You may make your ORCID record a complete picture of your research
contributions if you choose, but a complete record isn't necessary
for ORCID to work.
ORCID as a hub identifier
ORCID is a hub
[Diagram: ORCID at the center, linked to Funders, Higher Education and Employers, Professional Associations, Repositories, Publishers, and Other Identifiers, via identifiers such as DOI, ISBN, Thesis ID, ISNI, Researcher ID, Scopus Author ID, internal identifiers, Member ID, Abstract ID, FundRef, and GrantID]
The ORCID identifier connects researchers with their works (papers, grants, datasets, and more), organizations, and other identifiers. ORCID APIs enable data exchange between research information systems.
Hub identifier linking to other identifiers and to profiles in other systems
… and data in machine form too
$ curl -H "Accept: application/orcid+xml" \
    "http://pub.orcid.org/0000-0002-7970-7855/orcid-bio" \
    | grep external-id-url
<external-id-url>http://isni.org/isni/0000000351311901</external-id-url>
<external-id-url>http://vivo.cornell.edu/individual/individual24416</external-id-url>
<external-id-url>http://www.researcherid.com/rid/E-2423-2011</external-id-url>
<external-id-url>http://www.scopus.com/inward/authorDetails.url?authorID=7103063073&partnerID=MN8TOARS</external-id-url>
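The `external-id-url` values returned by the curl command above can be turned into a simple switchboard that maps each external system to the researcher's profile URL. This is a minimal sketch under several assumptions: `SAMPLE` is a hypothetical fragment built from the URLs shown above, and the `SYSTEMS` host table and `switchboard` function are illustrative. A real `orcid-bio` response is larger and namespaced, so tags are matched on their local name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical host-to-system labels; extend for other systems of interest.
SYSTEMS = {
    "isni.org": "ISNI",
    "vivo.cornell.edu": "VIVO",
    "www.researcherid.com": "ResearcherID",
    "www.scopus.com": "Scopus Author ID",
}

# Hypothetical fragment using the URLs above ("&" must be escaped in XML).
SAMPLE = """<external-identifiers>
  <external-id-url>http://isni.org/isni/0000000351311901</external-id-url>
  <external-id-url>http://vivo.cornell.edu/individual/individual24416</external-id-url>
  <external-id-url>http://www.researcherid.com/rid/E-2423-2011</external-id-url>
  <external-id-url>http://www.scopus.com/inward/authorDetails.url?authorID=7103063073&amp;partnerID=MN8TOARS</external-id-url>
</external-identifiers>"""


def switchboard(xml_text: str) -> dict:
    """Return {system label: profile URL} for every external-id-url element."""
    links = {}
    for el in ET.fromstring(xml_text).iter():
        local_name = el.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
        if local_name == "external-id-url" and el.text:
            url = el.text.strip()
            host = urlparse(url).netloc
            links[SYSTEMS.get(host, host)] = url  # unknown hosts keyed by host
    return links


for system, url in switchboard(SAMPLE).items():
    print(system, url)
```

Keying on the URL host keeps the sketch independent of any particular response schema, at the cost of needing a hand-maintained host table.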
Thanks for listening!
Pointers
Register at https://orcid.org/register if you haven’t already!
http://orcid.org/
• Research organizations: http://orcid.org/organizations/institutions
• Publishers: http://orcid.org/organizations/publishers
• Associations: http://orcid.org/organizations/associations
• Funders: http://orcid.org/organizations/funders
• Researchers: http://orcid.org/content/initiative
Membership http://orcid.org/about/membership
• Questions: [email protected]
Blog http://orcid.org/category/newsletter/blog
Slides: http://www.slideshare.net/simeonwarner/orcid-identifiers-in-research-workflows
ISNI
Disambiguating Public Identities
What Is ISNI
• ISO Standard, published in 2012
• International Standard Name Identifier
• Numerical representation of a name
– 16 digits (the last is a check character, which may be the letter X)
– Assigned to public figures, contributors of content –
researchers, authors, musicians, actors, publishers,
research institutions – and subjects of that content (if
they are people or institutions).
– Example: 0000 0004 1029 5439
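The last of the 16 characters is a check character computed with ISO 7064 MOD 11-2, which is why it can be the letter X. ORCID iDs use the same format and check scheme, so one function validates both. A sketch; the function names are illustrative:

```python
DIGITS = "0123456789"


def mod11_2_check_char(base: str) -> str:
    """ISO 7064 MOD 11-2 check character for a string of decimal digits."""
    total = 0
    for ch in base:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_isni(identifier: str) -> bool:
    """Check a 16-character ISNI or ORCID iD; spaces and hyphens are ignored."""
    compact = identifier.replace(" ", "").replace("-", "").upper()
    if len(compact) != 16 or any(c not in DIGITS for c in compact[:15]):
        return False
    return mod11_2_check_char(compact[:15]) == compact[15]


print(is_valid_isni("0000 0004 1029 5439"))  # True (example ISNI above)
print(is_valid_isni("0000 0004 1029 5438"))  # False (check character altered)
```

The ORCID iD from the earlier curl example, `0000-0002-7970-7855`, also passes. A check like this catches most typing errors but says nothing about whether the identifier has actually been assigned.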
Who is ISNI
• Founding members
– IFRRO (International Federation of Reproduction Rights Organizations)
– CISAC (International Confederation of Authors and Composers Societies)
– SCAPR (Societies’ Council for the Collective Management of Performers’ Rights)
– OCLC
– CENL (Conference of European National Librarians), represented by the British Library and the National Library of France
– ProQuest, represented by Bowker
[Diagram: ISNI Organizational Structure, showing the Board of Directors, Members, Quality Team, Registration Agencies, and ongoing assignments / general public]
How Does ISNI Registration Work
• Publisher submits names for assignment through a Registration Agency
• RA works with the publisher to ensure the data feed is well-formatted, and sends that feed to the Assignment Agency
• AA assigns as many ISNIs to the names in the feed as it can, using complex algorithms and business rules that evolve with each feed
• AA returns a file of names with ISNIs attached to them
– This may not be the full file of names
– Ambiguous names are held for review by Quality Team
– QT assignments and other exceptions (assignments as a result of improvements to the algorithm) are returned to RA quarterly
– Process is not instant. Assignment may be immediate if the name and other information are unique, but it frequently takes a week or two.
Stage One
Customer submits data to Registration Agency
Registration Agency sends file to Assignment Agency
Assignment Agency assigns as many ISNIs to the names as it can
Stage Two
Assignment Agency sends assigned file to Registration Agency
Registration Agency sends assigned file to Customer
Customer reviews, QAs, ingests
Stage Three
Assignment Agency sends updates on a monthly basis
Registration Agency distributes files to the appropriate Customers
Customers ingest updates
Display
• Only minimal metadata is displayed
• Not meant as a comprehensive profile
• ISNI is a tool for linking data sets, collocation, and
disambiguation
• Enhancements to the record can be made but are not
required
Sample Public ISNI Record
Bridge identifier linking disparate data sets
ISNI links
Who is using ISNIs?
• Wikipedia/Wikidata
• VIAF
• Access Copyright
• Scholar Universe
• British Library
• JISC
• Musicbrainz
• Macmillan (Digital Science)
• Booknet Canada (piloting)
• Authors Guild (piloting)
• Books in Print ONIX 2.1 extracts (sent to Google, B&N, Chegg and others)
Einstein’s Wikipedia Page
How many names in the ISNI database?
• Over 8,000,000 assigned
• 10,112,931 provisional (awaiting a match from another
data set for corroboration)
• Your author names may well already have ISNIs.
http://www.isni.org/search.
Use Case: Publisher
Use Case: Research Institution
Use Case: University
Use Case: Cross-Domain Linking
Use Case: Cross-Domain Linking
Data Quality
• Based on matching names to existing records in the
database (over 17 million names)
• Strict criteria for assigning ISNIs to names
• Quality team oversight (manual edits)
– British Library
– National Library of France
– OCLC
Assignment Criteria
• If on the common surname list:
– Birth date
– Death date
– ISBN(s)
– Title(s)
– Co-authors or institutional affiliation
• If not on the common surname list
– Title(s)
– Birth date
– Death date
– Any other distinguishing factors (“is not”)
• If unique
– Immediate assignment
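Read as a decision rule, these criteria say that a unique name is assigned at once, while a name that matches existing records needs corroborating data, more of it for common surnames. The sketch below encodes that reading only. The field names, the "Surname, Forename" name form, and the thresholds `needed_common=2` and `needed_other=1` are hypothetical; the slides do not say how many criteria must be met, and ISNI's real matching algorithms are far more complex.

```python
# Hypothetical field names for the corroborating data listed on the slide.
COMMON_SURNAME_FIELDS = ("birth_date", "death_date", "isbns", "titles",
                         "coauthors_or_affiliation")
OTHER_FIELDS = ("titles", "birth_date", "death_date", "other_distinguishing")


def assignment_decision(record, existing_names, common_surnames,
                        needed_common=2, needed_other=1):
    """Return "assign" or "hold for review" for one name in a feed.

    Thresholds are illustrative assumptions, not ISNI's actual business rules.
    """
    name = record["name"]
    if name not in existing_names:
        return "assign"  # unique name: immediate assignment
    surname = name.split(",")[0].strip()  # assumes "Surname, Forename"
    if surname in common_surnames:
        fields, needed = COMMON_SURNAME_FIELDS, needed_common
    else:
        fields, needed = OTHER_FIELDS, needed_other
    evidence = sum(1 for f in fields if record.get(f))
    # Too little corroboration: leave it for the Quality Team.
    return "assign" if evidence >= needed else "hold for review"


existing = {"Smith, John", "Dawson, Laura"}
print(assignment_decision({"name": "Smith, John", "titles": ["A"]},
                          existing, {"Smith"}))  # hold for review
```

Names that come back as "hold for review" correspond to the ambiguous names held for the Quality Team in the registration process described earlier.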
ISNI and ORCID
• ORCID iDs are drawn from a block of the ISNI number space
reserved for ORCID
• Working towards alignment, with ultimate goal of single
assignment
• There is ISNI representation on the ORCID Technical
Steering Group, and ORCID representation on the ISNI
Technical Committee
• A researcher may have both an ORCID and an ISNI
Do You Have An ISNI?
Thomas Hickey
Chief Scientist, OCLC Research
2015 February
NISO Webinar on Authority Control
VIAF Relations
VIAF
Virtual International Authority File
• Grew out of collaboration with national libraries
• Implemented and run by OCLC
• VIAF Council helps oversee it
• ~36 files, mainly from national authority files
• Everything libraries control other than topical subject headings is in scope
– Personals, corporates, families
– Jurisdictionals, geographics
– Works, expressions
– Imaginary characters, etc.
Why multiple files?
• Different
– Information collected
• Private vs. public
• Identification vs. comprehensive
– Technologies and systems
• APIs
– Time scales
• Batch vs. interactive creation
• Historical vs. contemporary
– Business models
VIAF’s characteristics
• Origins: Library authorities
• What is being identified: Entities libraries control
• Who creates it: Library staff
• Range of entities: Very broad
• Priorities and control: Libraries
• What can be shared: Open
Relationship with ISNI
• Both systems run by OCLC
– VIAF helped get ISNI started
• Problems
– Each absorbs the other’s data
– Feedback loops!
• Who’s in charge?
– ISNI now indicates reviewed records
• Relationships treated as though from xA
• Can both merge and split VIAF clusters
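A feedback loop arises when each system re-imports data that originated with itself and treats it as independent evidence. One generic way to break such loops is to carry provenance with every assertion and drop echoes on import. This is a hypothetical sketch of that technique, not a description of how VIAF or ISNI actually handle it; `Assertion` and `import_assertions` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """A claim (e.g. that two names belong in one cluster) plus its provenance."""
    claim: str
    origin: str          # system that first made the claim
    held_by: tuple = ()  # systems that have imported it since, in order


def import_assertions(target: str, incoming):
    """Import claims into `target`, dropping any that started there or passed through it.

    Without this check, `target` would read its own data, echoed back by
    another system, as independent corroboration: a feedback loop.
    """
    accepted = []
    for a in incoming:
        if a.origin == target or target in a.held_by:
            continue
        accepted.append(Assertion(a.claim, a.origin, a.held_by + (target,)))
    return accepted


from_isni = [
    Assertion("name A = cluster 1", origin="VIAF", held_by=("ISNI",)),  # echo
    Assertion("name B = cluster 2", origin="ISNI"),
]
print([a.claim for a in import_assertions("VIAF", from_isni)])  # ['name B = cluster 2']
```

Recording provenance also supports the "who's in charge?" question: a reviewed flag could be carried on `Assertion` in the same way and given priority on import.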
Wikipedia & Wikidata
Relationship with Wikipedia
• VIAF harvests Wikipedia dumps monthly
• Pages about people that are in VIAF are added
• VIAFbot back-loaded links into Wikipedia
– http://en.wikipedia.org/wiki/User:VIAFbot
Relationship with WorldCat
• One of the main uses of VIAF internally at OCLC is controlling names
• Multilingual Bibliographic Structure project
• Generate ‘xR’ authority records
– Works
– Expressions
[Diagram: OCLC Production Services, External OCLC Research Systems, and Internal OCLC Research Resources, including enhancedWorldCat, Kindred Works, Classify, Identities, FictionFinder, Cookbook Finder, LCSH, FAST, VIAF, GMGPC, Linked Data Entities, WORKS, GSAFD, GTT, DDC, LCTGM, MeSH, xR Sandbox, Multi-lingual Bib Records, and FRBR Clustering]
Unexpected interactions
• Drive towards comprehensiveness
– More information about entities
– More entities
• Importing other files
• Keeping up with updates
• Recognizing source of information
• What to trust
• How to leverage limited staff
Thank you
Questions?
All questions will be posted with presenter answers on
the NISO website following the webinar:
http://www.niso.org/news/events/2015/webinars/authority_control/
Thank you for joining us today.
Please take a moment to fill out the brief online survey.
We look forward to hearing from you!
THANK YOU