System Update 2010 CrossRef Workshops Chuck Koscher


Upload: crossref

Post on 26-Jun-2015


TRANSCRIPT

Page 1

CrossRef 2010 Annual Member Meeting - London

CrossRef Annual Meeting – London Workshops

15 November 2010

Page 2

Workshops Agenda

9:30-10:00 Coffee & Tea

10:00-11:30 System Update: Andrew Gilmartin, Senior Software Developer; Chuck Koscher, Director of Technology

11:30-12:00 CrossMark: Geoff Bilder, Director of Strategic Initiatives

12:00-12:30 CrossCheck: Kirsty Meddings, Product Manager

12:30-1:15 Lunch

1:15-2:15 Metadata Quality: Patricia Feeney, Product Support Manager

2:15-2:45 Cited-by Linking: Carol Anne Meyer, Business Development and Marketing Manager; Chuck Koscher

2:45-3:00 Break

3:00-4:00 DOI Workflow Issues, Working with Vendors: Carol Anne Meyer

4:00-4:45 Boot Camp: Carol Anne Meyer; Tim Pickard, System Support Analyst/Administrator

4:45-5:15 Books: Carol Anne Meyer

Page 3

System Update

System status

Rewrite review

Rewrite implementation

Discussion

Page 4

System status

Page 5

System status

Page 6

Page 7

Page 8

[Chart labels: Old system; New Q system; The switch]

Page 9

System status

Deposit processing

• Suspended for 2+ weekends for the Oracle DB upgrade (to 11g)
• Processing times remain the same (50% under 5 minutes, another 30% under 1 hour)
• Large re-deposits (Elsevier plans for 2011)
• Schema relatively unchanged in 2+ years (we keep adding MIME types)
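The processing-time figures are bucketed shares of deposits (about half finish within 5 minutes, another 30% within an hour). As an illustration only, a breakdown like this could be computed from per-deposit processing times as below; `bucket_shares` and the sample durations are hypothetical, not CrossRef data or code.

```python
from bisect import bisect_left

def bucket_shares(durations_min, edges=(5, 60)):
    """Fraction of deposits per processing-time bucket (hypothetical helper).

    With the default edges the buckets are: <= 5 min, > 5 and <= 60 min, > 60 min.
    """
    if not durations_min:
        raise ValueError("no durations supplied")
    counts = [0] * (len(edges) + 1)
    for d in durations_min:
        # bisect_left puts a value equal to an edge into the lower bucket
        counts[bisect_left(edges, d)] += 1
    return [c / len(durations_min) for c in counts]

# Hypothetical sample (minutes), chosen to mirror the slide's 50% / 30% split.
sample = [1, 2, 3, 4, 5, 10, 30, 45, 90, 240]
shares = bucket_shares(sample)  # [0.5, 0.3, 0.2]
```

Reporting the shares per bucket, rather than an average, keeps a few very slow deposits (such as large re-deposits) from hiding how quickly the typical deposit is processed.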

Deposit focus areas for 2011 (other than the rewrite)

• Investigating a PDF upload option (for depositing a DOI and the article's references)
• Modify WebDeposit to allow users to edit an existing DOI's metadata
• Maintenance on the NLM DTD deposit tool

Page 10

Page 11

Page 12

System rewrite

The Query System (QS), where are we?

• It's taking longer than we thought.
• QS is 99% ready and has been periodically in service since mid-September.
• Last vexing problem solved (database connection deadlock)?
• Performance improvement is very encouraging.
• Metrics and measurement capability greatly improved.

The Deposit System (DS), where are we?

• Initial design discussions have been held; documentation is under way.
• Implementation to start in January.
• Development will take until mid-year, followed by extensive testing.
• Data cleanup will be part of the migration process (mainly titles).

Page 13

⋅ Modularity of design

⋅ Utility of APIs where possible

⋅ Data stores that enable XML capabilities

⋅ Minimize dependency on proprietary systems

• That CrossRef should ultimately own the intellectual property in the software at the heart of its operations

• That CrossRef should not risk or jeopardize the reliability and throughput offered by the existing system

• That CrossRef should remain free to develop further applications for other purposes which need to interface to the reference-linking systems and/or its data

System rewrite

Rewrite 2 Working Group – Final report November 2008

Page 14

System rewrite

Rewrite 2 Working Group – Final report November 2008

A Solve NFS issue
A Federate architecture
A Database redesign
A Redesign event notification model (replace email)

O Unit testing (regression testing)
O Scriptable data ingestion workflow
O Improved title management and control
O Better publisher/member management model
O Daily testing/monitoring (data integrity)
O Built-in health and status monitoring
O Performance improvements and queue management

F Richer metadata querying capability
F Integrated data harvesting capabilities
F Dealing with references using other character sets
F Crawling of content to ingest it vs. making deposits
F Depositing of non-journal content
F Matching unstructured references using full text of equiv
F Querying of non-journal content
F Real-time cited-by queries with data-driven APIs
F More content types, including language variants
F More granular typing of journal articles
F Improved reporting facilities
F More useful user interface for members

Now / Soon / Later

Page 15

System rewrite

Technical Objectives

• Rework a 9-year-old system
• Address declining performance
• Improve administrative aspects (better control and reporting)
• Facilitate extensibility
• Staff better able to respond thanks to operational insight

Business Objectives

• Develop internal capabilities (every change Atypon makes costs money)
• Secure an independent path (continuity)
• Benefit of being on a 'shared' platform is nearing zero
• Maintain access to technical expertise

Page 16

Late 2010 through mid-2011

[Diagram: system architecture. Components shown: HTTP traffic; HAProxy; FrontEnd QS (Spring, Tomcat); MySQL, Lucene, BerkeleyDB; BackEnd Services; ActiveMQ (messaging); Deposit System (old Atypon EDS); Oracle Group: Oracle (prime) and Oracle (active standby) with constant replication; New System; external messaging (email, etc.)]

System rewrite
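The diagram places ActiveMQ messaging between the new front end and the back-end services, and the query system feature list later in this deck includes "Aggregate email notices for alerts". A minimal sketch of that decoupling, assuming an in-process `queue.Queue` as a stand-in for a real ActiveMQ broker (no broker client is shown); `publish_alert`, `digest_alerts` and the event fields are hypothetical.

```python
import queue
from collections import defaultdict

# In-process stand-in for an ActiveMQ queue (assumption); a real deployment
# would talk to the broker through a messaging client instead.
alerts = queue.Queue()

def publish_alert(recipient, message):
    """Front end: hand the alert to the queue instead of emailing inline."""
    alerts.put({"to": recipient, "msg": message})

def digest_alerts(q):
    """Back-end service: drain the queue, one aggregated notice per recipient."""
    grouped = defaultdict(list)
    while True:
        try:
            event = q.get_nowait()
        except queue.Empty:
            break
        grouped[event["to"]].append(event["msg"])
    # One notice body per recipient, listing every alert since the last run.
    return {to: "\n".join(msgs) for to, msgs in grouped.items()}

publish_alert("ops@example.org", "alert 1")
publish_alert("ops@example.org", "alert 2")
publish_alert("member@example.org", "alert 3")
digests = digest_alerts(alerts)
```

The point of the queue is that the front end never blocks on email delivery; the back end can run the digest on whatever schedule suits, sending one message per recipient instead of one per alert.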

Page 17

Q3 2011

[Diagram: system architecture. Components shown: HTTP traffic; HAProxy; FrontEnd QS (Spring, Tomcat); MySQL, Lucene, BerkeleyDB; BackEnd Services; ActiveMQ (messaging); Oracle Group: Oracle (prime) and Oracle (active standby) with constant replication; New System; external messaging (email, etc.); Deposit Processing FrontEnd DS (Spring, Tomcat): file upload, deposit reports]

System rewrite

Page 18

[Diagram: database layout. Labels: Primary Datacenter; Recovery Datacenter; Oracle Group. Components shown: New Deposit System; DatabaseUpdater; Deposit DB (prime) and Deposit DB (standby) linked by Oracle replication; Query DB (prime) and Query DB (secondary) linked by Oracle replication]

System rewrite
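The diagram separates a deposit database from a query database, with a component labelled DatabaseUpdater between them, and the deposit feature list later in this deck calls for separating deposit metadata organization from query metadata organization (for example, title substitution). One possible reading of that split, which is an assumption since the slides do not describe the updater's behaviour: deposits are stored exactly as submitted, and an updater reshapes each record for the query side. Plain dictionaries stand in for the two Oracle databases; all names and the substitution table are hypothetical.

```python
# Hypothetical stand-ins for the two databases in the diagram.
deposit_db = {}   # DOI -> metadata exactly as deposited
query_db = {}     # DOI -> metadata organised for matching and lookup

# Hypothetical title-substitution table: deposited title -> canonical title.
TITLE_SUBSTITUTIONS = {"J. Pept. Res.": "Journal of Peptide Research"}

def deposit(doi, record):
    """Deposit side: store the record unchanged and return its DOI."""
    deposit_db[doi] = dict(record)
    return doi

def database_updater(doi):
    """Copy one deposited record into the query store, substituting the title."""
    record = dict(deposit_db[doi])
    title = record.get("journal_title", "")
    record["journal_title"] = TITLE_SUBSTITUTIONS.get(title, title)
    query_db[doi] = record

doi = deposit("10.1034/j.1399-3011.1999.00076.x",
              {"journal_title": "J. Pept. Res.", "volume": "53", "first_page": "383"})
database_updater(doi)
```

Keeping the deposited record untouched means a substitution rule can be changed later and the query store rebuilt from the deposit store, without asking members to re-deposit.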

Page 19

Query system feature changes

Tweaks to the matching logic (discoveries made porting the code)

Fixed some nagging characteristics

Aggregate email notices for alerts

Implement HTTP free-text matching (still needs work, ‘alpha’)

Process free-text references for cited-by (done, stable, uses refXpress)

Establish a better user model:

1. Usernames and passwords for members (query and deposit)

2. Registered email addresses for non-members (query only)

System rewrite

[Diagram: Use Registration Form → Receive Email → Use Validation Form]
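The user model distinguishes members (username and password; query and deposit) from registered non-members (email only; query only), and the flow shows a registration form, an email, then a validation form. A minimal sketch of both, assuming the email carries a random one-time token that the validation form checks; `UserRegistry`, its methods and the token mechanism are hypothetical, not CrossRef's implementation.

```python
import hashlib
import hmac
import secrets

class UserRegistry:
    """Hypothetical registry of members and email-registered non-members."""

    def __init__(self):
        self.members = {}        # username -> (salt, password hash)
        self.pending = {}        # email -> token that would be sent by email
        self.registered = set()  # validated non-member email addresses

    @staticmethod
    def _hash(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def add_member(self, username, password):
        salt = secrets.token_bytes(16)
        self.members[username] = (salt, self._hash(password, salt))

    def register_email(self, email):
        """Registration form: create the token to be emailed to the user."""
        token = secrets.token_urlsafe(16)
        self.pending[email] = token
        return token

    def validate_email(self, email, token):
        """Validation form: accept the address only if the emailed token matches."""
        expected = self.pending.get(email)
        if expected is None or not hmac.compare_digest(expected.encode(), token.encode()):
            return False
        del self.pending[email]
        self.registered.add(email)
        return True

    def allowed(self, action, username=None, password=None, email=None):
        """Members may query and deposit; validated non-members may only query."""
        if username is not None:
            entry = self.members.get(username)
            if entry is None or password is None:
                return False
            salt, stored = entry
            if not hmac.compare_digest(stored, self._hash(password, salt)):
                return False
            return action in ("query", "deposit")
        if email is not None and email in self.registered:
            return action == "query"
        return False
```

Usage: a member added with `add_member("alice", "s3cret")` can deposit; a non-member's address only gains query access after `validate_email` succeeds with the token from `register_email`.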

Page 20

Page 21

System rewrite: Simple Text Query

Page 22

Uses refXpress to break free-text into XML suitable for running a metadata query
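refXpress's own interface is not shown in the slides. To illustrate what "breaking free text into XML suitable for running a metadata query" means, here is a deliberately naive regex-based stand-in (not refXpress) that handles only the "N. Authors (year) Title.Journal volume, first-last" shape of the example citation in this deck. The output element names are assumed to mirror the metadata fields in the example records (journal_title, volume, first_page, year, article_title); `citation_to_query_xml` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Naive stand-in for a reference parser (NOT refXpress).
CITATION = re.compile(
    r"^\s*(?:\d+\.\s*)?"                    # optional list number, e.g. "53."
    r"(?P<authors>[^(]+?)\s*"               # author list up to "(year)"
    r"\(\s*(?P<year>\d{4})\s*\)\s*"         # year in parentheses
    r"(?P<title>[^.]+)\.\s*"                # title: runs to the first period
    r"(?P<journal>.+?)\s*"                  # journal abbreviation (shortest match)
    r"(?P<volume>\d+)\s*,\s*"               # volume
    r"(?P<first>\d+)\s*-\s*(?P<last>\d+)"   # page range
)

def citation_to_query_xml(text):
    """Return query XML for a free-text citation, or None if it doesn't parse."""
    m = CITATION.match(text)
    if m is None:
        return None
    query = ET.Element("query")
    surname = m["authors"].split(",")[0].split()[-1]  # first author's surname
    for tag, value in (
        ("author", surname),
        ("journal_title", m["journal"].strip()),
        ("volume", m["volume"]),
        ("first_page", m["first"]),
        ("year", m["year"]),
        ("article_title", m["title"].strip()),
    ):
        ET.SubElement(query, tag).text = value
    return ET.tostring(query, encoding="unicode")

# The unstructured citation from the "But be careful!" example in this deck.
SAMPLE = ("53. O.S. Gudmundsson, S.D.S. Jois, D.G. Vander Velde, T.J. Siahaan, "
          "B. Wang, and R.T. Borchardt (1999 ) The effect of conformation on the "
          "membrane permeability of coumarinic acid- and phenylpropionic acid-based "
          "cyclic prodrugs of opioid peptides.J. Pept. Res.53 , 383 -392 .")
```

A real parser has to cope with far more citation styles than one regex can; the sketch only shows the shape of the transformation from a string to structured, queryable fields.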

Page 23

Uses the QS Formatted Citation Parse to break free text into XML suitable for running a metadata query; if that fails, uses the QS Formatted Citation Search (with a high threshold) to search the Lucene index for a DOI.
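The slide describes a two-stage cascade: parse and run a metadata query first, and only if that fails, search the Lucene index and accept a hit only above a high threshold. A minimal sketch of that control flow, with the parser, metadata lookup and search passed in as hypothetical callables. The slides give neither the real QS interfaces nor the threshold; the sketch also assumes scores normalised to 0..1 (raw Lucene scores are not) and treats either a failed parse or an empty metadata result as "failed".

```python
from typing import Callable, Optional, Sequence, Tuple

def resolve_citation(
    text: str,
    parse: Callable[[str], Optional[dict]],
    metadata_query: Callable[[dict], Optional[str]],
    search: Callable[[str], Sequence[Tuple[str, float]]],
    threshold: float = 0.9,  # hypothetical "high threshold"
) -> Optional[str]:
    """Return a DOI for a free-text citation, or None.

    Stage 1: parse the text into fields and run a metadata query.
    Stage 2 (only if stage 1 yields nothing): full-text search, accepting the
    best hit only when its score reaches the threshold.
    """
    fields = parse(text)
    if fields is not None:
        doi = metadata_query(fields)
        if doi is not None:
            return doi
    hits = search(text)
    if not hits:
        return None
    best_doi, best_score = max(hits, key=lambda hit: hit[1])
    return best_doi if best_score >= threshold else None
```

Setting the threshold high trades recall for precision: an unresolved citation is cheaper than a link to the wrong article.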

Page 24

But be careful! This citation:

<citation key="b53_366">
  <unstructured_citation> 53. O.S. Gudmundsson, S.D.S. Jois, D.G. Vander Velde, T.J. Siahaan, B. Wang, and R.T. Borchardt (1999 ) The effect of conformation on the membrane permeability of coumarinic acid- and phenylpropionic acid-based cyclic prodrugs of opioid peptides.J. Pept. Res.53 , 383 -392 .</unstructured_citation>
</citation>

Still yields this:

<doi type="journal_article"> 10.1034/j.1399-3011.1999.00077.x</doi>
<issn type="print">1397-002X</issn>
<issn type="electronic">1399-3011</issn>
<journal_title>Journal of Peptide Research</journal_title>
<contributors>
  <contributor sequence="first" contributor_role="author">
    <given_name>O.S.</given_name>
    <surname>Gudmundsson</surname>
  </contributor>
</contributors>
<volume>53</volume>
<issue>4</issue>
<first_page>403</first_page>
<last_page>413</last_page>
<year media_type="print">1999</year>
<publication_type>full_text</publication_type>
<article_title>The effect of conformation of the acyloxyalkoxy-based cyclic prodrugs of opioid peptides on their membrane permeability</article_title>

But the correct answer is this:

<doi type="journal_article"> 10.1034/j.1399-3011.1999.00076.x</doi>
<issn type="print">1397-002X</issn>
<issn type="electronic">1399-3011</issn>
<journal_title>Journal of Peptide Research</journal_title>
<contributors>
  <contributor sequence="first" contributor_role="author">
    <given_name>O.S.</given_name>
    <surname>Gudmundsson</surname>
  </contributor>
</contributors>
<volume>53</volume>
<issue>4</issue>
<first_page>383</first_page>
<last_page>392</last_page>
<year media_type="print">1999</year>
<publication_type>full_text</publication_type>
<article_title>The effect of conformation on the membrane permeation of coumarinic acid- and phenylpropionic acid-based cyclic prodrugs of opioid peptides</article_title>
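The two records share first author, journal, volume, issue and year; the page range (and title wording) separates them, and the cited pages 383-392 match 00076.x. The slides don't say why the wrong record is returned. One safeguard worth illustrating, as a sketch and not the QS matching logic: accept a match only when exactly one candidate agrees on every field that could be extracted, so a citation whose page numbers were lost (for example, to irregular spacing such as "383 -392 .") comes back unresolved rather than resolved to the wrong DOI. The field values are copied from the records above; `match_unique` is hypothetical.

```python
# Key fields trimmed from the two records above (values copied from the slide).
RECORDS = [
    {"doi": "10.1034/j.1399-3011.1999.00076.x", "surname": "Gudmundsson",
     "journal_title": "Journal of Peptide Research",
     "volume": "53", "issue": "4", "year": "1999", "first_page": "383"},
    {"doi": "10.1034/j.1399-3011.1999.00077.x", "surname": "Gudmundsson",
     "journal_title": "Journal of Peptide Research",
     "volume": "53", "issue": "4", "year": "1999", "first_page": "403"},
]

def match_unique(query, records=RECORDS):
    """Return a DOI only if exactly one record agrees on every queried field.

    Hypothetical safeguard: an ambiguous query returns None rather than
    whichever candidate happens to rank first.
    """
    hits = [r["doi"] for r in records
            if all(r.get(field) == value for field, value in query.items())]
    return hits[0] if len(hits) == 1 else None

# Without a page number the two articles are indistinguishable...
no_page = {"surname": "Gudmundsson", "volume": "53", "year": "1999"}
ambiguous = match_unique(no_page)
# ...with the cited first page (383) only one record survives.
resolved = match_unique({**no_page, "first_page": "383"})
```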

Page 25

Deposit system feature changes

Parse the XML prior to accepting the upload

Process XML, register DOIs regardless of metadata ingestion problems

Provide aggregated deposit reports (daily?)

Integrate Schematron checks into deposit process

Robust title ownership model, not based on prefix, with shared ownership options

Separate deposit metadata organization from query metadata organization (e.g. allow title substitution)

System rewrite
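The first two deposit changes can be read together: reject an upload outright if it is not well-formed XML, but once it parses, register every DOI it contains even if some of the accompanying metadata cannot be ingested. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming items that carry a `doi_data` element with `doi` and `resource` children, and using a missing `titles` element as a stand-in for a metadata problem. The input is simplified and namespaces are ignored; `accept_deposit` and the example DOIs are hypothetical.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip any '{namespace}' prefix so the sketch works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]

def accept_deposit(xml_text):
    """Hypothetical upload handler.

    1. Parse before accepting: malformed XML is rejected immediately.
    2. Register each DOI found in a doi_data block even if the item's other
       metadata has problems; those problems are reported separately.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return {"accepted": False, "error": str(err), "registered": [], "problems": []}

    registered, problems = [], []
    for item in root.iter():
        doi_data = next((c for c in item if local(c.tag) == "doi_data"), None)
        if doi_data is None:
            continue
        fields = {local(c.tag): (c.text or "").strip() for c in doi_data}
        doi, url = fields.get("doi"), fields.get("resource")
        if not doi or not url:
            problems.append("doi_data missing doi or resource")
            continue
        registered.append((doi, url))  # registration does not wait on metadata checks
        if not any(local(c.tag) == "titles" for c in item):
            problems.append(f"{doi}: no titles element; metadata not ingested")
    return {"accepted": True, "error": None, "registered": registered, "problems": problems}
```

Separating "was the DOI registered?" from "was all the metadata ingested?" lets a member's links resolve immediately while the metadata problems show up in the deposit report for correction.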

Page 26

Andrew