continuous integration for xml and rdf data · docker i docker is an open source platform for...

19
Continuous Integration for XML and RDF Data Sandro Cirulli Language Technologist Oxford University Press (OUP) 6 June 2015

Upload: others

Post on 26-Sep-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Continuous Integrationfor XML and RDF Data

Sandro CirulliLanguage Technologist

Oxford University Press (OUP)

6 June 2015

Page 2: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Table of contents

1. Context

2. Continuous Integration with Jenkins

3. Automatic Deployment with Docker

4. Future Work

Page 3: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Oxford University PressContext

I Oxford University Press (OUP) is a world-renowneddictionary publisher

I OUP launched the Oxford Global Languages (OGL) initiativeto digitize under-represented languages

I Language data is converted into XML and RDF

3/19

Page 4: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Where we started fromChallenges

I OUP dictionary data was originally developed for printproducts

I OUP acquired dictionaries from other publishers in variousformats

I Data conversions were performed by freelancers using variousprogramming languages, tools, and developmentenvironments

I No testing, no code reuse

4/19

Page 5: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Our aim

I Produce lean, machine-interpretable XML and RDF

I Leverage Semantic Web technologies for linking andinference

I Convert tens of language resources in a scalable,maintainable, and cost-effective manner

5/19

Page 6: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Continuous IntegrationWhat it is

I Continuous Integration (CI) is a software developmentpractice where a development team commits their workfrequently and each commit is integrated by an automatedbuild tool detecting integration errors

I CI requires a build server to monitor changes in the code, runtests, build, and notify developers

I We use Jenkins as it is the most popular open-source CIserver

6/19

Page 7: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Continuous IntegrationWorkflow and components

7/19

Page 8: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Continuous IntegrationNightly Builds

I Nightly builds are automated builds scheduled on a nightlybasis

I We currently builds XML and RDF for 7 datasets

I Nightly builds currently take on average 5 hours on amulti-core Linux machine with 132 GB RAM

I Builds are parallelized using 8 cores

8/19

Page 9: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Continuous IntegrationUnit Testing

I XSpec for XSLT code

I RDFUnit for RDF data

I Test results are converted into JUnit reports via XSLT

I Unit tests are run shortly after a developer commits the code

9/19

Page 10: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Continuous IntegrationMonitor View

10/19

Page 11: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Continuous IntegrationBenefits of CI

I Code reuse: on average, 70-80% of the code could be reusedfor new XML/RDF conversions

I Code quality: regression bugs are avoided

I Bug fixes: bugs are spotted quickly and fixed more rapidly

I Automation: no manual steps, faster and less error-pronebuild process

I Integration: reduced risks, time, and costs for integrationwith other systems

11/19

Page 12: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Continuous IntegrationJenkins Demo

12/19

Page 13: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Automatic Deployment with DockerDocker

I Docker is an open source platform for deployingdistributed applications running inside containers

I Docker provides development and operational teams with ashared, consistent environment for development, testing,and release

I Docker avoids the classic ’but it worked on my machine’issue

I Docker allows applications and their dependencies to bemoved portably across development and productionenvironments

13/19

Page 14: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Docker Containers

14/19

Page 15: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Automatic Deployment with DockerDockerfile

FROM platform_baseMAINTAINER Sandro Cirulli <[email protected]>

# eXist-DB versionENV EXISTDB_VERSION 2.2

# install existWORKDIR /tmpRUN curl -LO http://downloads.sourceforge.net/exist/

Stable/${EXISTDB_VERSION}/eXist-db-setup-${EXISTDB_VERSION}.jar

ADD exist-setup.cmd /tmp/exist-setup.cmd

# run command line configurationRUN expect -f exist-setup.cmd

15/19

Page 16: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Automatic Deployment with DockerDockerfile (cont.)

RUN rm eXist-db-setup-${EXISTDB_VERSION}.jar exist-setup.cmd

# set persistent volumeVOLUME /data/existdbWORKDIR /opt/exist

# change default port to 8008RUN sed -i ’s/default="8080"/default="8008"/g’ tools/

jetty/etc/jetty.xml

EXPOSE 8008 8443

ENV EXISTDB_HOME /opt/exist

CMD bin/startup.sh16/19

Page 17: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Future Work

I Scalability: cloud instances to run compute-intensiveprocesses, distribute builds across slave machines

I Availability: Circuit Breaker Design Pattern

I Code coverage: lack of code coverage tools for XSLT(XSpec and Cakupan are the best we could find)

I Deployment orchestration: docker-compose to orchestrateDocker containers

17/19

Page 18: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Acknowledgements

The work described here was carried out by a developers team atOUP:

I Khalil Ahmed

I Nick Cross

I Matt Kohl

I and myself

18/19

Page 19: Continuous Integration for XML and RDF Data · Docker I Docker is an open source platform for deploying distributed applications running inside containers I Docker provides development

Thank you for your attention!Any questions?

Slides available at:www.sandrocirulli.net/xml-london-2015

Contact me at:[email protected]