ISO/IEC JTC1/SC32/WG2 N1952
MRC Data Support Service Standalone Registry
Introduction
The 'Stand Alone' registry is a distribution of the full ISO11179-compliant MRC DSS Study and Dataset metadata database, packaged for use by individual units for a small number of studies, both in internal operations and in the preparation of content sets for the central registry. The package includes a client-side addin for Microsoft Excel that supports the registration of tabular datasets, and functions for the basic registration of SAS and SPSS Data Dictionary Files.
The registry
The registry software comprises an eXist XML database with extensions to support XSLT 2.0, XQuery scripts, XSL Transforms and supporting XML configuration files. It allows a unit to register variables, datasets and reference documents, to associate these items with a general record that describes the intent and status of the study to which they relate, and to apply subject matter classification schemes that help users access this content.
The plug-in
The plug-in for Microsoft Excel 2007 allows a user to access search, annotation and registration capabilities from within the spreadsheet software. A user can invoke the tool to define a blank spreadsheet from standard variable types already within the registry, or create new variable definitions to document a planned data collection. Furthermore, any existing tabular data that can be imported and manipulated in Excel may be annotated, either by creating new variable definitions or by referring to ones already defined in the local registry.
Further compatible tools for bulk processing of SAS and SPSS data dictionary content, and extensions to the registry for particular local requirements can be made available on request.
Installation
System requirements
Hardware
The base requirements for the metadata registry are modest for units with fewer than 10,000 variables: we recommend a Pentium-class 2 GHz processor with 4 GB RAM and 10 GB of free hard-disk space for a dedicated installation, and the registry will also run as a virtual machine on larger systems. For over 10,000 variables, we recommend Core or Xeon processors and a minimum of 8 GB RAM.
The installer requires the use of a monitor and keyboard. A 'headless' install script for Linux servers without a monitor is available on request.
Operating System
The metadata registry should run on any UNIX or modern Windows system for which Java 1.6 is available. We have used it without problems on Windows XP, Windows 7, Windows Server 2008, OS X 10.5.X and 10.6.X, and Ubuntu 9.X and 10.X.
Since the database runs in a Java virtual machine, we have found that Linux and OS X make more efficient use of resources than Windows for medium to large datasets.
For access to the content from a remote machine, ports 8080 and 8443 should be open. These are the default ports and can be changed.
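As a minimal sketch of how remote access could be checked, the following Python snippet tests whether the two default ports accept TCP connections. The function names are illustrative and not part of the registry distribution; the host name passed in is whatever machine runs the registry.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


def check_registry_ports(host: str, ports=(8080, 8443)) -> dict:
    """Map each port (defaults taken from the text above) to its reachability."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling check_registry_ports with the server's host name from the remote machine gives a quick indication of whether a firewall is blocking either port. A False result only shows that the connection failed; it does not say why.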
Software
The installer expects the Java Development Kit (JDK) to be installed on the workstation or server. The JDK may be obtained and installed from http://www.oracle.com/technetwork/java/javase/downloads/index.html
The plug-in is for Excel 2007 on Windows operating systems only and may require the installation of .NET programmability support.
For power users and programmers, we recommend the oXygen XML editor from http://www.oxygenxml.com/.
Details of prerequisite installation are given in Appendix 1.
Installation
The registry component and the addin are available from (web address here) as an IzPack packaged installer. The file is in the order of 100 MB, depending upon the default datasets packaged within it, so it may take a little while to download on a slow connection. If the JDK is already installed and the appropriate environment variables are configured (see Appendix 1), the installer can be started simply by double-clicking on the icon. However, many Windows machines lack the environment variables required to invoke Java automatically, in which case the command
java -jar omr-setup-X.X.Xdev-revXXXX.jar
will start installation.
A default installation may simply be obtained by accepting the license agreement and clicking 'next' until the final screen is reached.
On Windows 7 systems, we have found that installation into the 'Program Files' directory is problematic. Unless you have particular requirements, we recommend accepting the default installation location.
The wizard will ask you to confirm the installation location before you continue.
The next page will offer options for the installation of source code and datasets. We recommend that you choose the default DSS dataset, which provides all of the basic documents required to curate content for the Data Support Service Directory, and share and compare metadata with other units.
Then the installer will ask you to select and confirm the location of the database data directory on your disk. Again, we recommend selecting the default option of C:\omr\registry\webapp\WEB-INF\data
Next the installer will prompt you for an administrator password. The default is oxfordmetadata and should be fine for all normal, unsecured installations within a firewall. If you are planning on a wider or more secure installation then this should be changed. Remember to keep a copy of the password in a secure location.
The installer will take a little time copying the files to the specified location, generating an SSL certificate and loading the default content set into the database.
Finally, on a Windows-based system, it will offer you options to create shortcuts in various locations, and then the installation will complete with some instructions on getting the database started. If installation fails, please contact the Oxford Metadata Registry Support team.
Starting the metadata registry
Windows
On a Windows system, the metadata registry can be started by selecting the option in the start menu, by double-clicking on a start menu shortcut, or by navigating to C:\omr\registry\bin and running startup.bat
A console will open and, if startup is successful, the following text will appear:
----------------------------------------------------------------
eXist-db has started on port 8080 8443. Configured contexts:
http://localhost:8443/exist
http://localhost:8443/
----------------------------------------------------------------
OS X and Linux
On a Macintosh, open a terminal session, navigate to /Applications/omr/registry/bin and type the command
./startup.sh
Once the registry has started, you can reach the homepage by following the URL
http://localhost:8080/exist/mdr/web/homepage.xql
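To confirm from a script that the registry has started, one option is to request the homepage and check for a successful response. The sketch below uses only the Python standard library and the homepage path given above; the helper names are assumptions made for illustration.

```python
import http.client
import urllib.error
import urllib.request


def homepage_url(host: str = "localhost", port: int = 8080) -> str:
    """Build the registry homepage URL from the path quoted in this guide."""
    return f"http://{host}:{port}/exist/mdr/web/homepage.xql"


def homepage_responds(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status or malformed reply.
        return False
```

If the ports were changed during configuration, pass the new port to homepage_url rather than relying on the default of 8080.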
Stopping the metadata registry
The metadata registry can be stopped by selecting the command/terminal window and typing <ctrl><c>, then typing <y> in response to the confirmation message.
Setting up eXist as a service
On a Windows machine, the OMR can be set to start as a service when the server boots by selecting the 'install eXist as a service' shortcut. This requires administrator privileges, so you may need to right-click on the shortcut and select 'run as administrator'. For Linux and OS X, you will need to link exist.sh to an entry in the init.d directory:
ln -s $EXIST_HOME/tools/wrapper/bin/exist.sh /etc/init.d/exist
Please refer to your distribution's documentation: if you are not comfortable working with the command line, then you might want to talk to your systems administrator. Further details about starting, stopping and configuring eXist can be obtained from
http://exist.sourceforge.net/documentation.html
An overview of MDR functionality
The metadata registry is a database for recording, organising and administering metadata. In the context of population studies, it provides data dictionary capabilities for one or more closely related studies to facilitate data management and data discovery. To facilitate the exchange and use of content created within it, the metadata registry implements a number of international standards from both the International Organization for Standardization (ISO) and the World Wide Web Consortium (W3C). Support for the emerging version 3 of the Data Documentation Initiative (DDI) is under development. Most items in the metadata registry are 'administered' and share common facilities for naming, definition and change management; these facilities implement ISO11179-3 edition 2.
Types of content
The metadata registry currently recognises three types of primary content: study definitions, dataset definitions and variable definitions. Study Definitions provide a largely flat, fixed record of aspects of the study, including names and acronyms, the organisations responsible for participation and organisation, an overview of the funding provided, some description of the cohort, and the people who provide the primary point of contact. At the time of writing there was no accepted standard for metadata about a population study, so the record structure is based upon several existing relevant study registries and the overall intent of standards such as CONSORT, which aim to provide a specification for a report of cohort studies. The Study metadata items are listed in Appendix 2.
Dataset Definitions allow the recording of data dictionaries for existing datasets. The metadata registry ships with support for the automatic registration of SPSS and SAS metadata records; extensions to describe relational data sources or XML files of other formats can be developed on request. Dataset definitions accord with the general principles of the ISO19763 family of standards.
Variable Definitions document variables contained within datasets, or support the declaration of standard, reusable definitions that are to be conserved across the duration of an experiment. Variable definitions implement ISO11179-3.
Searching for content
Content can be located in the metadata registry by type and phrase. The search functionality is accessed directly from the main menu. The phrase may be any lexical string using standard wildcards and logical operators in capital letters. Example strings include: child; child AND carer; child OR carer; child*; child* AND carer.
A query is composed of terms and operators. Terms may be single terms or phrases: a phrase is a number of terms surrounded by quotation marks. A document only matches a phrase if the exact text within the quotation marks is present. Terms and phrases may be combined with the Boolean operators AND, OR and NOT. Single and multiple character wildcards may be included in terms: ‘?’ matches any single character; ‘*’ matches zero or more characters. Both may be used within or at the end of a term: te?t; test*; and te*t are all valid. However, wildcards cannot be used at the beginning of a term or within a quoted phrase. Thus neither ?est nor “smok* habit” is supported. Special characters can be escaped with the backslash ‘\’ character: today\? will search for the word ‘today’ followed by a question mark, rather than ‘today’ followed by any character.
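The wildcard rules above can be checked before a query is submitted. The following Python sketch is not the registry's own parser; it is an illustrative validator that assumes phrases are delimited by straight double quotes and that terms are separated by spaces.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}

# A token is either a double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\S+')


def query_problems(query: str) -> list:
    """List the wildcard rule violations in a search string (empty if none)."""
    problems = []
    for token in TOKEN.findall(query):
        if token in OPERATORS:
            continue
        # Mask escaped characters such as \? so they do not count as wildcards.
        masked = re.sub(r"\\.", "_", token)
        if token.startswith('"'):
            if "*" in masked or "?" in masked:
                problems.append(f"wildcard inside quoted phrase: {token}")
        elif masked[:1] in ("*", "?"):
            problems.append(f"wildcard at start of term: {token}")
    return problems
```

For example, query_problems("child* AND carer") returns an empty list, while query_problems("?est") reports a wildcard at the start of a term.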
Where more than five matches are returned for any search, the results are paged: navigation through the pages can be achieved through the ‘start’, ‘previous’, ‘next’ and ‘last’ links at the bottom of the page.
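The paging behaviour can be pictured with a short sketch. The text states only that paging begins above five matches, so the page size of five used here is an assumption, and the function names are illustrative rather than taken from the registry code.

```python
import math

PAGE_SIZE = 5  # assumed page size; the guide only says paging starts above five


def page_count(total: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed for `total` results (at least one)."""
    return max(1, math.ceil(total / page_size))


def page_items(items: list, page: int, page_size: int = PAGE_SIZE) -> list:
    """Return the slice of results shown on a 1-based page number."""
    start = (page - 1) * page_size
    return items[start:start + page_size]


def navigation(page: int, total: int, page_size: int = PAGE_SIZE) -> dict:
    """Target page for each of the four links at the bottom of the page."""
    last = page_count(total, page_size)
    return {
        "start": 1,
        "previous": max(1, page - 1),
        "next": min(last, page + 1),
        "last": last,
    }
```

With twelve matches, for instance, there are three pages, and from page 2 the 'previous' link leads to page 1 and 'next' to page 3.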
Browsing content alphabetically
For smaller metadata sets, and for those where dataset and variable names are controlled and structured, content can be located alphabetically. A strip of buttons facilitates alphabetic access: click on a button to filter the content by that initial letter. Where more than five items share the same initial letter, paging links become active at the bottom of the page, as with the search web pages: navigation through the set of items can be achieved by clicking on the appropriate link.
Browsing content by classification scheme
A powerful way of accessing content is through classification schemes. A classification scheme is a taxonomy or hierarchy of concepts or terms that may be associated with variables and datasets. The hierarchy can be navigated and used to filter content according to the associations. Supported classification schemes are a subset of ISO11179 conforming to the W3C Simple Knowledge Organization System (SKOS). However, so that a taxonomy may be displayed, the user interface makes the common assumption that broader and narrower relationships are transitive; this may result in incompatibilities with some complex third-party concept schemes.
To navigate through a hierarchy, select a scheme from the ‘schemes’ drop-down list and then click on the first selected term to bring it into focus. The list of variables on the left-hand side is then filtered according to the selected term, so that it displays all variables associated with that term together with any variables associated with terms that are narrower in meaning. Thus selecting the top concept in a scheme will show all of the variables classified within that scheme.
You can bring any other visible term listed as broader, narrower or related into focus simply by clicking on it. Items are returned in alphabetic order and where multiple pages of items are found, navigation through the pages can be accomplished using the links at the bottom of the page as with the search and alphabetic list web pages. Further restriction on the result set may be achieved by entering search terms or phrases into the ‘search within classification’ text box followed by the return key. Clicking on the ‘reset form’ link will reset the whole form and return to the default state with no classification scheme selected and no filter or paging restriction applied.
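The filtering described in this section relies on treating 'narrower' as a transitive relationship. The sketch below shows one way such a filter could work; the data structures and the example terms in the usage note are hypothetical and are not taken from the registry's implementation.

```python
def narrower_closure(term: str, narrower: dict) -> set:
    """Return the term plus every term reachable through 'narrower' links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in a third-party scheme
        seen.add(current)
        stack.extend(narrower.get(current, ()))
    return seen


def variables_for(term: str, narrower: dict, classified: dict) -> list:
    """Variables tagged with the term or any narrower term, in alphabetic order."""
    terms = narrower_closure(term, narrower)
    return sorted(name for name, tags in classified.items() if terms & set(tags))
```

Given a hypothetical scheme in which 'health' has the narrower terms 'smoking' and 'diet', and 'smoking' has the narrower term 'cigarettes', selecting 'health' would return variables tagged with any of those four terms, while selecting 'smoking' would return only those tagged 'smoking' or 'cigarettes'.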
Creating a registration authority
Creating an organisation
Creating a context
Creating a study record
Registering a dataset through the web interface
Registering an excel dataset
Creating a variable
Administration of the OMR
Managing users
Backup and Restore
Indexing
Securing content
Appendix 1: Prerequisite Software Installation
JDK installation on Windows
The OMR requires Sun’s Java Development Kit (JDK) 6. You can download the latest version at
http://www.oracle.com/technetwork/java/javase/downloads/index.html
Once you have downloaded the installer, double-click on it to install. You should make a note of the JDK installation path, which for Windows will typically be something like...
C:\Program Files\Java\jdk1.6.0_20
...for build 20 of version 1.6.0. Note the difference in version numbering: JDK 6 is version 1.6.0.
Set the JAVA_HOME Environment Variable
Having installed the JDK, you should set the JAVA_HOME system variable to point to the JDK directory you noted during installation. If you navigate to the directory in Windows Explorer, you can copy the path from the address bar.
To set the environment variable:
1. Right-click on My Computer and select Properties from the shortcut menu.
2. In the System Properties dialog box, click the Advanced tab.
3. On the Advanced tab, click the Environment Variables button.
4. In the System Variables list at the bottom of the Environment Variables dialog box, look for a JAVA_HOME environment variable.
5. If a JAVA_HOME variable exists, check to see if it matches the JDK installation directory you noted above:
   a. If it does not match, click Edit, type or paste the JDK directory path into the Variable value field, and click OK twice to accept the changes.
   b. If it matches, click Cancel twice to dismiss the dialog boxes.
6. If a JAVA_HOME variable does not appear in the list, do the following:
   a. Click New.
   b. In the Variable name field, enter JAVA_HOME.
   c. In the Variable value field, type or paste the full path to the JDK directory.
   d. Click OK.
   e. Click OK to close the Environment Variables dialog box.
   f. Click OK to exit the System Properties dialog box.
The JAVA_HOME variable is now set and the OMR installation will be able to find the JDK during installation.
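If you want to confirm the setting from a script, the following Python sketch checks whether a JAVA_HOME value points at a JDK by looking for the javac compiler in its bin directory. This check is an assumption about what distinguishes a JDK from a plain Java runtime, and the function name is illustrative; it is not how the OMR installer itself locates Java.

```python
import os
from pathlib import Path
from typing import Optional


def find_jdk(java_home: Optional[str]) -> Optional[Path]:
    """Return the JDK directory if java_home points at one, else None."""
    if not java_home:
        return None
    bin_dir = Path(java_home) / "bin"
    # A JDK, unlike a plain runtime, ships the javac compiler.
    for name in ("javac", "javac.exe"):
        if (bin_dir / name).is_file():
            return Path(java_home)
    return None


if __name__ == "__main__":
    jdk = find_jdk(os.environ.get("JAVA_HOME"))
    if jdk:
        print(f"JDK found at {jdk}")
    else:
        print("JAVA_HOME is unset or does not point at a JDK")
```

Run the script in a newly opened command prompt, since windows that were open before the change do not pick up the new variable.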
.NET Programmability Support installation
.NET Programmability Support is required to install the Excel addin. To enable .NET Programmability:
1. Open the Windows Control Panel and select Add/Remove Programs (XP) or Programs and Features (Win 7).
2. Find and click on Microsoft Office Professional 2007.
3. From the list of programs, select Microsoft Office Professional/Enterprise/Ultimate 2007.
4. Click Change.
5. Select Add or Remove Features and click Continue.
6. Click the plus sign (+) to the left of Microsoft Office Excel to show the available options.
7. Make sure that .NET Programmability Support appears in the list and is available. This is indicated by the box being shown in white, rather than in gray or with a red X through the disk icon within it.
8. If it is not enabled, click the drop-down arrow in the box located to the left of the .NET Programmability Support option and select Run from My Computer. It may prompt you for your MS Office CD to complete this task.
Once .NET Programmability Support is verified or enabled, you will be able to install the Excel Query Service Addin.
Appendix 2: Study metadata record
The structure of study information can be described in an object model; the accompanying figure (its cross-reference is broken in the source) shows an overview of the latest implementation. A Study object is a kind of Administered Item (one whose edits are tracked and recorded over time) and has basic information such as a name, alternative and previous names, and a set of typed identifiers. A Study has a collection of Links, Identifiers and Support details, and a set of Resources which are themselves Administered Items.
A Study has a number of role-based relationships with people, or Contacts – those designated as investigators, researchers, administrators, and initial points for communication, for example – represented by the SC_Role association class. Similar role-based relationships exist between studies and organisations (SO_Role), and between contacts and organisations (CO_Role).
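The three association classes can be illustrated with a small Python sketch. Only the names SC_Role, SO_Role and CO_Role come from the model described above; the attribute names, the simplified Study, Contact and Organisation classes, and the helper function are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Contact:
    name: str


@dataclass(frozen=True)
class Organisation:
    name: str


@dataclass(frozen=True)
class Study:
    name: str


@dataclass(frozen=True)
class SCRole:
    """SC_Role: links a Study to a Contact in a named role."""
    study: Study
    contact: Contact
    role: str


@dataclass(frozen=True)
class SORole:
    """SO_Role: links a Study to an Organisation in a named role."""
    study: Study
    organisation: Organisation
    role: str


@dataclass(frozen=True)
class CORole:
    """CO_Role: links a Contact to an Organisation in a named role."""
    contact: Contact
    organisation: Organisation
    role: str


def contacts_in_role(roles: List[SCRole], study: Study, role: str) -> List[Contact]:
    """Contacts holding the given role on the given study."""
    return [r.contact for r in roles if r.study == study and r.role == role]
```

Modelling each relationship as its own object, rather than as a field on Study, lets the same person hold different roles on different studies.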
Table 1 lists the primary attributes of the Study class: all have ‘public’ visibility, and would be displayed in the full view of a study record. The type of each field is given; those denoted as being of type String may be subject to further evolution, in particular the introduction of controlled vocabularies once a candidate set has been proposed. The cardinality of each attribute is also given: in this case, we use the standard UML syntax, where “1” denotes a mandatory field, “0..1” denotes an optional field, “0..*” represents a possibly empty set of values, and “1..*” represents a non-empty set of values.
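The cardinality notation can be made concrete with a short sketch that parses each form and checks whether a given number of values satisfies it. The function names are illustrative and not part of the registry.

```python
from typing import Optional, Tuple


def parse_cardinality(text: str) -> Tuple[int, Optional[int]]:
    """Turn '1', '0..1', '0..*' or '1..*' into (minimum, maximum).

    A maximum of None means the set of values is unbounded.
    """
    if ".." in text:
        low, high = text.split("..", 1)
    else:
        low = high = text
    maximum = None if high == "*" else int(high)
    return int(low), maximum


def satisfies(count: int, cardinality: str) -> bool:
    """True if a field holding `count` values meets the cardinality."""
    minimum, maximum = parse_cardinality(cardinality)
    return count >= minimum and (maximum is None or count <= maximum)
```

For instance, a record with no Organisations fails the "1..*" constraint, while a record with no Links satisfies "0..*".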
Field Name | Type | Cardinality | Description
Name | String | 1 | The preferred name of the study, chosen by the study owner
Other names | String | 0..* | A set of alternative and previous names, including acronyms
Organisations | Organisation | 1..* | The main organisations involved: those coordinating, participating, and those to contact initially for more details
Accountable People | Contact | 0..* | Including Principal Investigators, Directors and Leads of projects
Support | Support | 0..* | The credited sources of support for the study, with links to further details
Approvals | Approval | 0..* | A list of existing external approvals, other than local ethics
Research Areas | String | 1 | The main clinical or social areas studied
Description | String | 1 | A brief textual description of the study purpose, proposal or activity
Population | String | 1 | A description of study population, including details of gender, ethnicity, age range, etc. where appropriate
Data Collected | String | 1 | A broad categorisation of the data collected
Data Sources | Resource | 0..* | Copies of, or links to, questionnaires, interviews, existing data resources. Individual resources may have restricted access.
Sample Size | String | 1 | An indication of the initial recruitment or cohort size
Status | String | 1 | Whether the study is in preparation, is collecting data, or has completed
Recruitment | String | 1 | Whether the study is currently recruiting participants, and details
Geography | String | 1 | A broad description of the geographical areas involved in the study
Start Date | Date | 1 | The date on which the main study formally began, or is due to begin
Completion Date | Date | 0..1 | The actual, or planned, completion date for cohort management
Links | Link | 0..* | Clickable URLs to additional information about the study
Data Access | String | 1 | A description of the data sharing policy for the study
Additional Information | String | 1 | Any important extra information which doesn’t fit in the existing fields
Contacts | Contact | 0..* | Methods of contact, where applicable
Identifiers | Identifier | 0..* | The MRC study identifier, along with any other unique identifiers for the study
Last Updated | Date | 1 | The date of the last update of this Study record
Table 1: Primary attributes of the Study class
Further information on studies
The fields shown in Table 2 will be maintained at a lower level of access and some fields may not be publicly available.
Field Name | Type | Cardinality | Description
Related Parties | String | 1 | Categories of data recorded about related parties
Sampling Method | String | 1 | The sampling method used in data collection
Participation Type | String | 1 | An indication of whether participants opted in or out
Other Investigators | Contact | 0..* | A list of other investigators, and those who have added value to the study
Abstract | String | 1 | Further information about the study, its background, and its evolution
Other Data Sources | Resource | 0..* | Additional, more specific resources; individual resources may have restricted access.
Inclusion Criteria | String | 1 | Detailed criteria for inclusion
Exclusion Criteria | String | 1 | Detailed criteria for exclusion
Follow Up | String | 1 | The frequency and mechanisms employed for follow-up
Current Size | String | 1 | A brief narrative of the current cohort size
Research Purposes | String | 1 | The approved research purposes for data from the study
Approvals Required | String | 1 | External approvals that may be required to use the data
Funding Required | String | 1 | Details of financial support required for data access/sharing
Keywords | Keyword | 0..* | A set of keywords to assist in locating the study within the directory
Table 2: Secondary attributes of the Study class