supporting clinical trial data curation and integration with table mining

Post on 14-Apr-2017

Supporting clinical trial data curation and integration with table mining

Nikola Milosevic [1], Cassie Gregson [3], Robert Hernandez [3], Goran Nenadic [1,2]

[1] School of Computer Science, University of Manchester
[2] The Farr Institute @HeRC
[3] AstraZeneca

Clinical trial publications
• Around 800 000 clinical trials in PubMed
• Difficult to digest/search
• Text mining approaches
• But tables and figures are often not processed

Tables in publications
• Present factual information
• Usually:
• Experimental settings (e.g. demographics)
• Findings and results (e.g. DDI, side effects, adverse events…)
• Background information (previous research, datasets, etc.)
• Examples
• Important information about trials

Extraction and curation of table data

Challenges
• Complex structure
• Table dimensionality (1, 2, multi-dimensional)
• Visual relationships
• Dense content
• Ambiguous short text
• Lack of context
• Acronyms and abbreviations
• Incomplete information

Table analysis overview

Table types (1)
• 4 types: list, matrix, super-row and multi-tables
• List table:

Table types (2)
• Matrix table

Table types (3)
• Super-row table

Table types (4)
• Multi-table

Example of decomposition
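
The slides do not spell out the decomposition procedure, so the following is only a minimal sketch of what decomposing a super-row table could look like. It assumes the table is already available as a list of rows, that the first row holds the column headers, that the first column holds the row labels, and that a row whose only non-empty cell is the first one is a super-row. The names `DataCell` and `decompose`, the heuristic, and the sample table are hypothetical and are not taken from the authors' system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCell:
    value: str
    header: str               # column header above the cell
    stub: str                 # row label in the first column
    super_row: Optional[str]  # grouping row the cell falls under, if any


def decompose(rows: List[List[str]]) -> List[DataCell]:
    """Flatten a table into data cells with their navigational context.

    Assumption: row 0 holds column headers, column 0 holds row labels, and
    a row whose only non-empty cell is the first one is a super-row that
    groups all rows beneath it until the next super-row.
    """
    headers = rows[0]
    cells: List[DataCell] = []
    current_super: Optional[str] = None
    for row in rows[1:]:
        label, values = row[0], row[1:]
        if label and not any(v.strip() for v in values):
            current_super = label  # start of a new group
            continue
        for header, value in zip(headers[1:], values):
            if value.strip():
                cells.append(DataCell(value, header, label, current_super))
    return cells


# Made-up table, for illustration only (not data from any trial).
table = [
    ["Characteristic", "Drug (n=50)", "Placebo (n=48)"],
    ["Age, mean (SD)", "54.2 (8.1)", "53.7 (7.9)"],
    ["Sex", "", ""],
    ["Male", "27", "25"],
    ["Female", "23", "23"],
]
cells = decompose(table)
```

Each resulting `DataCell` carries the header, stub and super-row it depends on, so a value such as "27" can be read as "Sex / Male / Drug (n=50)" without looking at the table layout. A real system would also need to recognise where a super-row group ends, which this sketch does not attempt.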

Results

Next steps
• Add semantic annotations
• Link patterns in data cells with their meaning
• Build/Expand knowledge bases
• Relate to existing knowledge on the semantic web
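
One way to read "link patterns in data cells with their meaning" is to match cell text against a small catalogue of value patterns. The sketch below illustrates that idea under stated assumptions: the pattern labels (`mean_sd`, `count_percent`, `range`, `number`), the regular expressions, and the function `classify_cell` are all hypothetical and are not described in the slides.

```python
import re
from typing import Optional, Tuple

NUM = r"-?\d+(?:\.\d+)?"

# Hypothetical pattern catalogue: (label, compiled regex), tried in order.
PATTERNS = [
    ("mean_sd", re.compile(rf"^\s*({NUM})\s*(?:±|\+/-)\s*(\d+(?:\.\d+)?)\s*$")),
    ("count_percent", re.compile(r"^\s*(\d+)\s*\(\s*(\d+(?:\.\d+)?)\s*%\s*\)\s*$")),
    ("range", re.compile(rf"^\s*({NUM})\s*(?:-|–|to)\s*({NUM})\s*$")),
    ("number", re.compile(rf"^\s*({NUM})\s*$")),
]


def classify_cell(text: str) -> Optional[Tuple[str, Tuple[float, ...]]]:
    """Return (pattern label, numeric parts) for the first matching pattern,
    or None when the cell text matches nothing in the catalogue."""
    for label, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return label, tuple(float(g) for g in match.groups())
    return None
```

For example, `classify_cell("27 (54%)")` yields the label `count_percent` with the parts `(27.0, 54.0)`, while free text such as `"N/A"` yields `None`. Surface patterns alone cannot distinguish, say, a standard deviation from a standard error, so header and stub context would still be needed to settle the meaning.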

Annotation schema
• Meta-data
• Paper (name, abstract, authors, publisher)
• Authors (names, emails, affiliations)
• Table (caption, footers)
• Cells (content, role)
• Inter-cell relationships
• Semantics (links to ontologies, dictionaries, knowledge bases)
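
The schema above can be pictured as a set of nested records. The following is a minimal sketch assuming plain Python dataclasses. The field names follow the bullets on the slide, but the class layout, the example role values, and the way relationships and semantic links are stored (as lists on each cell) are assumptions rather than the authors' actual format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Author:
    name: str
    email: str = ""
    affiliation: str = ""


@dataclass
class Cell:
    content: str
    role: str  # assumed values, e.g. "header", "stub", "super-row", "data"
    # Inter-cell relationships: indexes of cells this cell depends on.
    related_cells: List[int] = field(default_factory=list)
    # Semantics: identifiers in ontologies, dictionaries or knowledge bases.
    semantic_links: List[str] = field(default_factory=list)


@dataclass
class Table:
    caption: str
    footers: List[str] = field(default_factory=list)
    cells: List[Cell] = field(default_factory=list)


@dataclass
class Paper:
    name: str
    abstract: str = ""
    publisher: str = ""
    authors: List[Author] = field(default_factory=list)
    tables: List[Table] = field(default_factory=list)


# Illustrative record with placeholder values only.
header = Cell(content="Placebo", role="header")
value = Cell(content="25", role="data", related_cells=[0])
paper = Paper(
    name="Example trial report",
    tables=[Table(caption="Baseline characteristics", cells=[header, value])],
)
```

Storing relationships as cell indexes keeps the record easy to serialise; a graph or database representation would serve equally well.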

Summary
• Tables contain valuable information such as settings or results
• System for extraction and curation of table data
• Decomposition and annotation of the tables
• Accuracy of 85%
• Semantic analysis and information extraction

nikola.milosevic@manchester.ac.uk
