OntoChem Ontologies


Upload: raya-agency

Post on 10-Mar-2016


DESCRIPTION

Brochure for Ontochem


• AVAILABLE ONTOLOGIES FOR YOUR BIOMEDICAL TEXT MINING: proteins, genes, chemicals, diseases, cell lines, general species, plants, anatomy, physiological effects, cosmetology, geopolitical regions, authors, relationships...

• WE CREATE CUSTOM ONTOLOGIES WITH OUR UNIQUE AND POWERFUL SOFTWARE TOOLS

Ontologies provide the basis for identifying concepts in text mining technologies. Subsequent extraction of facts and relationships between these concepts enables data mining and provides the foundation for novel “in silico” knowledge discovery methods. OntoChem uses ontologies to extract implicit, unknown and useful information from databases and document collections such as patents or scientific literature.

ONTOCHEM ONTOLOGIES



[Figure: example ontology graph for the white willow (Salix alba), linking concepts from the chemistry, effects, species, diseases and world regions ontologies. Concept labels: 6-membered heterocycles, aromatic compounds, anti-inflammatory agent, D-(-)-Salicin, Salicaceae, Salix, Salix alba, Filipendula, Filipendula ulmaria, Rheumatic Fever, Europe, Africa, Northern Africa. Relationship types: is a, is part of, is found in, treats, range of distribution.]

Ontology (derived from onto-, the Greek ὤν, ὄντος “being; that which is”, present participle of the verb εἰμί “be”, and -λογία, -logia: science, study, theory) is the philosophical study of the nature of being, existence, or reality as such, as well as the basic categories of being and their relations.

In computer science, an ontology formally represents knowledge as a set of concepts within a domain, and the relationships between those concepts, enabling semantic data integration, data mining and knowledge generation. Ontologies are explicit specifications of a topic including a vocabulary of terms and concepts with defined logical relationships to each other.

http://en.wikipedia.org/wiki/Ontology_(information_science)

● Finding specific relationships between domains, e.g. which compounds have been isolated from plants. Information that was previously only available from manually curated databases is now generated on the fly.

● Similarity search and ranking of documents based on ontology concept metrics. This gives more relevant results than conventional technologies such as word frequencies or keywords.
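One simple way to picture a similarity score based on ontology concepts is a Jaccard overlap of the concepts annotated in two documents, after each concept has been expanded with its ancestors. This is a minimal sketch under stated assumptions, not OntoChem's metric: the `PARENT` links and all function names are hypothetical.

```python
# Hypothetical child -> parent links; not taken from an OntoChem ontology.
PARENT = {
    "salicin": "glycoside",
    "arbutin": "glycoside",
    "glycoside": "chemical compound",
    "aspirin": "chemical compound",
}

def with_ancestors(concepts):
    """Return the concepts together with all of their ancestors."""
    expanded = set()
    for concept in concepts:
        # Walk up the hierarchy; stop at the root or at a concept already seen.
        while concept is not None and concept not in expanded:
            expanded.add(concept)
            concept = PARENT.get(concept)
    return expanded

def concept_similarity(doc_a, doc_b):
    """Jaccard overlap of the ancestor-expanded concept sets of two documents."""
    a, b = with_ancestors(doc_a), with_ancestors(doc_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank(query_concepts, docs):
    """Sort document names by decreasing similarity to the query concepts."""
    return sorted(
        docs,
        key=lambda name: concept_similarity(query_concepts, docs[name]),
        reverse=True,
    )
```

Because shared ancestors count towards the overlap, two documents that mention different glycosides score higher against each other than against a document about an unrelated compound, which plain keyword matching would not capture.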

OntoChem develops ontologies in the areas of chemistry, species, diseases, anatomy, cell lines, proteins, pharmacological effects, languages, geopolitical and climate zones, company information for business intelligence and others.

EXAMPLE

Ontologies, together with heuristic and linguistic methods, are applied for semantic processing of unstructured information sources such as scientific articles and patents.

Using, for example, our species, chemistry and geographical ontologies, one may retrieve relationships for the white willow (Salix alba), as illustrated in the accompanying relationship diagram.
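A relationship graph of this kind can be sketched as a list of subject, relationship, object triples. The concept labels and relationship types below appear in the brochure's diagram, but the specific edges are assumptions made for illustration, and the function names are hypothetical.

```python
# Illustrative triples only: the labels come from the brochure's diagram,
# but the exact edges are assumptions, not OntoChem data.
TRIPLES = [
    ("Salix alba", "is a", "Salix"),
    ("Salix", "is a", "Salicaceae"),
    ("D-(-)-Salicin", "is found in", "Salix alba"),
    ("D-(-)-Salicin", "is a", "anti-inflammatory agent"),
    ("Salix alba", "range of distribution", "Europe"),
    ("Salix alba", "range of distribution", "Northern Africa"),
    ("Northern Africa", "is part of", "Africa"),
]

def relationships_for(concept, triples=TRIPLES):
    """Return every triple in which `concept` is the subject or the object."""
    return [t for t in triples if concept in (t[0], t[2])]

def objects_of(subject, predicate, triples=TRIPLES):
    """Return the objects linked to `subject` by the given relationship type."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

Calling `relationships_for("Salix alba")` collects links that cross the species, chemistry and geography domains in a single query.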

INTEGRATED APPROACH

OntoChem has an integrated approach, from custom-made novel tools and algorithms up to ready-to-use ontologies and text annotation with OCMiner®. We build, update, validate and merge general, chemical and biological ontologies for biomedical data mining applications. OntoChem’s ontology approach allows for stable concept IDs, making updates easier and past annotations interpretable. Our modular software enables quick assembly of derived meta-ontologies that are quality checked. A further unique selling point is the scalability of OntoChem’s patented methods for high-performance text processing, which allows ontologies to contain up to a billion terms while keeping text annotation very fast.

USE OF ONTOLOGIES

Our data and knowledge extraction technology OCMiner® uses ontologies for a variety of information retrieval tasks:

● Classification of entities, for example assigning specific compounds to compound classes, relating physiological symptoms to a disease, or defining specific relationship types using a custom-developed regular expression syntax language.

● Ontology-aware search engines, such as our demo server www.ocminer.com, allow searching for concepts: for example, the search term “plants” will return documents mentioning specific plants such as “salix” or “Filipendula ulmaria”.
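The concept search described in the last bullet can be pictured as query expansion over the ontology hierarchy: the search term is replaced by itself plus all of its descendant concepts before documents are matched. The sketch below assumes a tiny hypothetical `CHILDREN` table and naive substring matching. It illustrates the idea and does not describe how OCMiner® works.

```python
# Hypothetical parent -> children links for illustration.
CHILDREN = {
    "plants": ["Salix", "Filipendula ulmaria"],
    "Salix": ["Salix alba"],
}

def descendants(concept):
    """Collect all concepts below `concept` in the hierarchy."""
    found = []
    stack = [concept]
    while stack:
        current = stack.pop()
        for child in CHILDREN.get(current, []):
            if child not in found:
                found.append(child)
                stack.append(child)
    return found

def search(concept, documents):
    """Return ids of documents mentioning the concept or any descendant."""
    terms = [t.lower() for t in [concept] + descendants(concept)]
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in terms)
    ]
```

With documents such as `{"a": "Extracts of salix bark", "b": "Filipendula ulmaria flowers", "c": "Synthesis of pyridine"}`, a search for "plants" returns "a" and "b" even though neither text contains the word "plants".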


AVAILABLE ONTOLOGIES

OntoChem has implemented technologies to build dictionaries, controlled vocabularies, taxonomies and ontologies comprising more than 100 million terms from various domains. Examples are our ontologies for general species, plants and fungi, cell lines, general anatomy, plant and fungal anatomy, diseases, pharmacological and physiological effects, cosmetology, proteins, genes, chemistry, languages, geopolitical and climate zones, company information for business intelligence and domain-specific relationship ontologies.

Each ontology concept contains further data, such as relationships to other concepts, links to external sources, language information, its synonyms and related updating information.

OntoChem’s ontologies can be stored and used in various formats such as OBO, CSV, XML (using specific flavors such as RDF, OWL, CML, SBML or others), SKOS etc.
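As an illustration of one of these formats, the sketch below reads `[Term]` stanzas from a small OBO-style text, in which each stanza carries an `id`, a `name`, optional `synonym` lines and `is_a` links. The `EX:` identifiers and the sample content are invented for this example, and the parser is deliberately minimal: it ignores the file header, escape sequences and other stanza types.

```python
# Invented sample content in OBO style; the EX: identifiers are hypothetical.
OBO_TEXT = '''
[Term]
id: EX:0000001
name: Salix
synonym: "willow" EXACT []

[Term]
id: EX:0000002
name: Salix alba
synonym: "white willow" EXACT []
is_a: EX:0000001 ! Salix
'''

def parse_obo_terms(text):
    """Return a list of dicts mapping each tag of a [Term] stanza to its values."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif line and current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            value = value.split(" ! ", 1)[0]  # drop the trailing "! comment"
            current.setdefault(tag, []).append(value)
    return terms
```

Keeping the `id` separate from `name` and `synonym` is what lets names change between ontology releases while annotations that reference the identifier stay interpretable.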

When ontologies are used for text mining, we have specific modules that enhance the value of ontologies, either by generating an enriched ontology with additional terms or by applying these modules at annotation time:

● Spelling variations (e.g. British-American English, plural forms)

● Diacritic character, space/hyphen/apostrophe handling

● Ontology-dependent conditional black and white lists, case-sensitive annotations

● Automated detection of acronyms and abbreviations
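Two of the items above, diacritic handling and space/hyphen/apostrophe handling, can be sketched with the Python standard library. The functions below are a rough illustration under simplifying assumptions (Unicode decomposition, lowercasing, a naive plural rule) and are not OntoChem's modules. The function names are hypothetical.

```python
import re
import unicodedata

def strip_diacritics(text):
    """Remove combining marks after Unicode decomposition (e.g. ö -> o)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def normalize_term(term):
    """Map spelling variants of a term onto one comparable key."""
    text = strip_diacritics(term).lower()
    # Drop straight and typographic apostrophes.
    text = text.replace("'", "").replace("\u2019", "")
    # Treat runs of whitespace and hyphens as a single space.
    return re.sub(r"[\s\-]+", " ", text).strip()

def simple_variants(term):
    """Generate a few surface variants: hyphen handling and a naive plural."""
    variants = {term, term.replace("-", " "), term.replace("-", "")}
    if not term.endswith("s"):
        variants.add(term + "s")
    return variants
```

`normalize_term` corresponds to handling variation at annotation time, while `simple_variants` corresponds to enriching the ontology with additional terms beforehand.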

A unique ontology format has been developed to extract relationships between named entities (NE) in text. Domain-specific relationship ontologies are used together with the related ontologies and a new regular expression syntax language to extract relationships with high precision and recall.
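OntoChem's relationship syntax is not described in this brochure, so the following sketch uses ordinary Python regular expressions instead, applied to text in which named entities have already been annotated. The `<type:surface form>` tagging convention, the pattern and the function name are all assumptions made for illustration.

```python
import re

# Assumed convention: annotated entities are wrapped as <type:surface form>.
TEXT = "<chemical:Salicin> was isolated from <plant:Salix alba> bark."

# One hand-written pattern for an "isolated from" relationship between
# a chemical entity and a plant entity.
PATTERN = re.compile(
    r"<chemical:(?P<chem>[^>]+)>\s+"
    r"(?:was|were|has been)\s+isolated\s+from\s+"
    r"<plant:(?P<plant>[^>]+)>"
)

def extract_isolated_from(text):
    """Return (chemical, relationship, plant) triples found in annotated text."""
    return [
        (m.group("chem"), "is found in", m.group("plant"))
        for m in PATTERN.finditer(text)
    ]
```

Matching on entity types rather than on literal names is what lets a single pattern cover every chemical and plant pair that the ontologies can recognize.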

ONTOLOGY TOOLS

To create, manage, update and validate ontologies we have developed a range of different software tools.

Chemistry ontology editor: We have developed the first specialized chemistry ontology editor, SODIAC (structure ontology development and individual assignment center), to support the development of chemical ontologies. Using the OBO format, it implements known functionality of an ontology editor together with a chemistry structure editor that allows structure-based addition, editing and ontology checks. SODIAC can be used to annotate conventional structure files or chemical databases, whereby each compound will be assigned to its chemical structure classes.

Using SODIAC, we have developed chemical ontologies that comprise structure-based classifications but also biology-related classifications of chemical compounds. Particular emphasis has been given to natural products, for example steroids or sugars, but also to all classes of heterocycles and compound classes that are of interest for biomedical research. In addition, classifications such as vitamins, food and flavor, cosmetics, drugs and FEMA compounds can be assigned.

OntViewer is designed to display, review and check very large ontologies of multi-GB size, such as the chemistry or the proteins/genes ontologies. It also performs logical, statistical and consistency checks on the ontology.

Screenshot of SODIAC, our specialized ontology editor for general and chemical ontologies

Screenshot of OntViewer, showing the ontology tree of ChEBI with different relationships.


OntoChem has also developed a series of custom-built command line tools that aid in creating, updating and validating ontologies:

● Searching and proposing candidate synonym concept terms in document collections

● Automated generation of spelling variations

● Checking and correcting homonyms or logical errors within or between ontologies
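The homonym check in the last bullet can be pictured as looking for terms that point to more than one concept, within a single ontology or across several. The data layout and the function name below are hypothetical, and the sketch compares terms case-insensitively, which is only one possible policy.

```python
from collections import defaultdict

def find_homonyms(ontologies):
    """Find terms that label more than one concept.

    `ontologies` maps an ontology name to {concept_id: [terms]}.
    Returns {lowercased term: sorted list of (ontology, concept_id)}.
    """
    owners = defaultdict(set)
    for onto_name, concepts in ontologies.items():
        for concept_id, terms in concepts.items():
            for term in terms:
                owners[term.lower()].add((onto_name, concept_id))
    return {term: sorted(ids) for term, ids in owners.items() if len(ids) > 1}
```

For example, a species ontology containing the term "cat" and a gene ontology containing the gene symbol "CAT" would be reported as a homonym pair, which could then be resolved by hand or with a case-sensitive annotation rule.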

Together, our technologies provide a straightforward and comprehensive toolbox for various tasks when working with ontologies.

ADVANTAGES

OntoChem’s ontologies, together with OCMiner®, are ideally suited for high-speed, high-quality annotation and search of large data volumes. For example, searching the annotated PubMed abstracts in the demo application www.ocminer.com with the chemistry search term “heterocyclic compounds” retrieves 3,124,129 hit documents, while a native PubMed search (http://www.ncbi.nlm.nih.gov/pubmed?term=heterocyclic) finds 24,524 hit documents. Using the cell line “SKMEL-28” as a search term retrieves 296 documents, while the native PubMed search (http://www.ncbi.nlm.nih.gov/pubmed?term=skmel-28) delivers 26 hit documents.

Screenshot of HugeEdit, showing a large data file of compounds and their names.

HugeEdit is a simple and fast text editor for displaying, searching and editing very large data files with up to multi-GB data and multi-million lines without the need to hold the complete data in memory. It is especially suited to work with column-separated data too large to be edited in standard spreadsheet editors.
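How HugeEdit is implemented is not stated here. One common technique for working with files that do not fit in memory is to scan the file once, record the byte offset at which each line starts, and then seek directly to any requested line. The sketch below illustrates that general idea with hypothetical function names. Only the offsets are kept in memory, never the file contents.

```python
def build_line_index(path):
    """Scan the file once and record the byte offset where each line starts."""
    offsets = []
    position = 0
    with open(path, "rb") as handle:
        for line in handle:
            offsets.append(position)
            position += len(line)
    return offsets

def read_line(path, offsets, line_number):
    """Jump straight to a zero-based line number without reading earlier lines."""
    with open(path, "rb") as handle:
        handle.seek(offsets[line_number])
        return handle.readline().decode("utf-8").rstrip("\r\n")
```

For a tab-separated file, `read_line(path, offsets, n).split("\t")` would then return the columns of row `n`.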

OntoChem GmbH
Heinrich-Damerow-Str. 4
06120 Halle
Germany
