PalGov © 2011
The Palestinian eGovernment Academy (أكاديمية الحكومة الإلكترونية الفلسطينية)
www.egovacademy.ps
Tutorial 4: Ontology Engineering & Lexical Semantics
Session 13
ArabicOntology
Dr. Mustafa Jarrar
University of Birzeit
www.jarrar.info
About
This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the
Commission of the European Communities, grant agreement 511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES.
The project website: www.egovacademy.ps
Project Consortium:
University of Trento, Italy
University of Namur, Belgium
Vrije Universiteit Brussel, Belgium
TrueTrust, UK
Birzeit University, Palestine (Coordinator)
Palestine Polytechnic University, Palestine
Palestine Technical University, Palestine
Université de Savoie, France
Ministry of Local Government, Palestine
Ministry of Telecom and IT, Palestine
Ministry of Interior, Palestine
Coordinator:
Dr. Mustafa Jarrar
Birzeit University, P.O.Box 14- Birzeit, Palestine
Telfax: +972 2 2982935
© Copyright Notes
Everyone is encouraged to use this material, or part of it, but should
properly cite the project (logo and website), and the author of that part.
No part of this tutorial may be reproduced or modified in any form or by
any means, without prior written permission from the project, who have
the full copyrights on the material.
Attribution-NonCommercial-ShareAlike
CC-BY-NC-SA
This license lets others remix, tweak, and build upon your work non-
commercially, as long as they credit you and license their new creations
under the identical terms.
Tutorial Map
Topic Time
Session 1_1: The Need for Sharing Semantics 1.5
Session 1_2: What is an ontology 1.5
Session 2: Lab- Build a Population Ontology 3
Session 3: Lab- Build a BankCustomer Ontology 3
Session 4: Lab- Build a BankCustomer Ontology 3
Session 5: Lab- Ontology Tools 3
Session 6_1: Ontology Engineering Challenges 1.5
Session 6_2: Ontology Double Articulation 1.5
Session 7: Lab - Build a Legal-Person Ontology 3
Session 8_1: Ontology Modeling Challenges 1.5
Session 8_2: Stepwise Methodologies 1.5
Session 9: Lab - Build a Legal-Person Ontology 3
Session 10: Zinnar – The Palestinian eGovernment Interoperability Framework 3
Session 11: Lab- Using Zinnar in web services 3
Session 12_1: Lexical Semantics and Multilinguality 1.5
Session 12_2: WordNets 1.5
Session 13: ArabicOntology 3
Session 14: Lab-Using Linguistic Ontologies 3
Session 15: Lab-Using Linguistic Ontologies 3
Intended Learning Objectives
A: Knowledge and Understanding
4a1: Demonstrate knowledge of what is an ontology,
how it is built, and what it is used for.
4a2: Demonstrate knowledge of ontology engineering
and evaluation.
4a3: Describe the difference between an ontology and a
schema, and an ontology and a dictionary.
4a4: Explain the concept of language ontologies, lexical
semantics and multilingualism.
B: Intellectual Skills
4b1: Develop quality ontologies.
4b2: Tackle ontology engineering challenges.
4b3: Develop multilingual ontologies.
4b4: Formulate quality glosses.
C: Professional and Practical Skills
4c1: Use ontology tools.
4c2: (Re)use existing Language ontologies.
D: General and Transferable Skills
d1: Working with team.
d2: Presenting and defending ideas.
d3: Use of creativity and innovation in problem solving.
d4: Develop communication skills and logical reasoning
abilities.
Session ILOs
This session will help students to:
4a4: Explain the concept of language ontologies, lexical
semantics and multilingualism.
4b4: Formulate quality glosses.
4b3: Develop multilingual ontologies.
Reading
Mustafa Jarrar: Building A Formal Arabic Ontology (Invited Paper). In proceedings of
the Experts Meeting On Arabic Ontologies And Semantic Networks. Alecso, Arab League.
Tunis, July 26-28, 2011.
Article: http://www.jarrar.info/publications/J11.pdf.htm
Slides: http://mjarrar.blogspot.com/2011/08/building-formal-arabic-ontology-invited.html
Mustafa Jarrar: Towards The Notion Of Gloss, And The Adoption Of Linguistic
Resources In Formal Ontology Engineering. In proceedings of the 15th International World
Wide Web Conference (WWW2006). Edinburgh, Scotland. Pages 497-503. ACM Press. ISBN:
1595933239. May 2006.
http://www.jarrar.info/publications/J06.pdf.htm
Aldo Gangemi, Nicola Guarino, Alessandro Oltramari, Stefano Borgo:
Cleaning-up WordNet's Top-Level. In Proc. of the 1st
International WordNet Conference (2002)
http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=C9962DFEDD793F3F839426B774BC9BAF?doi=10.1.1.11.4064&rep=rep1&type=pdf
The Arabic Ontology Project
• A project started in 2010, at Birzeit University, Palestine.
• The ArabicOntology is more than an Arabic WordNet.
• Unlike WordNet, the ArabicOntology is logically and philosophically well-founded, as it follows strict ontological principles, but it can still be used as an Arabic WordNet.
http://sites.birzeit.edu/comp/ArabicOntology
The project is partially funded (seed funding) by Birzeit University (VP Academic Office, Research Committee).
Arabic Ontology: Data Model (Simplified)
• ConceptID (as a synsetID in WordNet) to identify a concept.
• Polysemy and synonymy: like in WordNet, several words (i.e., lexical
units) can be used to lexicalize one concept (synonymy); and one word
might be used to lexicalize several concepts.
[Figure: data model linking Lexical Units to Concepts. Concept ID: concept unique reference. Gloss: describes a concept. Concepts are linked to each other by Semantic Relations.]
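The simplified data model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (Concept, Lexicon), the concept IDs, and the English example entries are hypothetical and are not taken from the Arabic Ontology itself.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: int                      # unique reference, like a synset ID in WordNet
    gloss: str                           # informal description of the concept
    lexical_units: set = field(default_factory=set)  # words that lexicalize the concept


class Lexicon:
    """Hypothetical container for concepts (illustration only)."""

    def __init__(self):
        self.concepts = {}               # concept_id -> Concept

    def add(self, concept):
        self.concepts[concept.concept_id] = concept

    def synonyms(self, concept_id):
        # Synonymy: several lexical units lexicalize one concept.
        return set(self.concepts[concept_id].lexical_units)

    def senses(self, word):
        # Polysemy: one word lexicalizes several concepts.
        return sorted(cid for cid, c in self.concepts.items()
                      if word in c.lexical_units)


# Hypothetical example entries.
lexicon = Lexicon()
lexicon.add(Concept(1, "A piece of furniture with a flat top and legs.", {"table"}))
lexicon.add(Concept(2, "An arrangement of data in rows and columns.",
                    {"table", "tabular array"}))
```

Here `lexicon.senses("table")` lists both concept IDs (polysemy), while `lexicon.synonyms(2)` lists the lexical units sharing one concept (synonymy).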
Lexical vs. Semantic Relationships
• Semantic relations are relationships between concepts (not words),
e.g., subtype, part-of, etc.
• Lexical relations are relationships between words (not concepts), e.g.,
synonym-of, root-of, abbreviation-of, etc.
• Ontologies are mainly concerned with semantic relations.
Arabic Ontology
• Arabic Ontology: the set of concepts (of all Arabic terms), and the
semantic (not lexical) relationships between these concepts.
• To build an Arabic Ontology: Identify the set of concepts for every
Arabic word (Polysemy), and define semantic relations between these
concepts.
• The most important relation is the subtype relation, which leads to a tree of concepts.
Arabic Ontology: Subtype Relationships
• Subtype relation: a mathematical relation (subset: A ⊆ B), such that every instance of A must also be an instance of B.
• Inheritance: subtypes inherit all properties of their super types.
• “Hyponymy” in WordNet is close to (but not the same as) the subtype relation.
• “General-Specific” relations, as in thesauri, are not subtype relations.
[Figure: example tree of concepts linked by subtype relations]
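A minimal sketch of the two ideas on this slide, subtype as a subset of instances and inheritance of properties from supertypes. The concept names, instances, and properties below are hypothetical toy data, not part of the Arabic Ontology.

```python
# Toy data (hypothetical): each concept's supertype, own properties, and instances.
parent = {"Dog": "Mammal", "Mammal": "Animal", "Animal": None}
own_properties = {
    "Animal": {"is alive"},
    "Mammal": {"has fur"},
    "Dog": {"barks"},
}
extension = {
    "Animal": {"rex", "tom", "tweety"},
    "Mammal": {"rex", "tom"},
    "Dog": {"rex"},
}


def is_subtype(a, b):
    # Subtype as subset: every instance of a is also an instance of b.
    return extension[a] <= extension[b]


def inherited_properties(concept):
    # Walk up the supertype chain, collecting properties along the way.
    props = set()
    while concept is not None:
        props |= own_properties.get(concept, set())
        concept = parent[concept]
    return props
```

For example, `is_subtype("Dog", "Animal")` holds while `is_subtype("Animal", "Dog")` does not, and `inherited_properties("Dog")` collects the properties of Dog, Mammal, and Animal.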
Arabic Ontology: Subtype Relationships
• It is recommended to use proper subtypes, as this is stricter.
• That is, A and B are never equal; B is always a proper superset of A.
• It is recommended to classify concepts based on “rigidity”.
• For example, it is wrong to say that a 'WorkTable' is a type of 'Table', as being a work table is a non-rigid property.
• As such, subtypes form a tree.
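The two structural claims above (proper subtypes, and subtypes forming a tree) can be checked mechanically. The sketch below is one possible check, assuming subtype links are given as (subtype, supertype) pairs; the function names and example links are hypothetical.

```python
def is_proper_subtype(a_instances, b_instances):
    # Proper subtype: A is a strict subset of B, so A and B are never equal.
    return a_instances < b_instances


def forms_tree(links):
    """Return True if the (subtype, supertype) links form a single tree."""
    parent = {}
    for child, sup in links:
        if child in parent and parent[child] != sup:
            return False                 # a concept with two different supertypes
        parent[child] = sup
    nodes = set(parent) | set(parent.values())
    roots = nodes - set(parent)          # concepts that have no supertype
    if len(roots) != 1:
        return False                     # no root (cycle) or several disconnected trees
    for node in parent:
        seen = set()
        while node in parent:            # climb towards the root
            if node in seen:
                return False             # cycle detected
            seen.add(node)
            node = parent[node]
    return True
```

For instance, the links Dog→Mammal, Cat→Mammal, Mammal→Animal form a tree, whereas adding a second supertype for Dog (say Dog→Pet) breaks the tree property.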
Arabic Ontology: Core (Top Levels).
Arabic Core Ontology: the top levels of the Arabic Ontology, built manually based on the DOLCE and SUMO upper-level ontologies, and carefully taking into account the philosophical and historical aspects of the Arabic concepts/terms.
[Figure: the core ontology (10 levels, 550 concepts), labeled "the mother meanings of all Arabic words" (أمهات المعاني لجميع الكلمات العربية); only the top 3 levels are shown, for simplicity]
• The 10th level of this core ontology should top all Arabic concepts and levels.
• This allows us to detect any problems in the tree/relations.
• The core ontology governs the correctness and the evolution of the whole Arabic Ontology.
Arabic Ontology: Glosses, according to strict ontological guidelines [J06]
A gloss is an auxiliary informal (but controlled) account of the intended meaning of a linguistic term, for the commonsense perception of humans.
A gloss is supposed to render factual knowledge that is critical to understanding a concept, but that is, e.g., implausible, unreasonable, or very difficult to formalize and/or articulate explicitly. It is NOT meant to catalogue general information and comments, as conventional dictionaries and encyclopedias usually do, or as an <rdfs:comment> does.
Arabic Ontology: Gloss Guidelines
What should and what should not be provided in a gloss:
1. Start with the principal/super type of the concept being defined.
E.g. 'Search engine': “A computer program that …”, 'Invoice': “A business document that…”, 'University': “An institution of …”.
2. Write it in the form of propositions, offering the reader inferential knowledge that helps them construct the image of the concept. E.g. compare these two glosses for 'Search engine':
Weak: “A computer program for searching the internet, it can be defined as one of the most useful aspects of the World Wide Web. Some of the major ones are Google, ….”
Better: “A computer program that enables users to search and retrieve documents or data from a database or from a computer network…”.
3. Focus on distinguishing characteristics and intrinsic properties that differentiate the concept from other concepts.
E.g. compare these two glosses for 'Laptop computer':
Weak: “A computer that is designed to do pretty much anything a desktop computer can do, it runs for a short time (usually two to five hours) on batteries”.
Better: “A portable computer small enough to use in your lap…”.
Arabic Ontology: Gloss Guidelines
4. Use supportive examples:
- To clarify cases that are commonly believed to be false but are true, or that are believed to be true but are false;
- To strengthen and illustrate distinguishing characteristics (e.g. define by examples, counter-examples).
Examples can be types and/or instances of the concept being defined.
5. Be consistent with formal definitions/axioms.
6. Be sufficient, clear, and easy to understand.
WordNet glosses do not follow such ontological guidelines.
Arabic Ontology: Gloss Guidelines
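As a gloss starts with a supertype of the concept being defined, try reading each gloss as in the following examples for جدول (table), to verify that the supertype you chose is correct:
جدول: مصفوفة بيانات مكونة من صفوف وأعمدة. (Table: a matrix of data made up of rows and columns.)
جدول: ترتيب بيانات جنبا الى جنب على شكل صفوف وأعمدة. (Table: an arrangement of data side by side in the form of rows and columns.)
جدول: تنظيم بيانات بصورة ممنهجة جنبا الى جنب على شكل صفوف وأعمدة. (Table: a systematic organization of data side by side in the form of rows and columns.)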
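The first gloss guideline (start with the supertype) lends itself to a rough automatic check. The sketch below is a simple heuristic for English glosses only, written for illustration; the function names and the article list are assumptions, and it is not the project's gloss-cleaning tool.

```python
import re

ARTICLES = {"a", "an", "the"}            # leading English articles to skip


def tokens(text):
    # Lowercased word tokens, ignoring punctuation.
    return re.findall(r"\w+", text.lower())


def starts_with_supertype(gloss, supertype_terms):
    """True if the gloss begins (after an optional article) with a supertype term."""
    words = tokens(gloss)
    if words and words[0] in ARTICLES:
        words = words[1:]
    for term in supertype_terms:
        term_words = tokens(term)
        if term_words and words[:len(term_words)] == term_words:
            return True
    return False
```

With the slide's own example, `starts_with_supertype("A computer program that enables users to search", ["computer program"])` passes, while a gloss opening with "One of the most useful aspects of the World Wide Web" does not. The heuristic is deliberately naive: a gloss such as “A portable computer small enough…” would be rejected for the supertype "computer" because of the leading modifier, so flagged glosses still need a human reading.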
ArabicOntology Vs WordNet
Unlike WordNet, the Arabic Ontology is:
1. Philosophically well founded:
• Focuses on intrinsic properties;
• All types are rigid;
• The top level is derived from known Top Level Ontologies.
2. Strictly formal:
• Semantic relations are well-defined mathematical relations.
3. Strictly-controlled glosses
• The content and structure of the glosses is strictly based on
ontological principles.
Methodology and Progress
Our Approach to Building the ArabicOntology
Roughly:
Step 1: Mine Arabic concepts/glosses from dictionaries.
Step 2: Automatically map between these Arabic concepts and WordNet concepts, thus inheriting semantic relations from WordNet.
Step 3: Link all concepts with the Arabic Core Ontology.
Step 4: Re-formulate these glosses, according to strict ontological guidelines.
Step 1: Mining Arabic Concepts from Dictionaries
• Collect as many glosses/concepts as possible from specialized and general dictionaries.
• Manual extraction from dictionaries, then basic cleaning done automatically.
• 35k glosses ready.
• We have ~100 students typing dictionaries now!
• +100K more glosses (expected this year)
Step 1: Mining Arabic Concepts from Dictionaries
• Most Arabic dictionaries are not useful, but some are a good start.
The dictionaries we need should:
- Focus on the semantic aspects.
- Not mix up multiple meanings.
- Present meanings with good structure and quality.
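Examples (Good & Bad Resources)
Wiktionary
معجم مصطلح الأصول (Dictionary of Usul Terminology)
المترادف والمتوارد (Synonyms and Convergent Expressions)
معجم البلدان (Dictionary of Countries)
معجم الحاسبات (Dictionary of Computers)
معجم تعريف مصطلحات القانون الخاص (Dictionary of Private Law Term Definitions)
أقرب الموارد (Aqrab al-Mawarid)
المعجم الإسلامي (The Islamic Dictionary)
معجم الألفاظ المشتركة في اللغة العربية (Dictionary of Polysemous Words in the Arabic Language)
المعجم الوجيز (Al-Mu'jam Al-Wajiz, The Concise Dictionary)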
Step 2: Map Arabic Concepts to WordNet (Matching Function)
We developed an algorithm, such that:
Input: (Arabic gloss, 117k English glosses in WordNet).
Output: (best match, rank)
Accuracy: above 90% (being improved)
Example Arabic gloss:
بلد لها حدود معروفة وشعب وفيها حكومة ومؤسسات منظمة (a country that has known borders and a people, with a government and organized institutions)
Candidate glosses in WordNet (English):
- The territory occupied by one of the constituent administrative districts of a nation
- The way something is with respect to its main attributes
- The group of people comprising the government of a sovereign state
- A politically organized body of people under a single government
- A compilation of the known facts regarding something or someone
- ….
Best match:
A politically organized body of people under a single government
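The slides do not describe how the matching function works internally, so the following is only a stand-in to make the input/output contract concrete. It assumes the Arabic gloss has already been translated into English (the translation below is ours), and it ranks candidates by plain word overlap (Jaccard similarity). The stopword list and function names are hypothetical; this is not the project's algorithm.

```python
import re

# Small hypothetical stopword list, enough for the example below.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "by", "to", "is", "its",
             "with", "under", "that", "has", "one"}


def content_words(text):
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


def rank_matches(query_gloss, candidate_glosses):
    """Return (score, gloss) pairs, best match first, using Jaccard word overlap."""
    q = content_words(query_gloss)
    scored = []
    for gloss in candidate_glosses:
        c = content_words(gloss)
        union = q | c
        score = len(q & c) / len(union) if union else 0.0
        scored.append((score, gloss))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Assumed English rendering of the Arabic gloss on the slide.
query = ("a country that has known borders and a people, "
         "with a government and organized institutions")
wordnet_glosses = [
    "The territory occupied by one of the constituent administrative districts of a nation",
    "The way something is with respect to its main attributes",
    "The group of people comprising the government of a sovereign state",
    "A politically organized body of people under a single government",
    "A compilation of the known facts regarding something or someone",
]
ranking = rank_matches(query, wordnet_glosses)
best_score, best_gloss = ranking[0]
```

On this toy input the top-ranked candidate is the "politically organized body of people" gloss, the same best match shown on the slide. A real matcher would need much more than word overlap (for example, handling translation variants and synonyms).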
The Matching Function is used for:
1- Inheriting semantic relations from WordNet, based on the previous mapping.
[Figure: mapping between Arabic concepts and WordNet concepts; node labels: L, B, Q, R, D, A, C, J, H]
2- Detecting redundant concepts within the Arabic Ontology itself, using the same function.
Remark: This is only a good start, as these inherited relations need to be cleaned using the Arabic Top Levels and the OntoClean methodology.
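Inheriting relations through the mapping can be sketched as follows. The sketch assumes a one-to-one mapping from Arabic concepts to WordNet concepts and a single hypernym per WordNet concept (real WordNet allows several); all identifiers are hypothetical toy data.

```python
def inherit_subtype_links(mapping, wn_parent):
    """Derive candidate Arabic subtype links from WordNet hypernyms.

    mapping:   Arabic concept -> WordNet concept it was matched to
    wn_parent: WordNet concept -> its hypernym (assumed single and acyclic)
    """
    by_wn = {wn: ar for ar, wn in mapping.items()}
    links = []
    for ar, wn in mapping.items():
        node = wn_parent.get(wn)
        while node is not None:          # climb the WordNet hierarchy
            if node in by_wn:            # nearest ancestor that has an Arabic counterpart
                links.append((ar, by_wn[node]))
                break
            node = wn_parent.get(node)
    return links


# Hypothetical toy data.
mapping = {"AR_dog": "WN_dog", "AR_animal": "WN_animal"}
wn_parent = {"WN_dog": "WN_canine", "WN_canine": "WN_mammal", "WN_mammal": "WN_animal"}
candidate_links = inherit_subtype_links(mapping, wn_parent)
```

Here the only candidate link is (AR_dog, AR_animal). As the remark above says, such inherited links are candidates only and still need cleaning.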
Step 3: Link Concepts with the Arabic Core Ontology
Each Arabic concept (from the previous steps) is mapped to a concept in the 10th level.
That is, the 10th level of this core ontology should top all Arabic concepts and levels, so as to enable automatic detection of problems in the hierarchy.
[Figure: Arabic concepts (A, C, J) linked to the core ontology; only the top 3 levels are shown, for simplicity]
Up to this stage
We have many concepts extracted from linguistic resources, but the glosses are not well written.
We have many possible subtype relations between concepts, derived via the mappings to WordNet concepts.
We have a sample of 6000 Arabic concepts mapped to the 10th level in the core ontology.
We need to:
- Clean the glosses.
- Clean/correct the subtype links.
Automatic Detection of Inconsistencies
If (J ⊆ A) and (A ⊆ اصطلاح لغوي, "linguistic term"), then it is most likely true that (J ⊆ اصطلاح لغوي), so there is no need to keep an explicit link (J ⊆ اصطلاح لغوي).
However, as H and C do not share a supertype, (H ⊆ C) is likely incorrect.
[Figure: Arabic concepts A, C, J, and H under the core ontology (only the top 3 levels are shown, for simplicity); links marked "X!" are flagged. Legend: subtype links from Arabic concepts to the core ontology (done manually); subtype links between Arabic concepts (derived via the mappings to WordNet).]
Now we can automatically detect whether the links are correct.
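The check described on this slide can be sketched as a comparison between the two kinds of links. The sketch assumes each Arabic concept has one manual link to a core concept; the function name is hypothetical, and the core concepts assigned to C and H below are invented for the example (the slide only names "linguistic term" for A and J).

```python
def check_links(derived_links, core_link):
    """Compare derived subtype links against the manual links to the core ontology.

    derived_links: (subtype, supertype) pairs between Arabic concepts (via WordNet)
    core_link:     Arabic concept -> core-ontology concept it was manually linked to
    Returns (redundant, suspicious):
      redundant  - manual links already implied by a derived link
      suspicious - derived links whose two ends do not share a core supertype
    """
    redundant, suspicious = [], []
    for sub, sup in derived_links:
        sub_core = core_link.get(sub)
        if sub_core is not None and sub_core == core_link.get(sup):
            redundant.append((sub, sub_core))   # sub -> core follows from sub -> sup -> core
        else:
            suspicious.append((sub, sup))       # ends sit under different core concepts
    return redundant, suspicious


# Hypothetical assignments for illustration.
core_link = {"J": "linguistic term", "A": "linguistic term",
             "C": "linguistic term", "H": "physical object"}
derived_links = [("J", "A"), ("H", "C")]
redundant, suspicious = check_links(derived_links, core_link)
```

With this toy data, the manual link from J to "linguistic term" is reported as redundant, and the derived link (H, C) is reported as suspicious. Concepts without any manual link also end up in the suspicious list, which is a conservative choice for this sketch.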
Step 4: Re-formulate Glosses, according to strict ontological guidelines [J06]
Glosses are re-formulated semi-manually, to meet our strict rules.
Gloss cleaning can be done automatically up to a certain point.
During the manual cleaning (i.e., re-formulation) of glosses, mistakes in subtype relations can be detected.
Further Research (ongoing)
Given many Arabic-English, Arabic-French, and Arabic-Italian dictionaries, can we derive an Arabic-Arabic thesaurus? For example:
جدول: مصفوفة، نهر، قائمة، قناة ماء (jadwal, "table/stream": matrix, river, list, water channel)
Then categorize closely related words (maybe using WordNet) as follows:
جدول: مصفوفة، قائمة، نهر، قناة ماء (grouped as: matrix, list; river, water channel)
This will help in finding possible Arabic synsets, which helps in detecting possible subtype relations and/or validating the existing relations.
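One way to explore this research question is to pivot through the foreign-language translations: Arabic words that share a translation are thesaurus candidates, and the shared translation gives a first rough categorization. The sketch below uses a single hypothetical Arabic-English dictionary with invented entries; it illustrates the idea and is not a result of the project.

```python
from collections import defaultdict


def related_words(word, ar_to_en):
    """Group Arabic words that share an English translation with `word`.

    ar_to_en: Arabic word -> set of English translations (toy bilingual dictionary)
    Returns: English pivot translation -> set of related Arabic words
    """
    en_to_ar = defaultdict(set)
    for ar, translations in ar_to_en.items():
        for en in translations:
            en_to_ar[en].add(ar)
    groups = {}
    for en in ar_to_en.get(word, ()):
        others = en_to_ar[en] - {word}
        if others:
            groups[en] = others
    return groups


# Hypothetical dictionary entries for illustration.
ar_to_en = {
    "جدول": {"table", "stream"},
    "مصفوفة": {"matrix", "table"},
    "قائمة": {"list", "table"},
    "نهر": {"river", "stream"},
    "قناة ماء": {"water channel", "stream"},
}
groups = related_words("جدول", ar_to_en)
```

On this toy dictionary the word جدول yields two groups, one pivoting on "table" (مصفوفة، قائمة) and one on "stream" (نهر، قناة ماء), mirroring the categorization shown above. Combining several bilingual dictionaries would amount to merging their pivot groups.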
References
[J11] Mustafa Jarrar: Building A Formal Arabic Ontology (Invited Paper). In proceedings of the Experts Meeting On Arabic Ontologies And Semantic Networks. Alecso, Arab League. Tunis, July 26-28, 2011.
Article: http://www.jarrar.info/publications/J11.pdf.htm
Slides: http://mjarrar.blogspot.com/2011/08/building-formal-arabic-ontology-invited.html
[J06] Mustafa Jarrar: Towards The Notion Of Gloss, And The Adoption Of Linguistic Resources In Formal Ontology Engineering. In proceedings of the 15th International World Wide Web Conference (WWW2006). Edinburgh, Scotland. Pages 497-503. ACM Press. ISBN: 1595933239. May 2006.
http://www.jarrar.info/publications/J06.pdf.htm
[MBC93] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller: Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography, Vol. 3, Nr. 4. Pages 235-244. (1990)
http://wordnetcode.princeton.edu/5papers.pdf
[GGO02] Aldo Gangemi, Nicola Guarino, Alessandro Oltramari, Stefano Borgo: Cleaning-up WordNet's Top-Level. In Proc. of the 1st International WordNet Conference (2002)
http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=C9962DFEDD793F3F839426B774BC9BAF?doi=10.1.1.11.4064&rep=rep1&type=pdf
Roche Christophe, Calberg-Challot Marie (2010): “Synonymy in Terminology: the Contribution of Ontoterminology”, Re-thinking synonymy: semantic sameness and similarity in languages and their description, Helsinki, 2010
http://www.linguistics.fi/synonymy/Synonymy%20Ontoterminology%20Helsinki%202010.pdf
Roche Christophe, Calberg-Challot Marie, Damas Luc, Rouard Philippe (2009): “Ontoterminology: A new paradigm for terminology”. KEOD, Madeira
http://ontology.univ-savoie.fr/condillac/files/docs/articles/Ontoterminology-a-new-paradigm-for-terminology.pdf