LiDom Builder: Automatising the Construction of Multilingual Domain Modules
Ángel Conde Manjón
GaLan Research Group – LSI Department
University of the Basque Country (UPV/EHU)
Supervisors: Dr. Mikel Larrañaga Olagaray & Dr. Ana Arruarte Lasa
UPV/EHU
25 February 2016
• Technology Supported Learning Systems (TSLS):
  – Learning Management Systems
  – Massive Open Online Courses
  – Intelligent Tutoring Systems: SQL-Tutor
  – …
• Bilingual and multilingual contexts are a reality (UNESCO, 2003).
• Acquiring the Domain Module is a costly and labour-intensive task.
Introduction · Acquisition of Multilingual Terminology · Identification of Pedagogical Relationships · Gathering Learning Objects · Conclusions and Future Work · LiDom Builder
Context
Main Goal
Automatising the construction of MULTILINGUAL DOMAIN MODULES
DOM-Sortze (Larrañaga, 2012): a framework for building DOMAIN MODULES from electronic textbooks.
Previous Work: DOM-Sortze
[Figure: DOM-Sortze pipeline. Electronic Textbook → 1. Preprocess (Document Body Internal Representation, Document Outline Internal Representation) → 2. LDO Gathering (Learning Domain Ontology) → 3. LOs Gathering → Domain Module.]
Previous Work: DOM-Sortze
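[Figure: example Domain Module. LDO topics (Planetary System, Solar System, Moon, Satellite, Planet, Earth) linked by partOf, isA and prerequisite relationships; learning object LO1 ("The Moon is Earth's only natural satellite") attached to a topic through hasDR.]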
DOM-Sortze: Domain Module Representation Formalism
Learning Domain Ontology (LDO): topics and pedagogical relationships.
Learning Objects (LOs):
• Definitions
• Examples
• Problem Statements
• …
Limitations of DOM-Sortze:
1. Developed for a single language: Basque.
2. Its formalism cannot represent Multilingual Domain Modules.
DOM-Sortze: Limitations
1. Can the formalism used in DOM-Sortze be enhanced to represent Multilingual Domain Modules?
– Extend the formalism to deal with Multilingual Domain Modules.
2. Which enhancements are required to deal with various languages?
– Develop a method for extracting Multilingual Terminology.
– Improve the Relationship Acquisition.
– Provide a method for acquiring Multilingual Learning Objects.
Automatising the construction of MULTILINGUAL DOMAIN MODULES
Goals
I. Introduction: Motivations and Goals
II. LiDom Builder: Building Multilingual Domain Modules
III. Acquisition of Multilingual Terminology
IV. Identification of Pedagogical Relationships
V. Gathering Multilingual Learning Objects
VI. Conclusions and Future Work
Outline
[Figure: LiDom Builder overview. From a Textbook, LiDom Builder performs Multilingual Terminology Extraction, Pedagogical Relationship Extraction and Multilingual Learning Object Generation.]
Overview
LiDom Builder: a framework for automatising the acquisition of Multilingual Domain Modules.
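[Figure: example Multilingual Domain Module. The LDO topics (Planetary System, Solar System, Moon, Satellite, Planet, Earth) are linked by partOf, isA, prerequisite and pedagogicallyClose relationships. Topics carry language-tagged labels (e.g. Moon: "ilargi"@eu, "luna"@es, "moon"@en), and learning objects (LO1, LO2), each in a given language (eu, en, es), are attached to topics through hasDR and linked to their equivalents in other languages (Equiv. "en", Equiv. "es").]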
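A minimal sketch of how such a module could be held in memory, assuming Python dataclasses; it is not LiDom Builder's actual representation. The class and field names (`Topic`, `LearningObject`, `DomainModule`, `labels`, `los_for`) are hypothetical, the language-keyed `labels` dictionary stands in for the language-tagged labels of the figure, and the Moon isA Satellite edge is only illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional

# Relationship types that appear in the LDO figures.
RELATIONS = {"isA", "partOf", "prerequisite", "pedagogicallyClose"}


@dataclass
class LearningObject:
    lo_id: str
    lang: str  # language tag, e.g. "en", "es", "eu"
    text: str
    equivalents: list[str] = field(default_factory=list)  # ids of the same LO in other languages


@dataclass
class Topic:
    topic_id: str
    labels: dict[str, str]  # language tag -> label
    los: list[LearningObject] = field(default_factory=list)  # LOs attached through hasDR

    def label(self, lang: str) -> Optional[str]:
        return self.labels.get(lang)


@dataclass
class DomainModule:
    topics: dict[str, Topic] = field(default_factory=dict)
    relations: set[tuple[str, str, str]] = field(default_factory=set)  # (source, relation, target)

    def add_relation(self, source: str, relation: str, target: str) -> None:
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.relations.add((source, relation, target))

    def los_for(self, topic_id: str, lang: str) -> list[LearningObject]:
        """Learning objects of a topic that are available in the requested language."""
        return [lo for lo in self.topics[topic_id].los if lo.lang == lang]


moon = Topic("moon", {"eu": "ilargi", "es": "luna", "en": "moon"})
moon.los.append(LearningObject("LO1", "en", "The Moon is Earth's only natural satellite"))
dm = DomainModule({"moon": moon, "satellite": Topic("satellite", {"en": "satellite"})})
dm.add_relation("moon", "isA", "satellite")
print(dm.topics["moon"].label("es"), [lo.lo_id for lo in dm.los_for("moon", "en")])
```

Keeping labels per language on a single topic, rather than one topic per language, means the relationships are stated once and shared by every language.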
Multilingual Domain Module Formalism
[Figure: the DOM-Sortze pipeline (Electronic Textbook → 1. Preprocess → Document Internal Representation and Document Outline Internal Representation → 2. LDO Gathering → Learning Domain Ontology → 3. LOs Gathering → Domain Module) extended with Language Identification. NLP parsers: Illinois Chunker, Illinois POS tagger, FreeLing, IXA-Pipes. LDO Gathering: Topic Extraction and Relationship Extraction (Set of Heuristics, Grammar). LOs Gathering: Multilingual LOs (Grammar, Discourse Markers).]
Proposed Enhancements
[Figure: the DOM-Sortze pipeline (Electronic Textbook → 1. Preprocess → 2. LDO Gathering → 3. LOs Gathering → Domain Module) complemented by three new modules that draw on external Knowledge Resources: LiTeWi (terminology), LiReWi (relationships) and LiLoWi (learning objects).]
Proposed Enhancements
• Two phases:
  – Tuning up: set the thresholds and default confidence values.
  – Evaluation: Gold Standard (Precision, Recall, F1-Score) and expert validation.
• Three textbooks are used:
  1. Programming: Introduction to Object Oriented Programming (S. Wong, 2010).
  2. Astronomy: Introduction to Astronomy (Morison, 2008).
  3. Biology: Introduction to Molecular Biology (Raineri, 2010).
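The Gold Standard scores can be reproduced with a few lines of Python. This sketch assumes set-based matching of extracted items against gold items (the exact matching criterion is not stated here); the example sets are hypothetical, and `f1_from` also recomputes an F1 value from a reported precision/recall pair (3.55 %, 62.96 %).

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(extracted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall and F1, all expressed as percentages."""
    hits = len(extracted & gold)
    precision = 100 * hits / len(extracted) if extracted else 0.0
    recall = 100 * hits / len(gold) if gold else 0.0
    return precision, recall, f1_from(precision, recall)


extracted = {"array list", "stack", "queue", "java"}  # hypothetical system output
gold = {"array list", "stack", "linked list"}  # hypothetical gold standard
print(precision_recall_f1(extracted, gold))
print(round(f1_from(3.55, 62.96), 2))
```

Since F1 is the harmonic mean, it always lies between precision and recall, which makes recomputing it a quick consistency check for result tables.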
General Evaluation Methodology
In DOM-Sortze, terminology was extracted with ErauzTerm (Alegria et al., 2004).
A new tool called LiTeWi has been developed.
Acquisition of Multilingual Terminology
LiTeWi
[Figure: LiTeWi pipeline. Electronic Textbook (plus a Generic Corpus) → Candidate Extraction (TF-IDF, KP-Miner, CValue, Shallow Parsing Grammar) → Combination → Candidate Selection (Mapping, Disambiguation, Filtering, Mapping to other languages).]
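CValue is one of the candidate extractors named above. The sketch below implements the standard C-value measure of Frantzi, Ananiadou and Mima over hypothetical candidate frequencies; it is not LiTeWi's code, and how LiTeWi tokenises candidates or scores single-word terms (which get 0 under log2 of the length) is left open as an assumption.

```python
import math


def is_nested(shorter: tuple[str, ...], longer: tuple[str, ...]) -> bool:
    """True if `shorter` appears as a contiguous token sequence inside a longer candidate."""
    n = len(shorter)
    if len(longer) <= n:
        return False
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freqs: dict[tuple[str, ...], int]) -> dict[tuple[str, ...], float]:
    """C-value of each multi-word candidate, given its corpus frequency."""
    scores = {}
    for term, freq in freqs.items():
        containers = [other for other in freqs if is_nested(term, other)]
        adjusted = float(freq)
        if containers:
            # Discount the occurrences explained by the longer candidates that contain it.
            adjusted -= sum(freqs[c] for c in containers) / len(containers)
        scores[term] = math.log2(len(term)) * adjusted
    return scores


freqs = {  # hypothetical candidate frequencies
    ("array", "list"): 10,
    ("dynamic", "array", "list"): 4,
    ("linked", "list"): 6,
}
print(c_value(freqs))
```

Nested candidates such as "array list" inside "dynamic array list" are penalised by the frequency of their containers, which is what lets C-value favour complete multi-word terms.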
Shallow Parsing Algorithm
• Uses a grammar derived from the one in (Larrañaga, 2012).
[Figure: Textbook sentences that may contain topics (e.g. "This is called an Array List", "A Stack is used to model systems that exhibit LIFO…") → Shallow Parser (Constraint Grammar applied to POS tags; Extraction Rules such as Topic + [*]+ part of + [det] + Topic) → Chunks ("an Array List", "A Stack", …) → Topics (Array List, Stack, …).]
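To make the idea concrete, here is a sketch that swaps the Constraint Grammar over POS tags for plain regular expressions over raw sentences. The two patterns in `TERM_PATTERNS` are hypothetical stand-ins for the extraction rules, written only to capture the two example sentences, and relying on capitalisation is a shortcut a POS-based grammar does not need.

```python
import re

# Hypothetical surface patterns standing in for the Constraint Grammar extraction
# rules, which in the original work on POS tags rather than on raw text.
TERM_PATTERNS = [
    re.compile(r"\bis called (?:an? |the )?(?P<term>[A-Z]\w*(?: [A-Z]\w*)*)"),
    re.compile(r"^(?:An? |The )(?P<term>[A-Z]\w*(?: [A-Z]\w*)*) is used to"),
]


def extract_terms(sentences: list[str]) -> list[str]:
    """Collect the chunk captured by any pattern, without duplicates."""
    found: list[str] = []
    for sentence in sentences:
        for pattern in TERM_PATTERNS:
            match = pattern.search(sentence)
            if match and match.group("term") not in found:
                found.append(match.group("term"))
    return found


sentences = [
    "This is called an Array List",
    "A Stack is used to model systems that exhibit LIFO behaviour",
]
print(extract_terms(sentences))
```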
Mapping
• Terms are mapped to their corresponding Wikipedia articles.
• A search procedure matches the terms against Wikipedia article titles and their labels.
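A sketch of such a search procedure, assuming the Wikipedia titles and the anchor labels pointing to each article are already loaded into memory (`TITLES` and `LABELS` below are hypothetical toy data, and `map_to_articles` is an illustrative name). Title and label matches are pooled so that ambiguous terms keep every candidate article for the disambiguation step.

```python
def normalise(text: str) -> str:
    """Lower-case, treat underscores as spaces and collapse whitespace."""
    return " ".join(text.lower().replace("_", " ").split())


def map_to_articles(term: str, titles: list[str], labels: dict[str, set[str]]) -> list[str]:
    """Candidate Wikipedia articles whose title, or one of whose labels, matches the term."""
    key = normalise(term)
    candidates = {title for title in titles if normalise(title) == key}
    candidates |= labels.get(key, set())
    return sorted(candidates)


TITLES = ["Java", "Java (programming language)", "Stack (abstract data type)", "Dynamic array"]
LABELS = {  # label (anchor text) -> articles it links to
    "java": {"Java (programming language)", "Java (island)"},
    "stack": {"Stack (abstract data type)", "Call stack"},
    "array list": {"Dynamic array"},
}
print(map_to_articles("Java", TITLES, LABELS))
```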
Disambiguation
• Method based on global disambiguation (Milne et al., 2008).
• A domain-knowledge step is added to improve the results.
• The most important domain terms are used as the disambiguation context.
• Gold Term List: important domain terms with only one sense, i.e. the monosemic terms with the highest CValue scores.
Disambiguation
[Figure: disambiguating "Java" with the Wikiminer Compare Service. Term List (to disambiguate): Java, Inheritance, Property. Gold Term List: Class, Programming Language, Array List.]
Relatedness of each candidate sense of "Java" to the gold terms:
Sense            Class   Prog. Lang.   Array List   Average
Prog. Language   0.90    0.85          0.64         0.89
Island           0.7     0.77          0.53         0.70
City             0.56    0.75          0.6          0.63
Disambiguated term: Java (Programming Language)
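The selection step can be sketched as picking the sense with the highest mean relatedness to the Gold Term List. The relatedness values are the ones in the example; in LiTeWi they come from the Wikiminer Compare Service. Using a plain arithmetic mean is an assumption: it ranks the senses in the same order as the slide, but it does not reproduce the slide's Average column exactly (about 0.80 rather than 0.89 for Prog. Language), so the original weighting may differ.

```python
from statistics import mean

# Relatedness of each candidate sense of "Java" to the Gold Term List (example values).
RELATEDNESS = {
    "Prog. Language": {"Class": 0.90, "Prog. Lang.": 0.85, "Array List": 0.64},
    "Island": {"Class": 0.7, "Prog. Lang.": 0.77, "Array List": 0.53},
    "City": {"Class": 0.56, "Prog. Lang.": 0.75, "Array List": 0.6},
}


def disambiguate(sense_scores: dict[str, dict[str, float]]) -> tuple[str, float]:
    """Return the sense with the highest mean relatedness to the gold terms."""
    averages = {sense: mean(list(scores.values())) for sense, scores in sense_scores.items()}
    best = max(averages, key=averages.get)
    return best, averages[best]


print(disambiguate(RELATEDNESS))
```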
Filtering Unwanted Terms
[Figure: filtering with the Wikiminer Compare Service. Gold Term List: Solar System, Black Hole, Solar Mass. Term List (to filter): Universal Studios, Planet, Windows 98. Relatedness scores of Planet and Windows 98 against the gold terms: Solar System (0.34), Black Hole (0.53), Solar Mass (0.47) and Solar System (0.23), Black Hole (0.68), Solar Mass (0.50). Parameters: Number of Related Gold Terms N (> 1) and relatedness Threshold (>= 0.6). Domain Related Term: Planet; Universal Studios and Windows 98 are filtered out.]
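One possible reading of the two parameters, sketched here as an assumption: a candidate is kept when the number of gold terms whose relatedness reaches the threshold (0.6) is greater than N (here N = 1). The scores below are hypothetical, not the slide's values, and `is_domain_related` is an illustrative name.

```python
def is_domain_related(gold_scores: dict[str, float], threshold: float = 0.6, n: int = 1) -> bool:
    """Assumed rule: keep a term when more than `n` gold terms reach the threshold."""
    related = sum(1 for score in gold_scores.values() if score >= threshold)
    return related > n


# Hypothetical relatedness scores (not the slide's values) against the gold terms.
CANDIDATES = {
    "Planet": {"Solar System": 0.81, "Black Hole": 0.64, "Solar Mass": 0.70},
    "Universal Studios": {"Solar System": 0.10, "Black Hole": 0.31, "Solar Mass": 0.04},
    "Windows 98": {"Solar System": 0.05, "Black Hole": 0.62, "Solar Mass": 0.02},
}

kept = [term for term, scores in CANDIDATES.items() if is_domain_related(scores)]
print(kept)
```

Requiring several related gold terms, rather than one, guards against a single spurious high score, as with Windows 98 here.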
LiTeWi
[Figure: LiTeWi pipeline (Candidate Extraction, Combination, Candidate Selection: Mapping, Disambiguation, Filtering, Mapping to other languages), with an example of the last step: Topic Moon → EN: Moon, ES: Luna, EU: Ilargia.]
Evaluation
Tuning up
• Introduction to Object Oriented Programming textbook.
Evaluation
• Gold Standard and Expert Validation.
• Gold Standard built from the terms appearing in the index of each textbook.
• Evaluated on the Introduction to Astronomy and Introduction to Molecular Biology textbooks.
Results
• Wikifier (Cheng, 2013)
               Gold Standard                             Expert Validation
               Precision (%)   Recall (%)   F1 Score (%)   Correctness (%)
Astronomy      3.55            62.96        6.72           18.55
Mol. Biology   2.24            10.21        3.67           49.27

• LiTeWi
               Gold Standard                             Expert Validation
               Precision (%)   Recall (%)   F1 Score (%)   Correctness (%)
Astronomy      17.96           72.55        28.79          78.77
Mol. Biology   27.09           50.53        35.27          71.65
Introduction
In DOM-Sortze, relationships were acquired for Basque using shallow parsing.
The heuristic-based analysis of the outline has been adapted and extended.
A new tool called LiReWi has been developed.
Heuristic-based analysis of the outline
Document Outlines
• Reflect the organization made by the author.
• The structure of the outline underlies pedagogical relationships.
• Low-cost process (summarised).
DOM-Sortze
• Each outline item is considered a domain topic.
• By default, a partOf relationship is gathered between an item and its subitems.
• Heuristics detect isA relationships.
LiDom Builder
• Adaptation to English of the heuristics from (Larrañaga et al., 2004).
• Improvement of isA identification using WikiTaxonomy (Ponzetto et al., 2007).
Wikipedia Enhanced Process
………..
4.- Structure of polymers / Macromolecules
    4.1.- Polymer chemistry
    4.2.- Molecular weight
    4.3.- Form, structure and molecular configuration
    4.3.- Supramolecular arrangement
    4.4.- Crystalline and amorphous polymers
    4.5.- Families of polymeric materials
        4.5.1.- Thermosettings
        4.5.2.- Thermoplastics
        4.5.3.- Elastomers
5.- Phase diagrams / Definitions
    5.1.- Solid solutions
    5.2.- Phases rule of Gibbs
    5.3.- Types of phase diagram
1. Identify groups of sibling nodes.
2. Select the groups of leaf nodes in which the partOf relationship has been identified.
3. Link and disambiguate each node to a Wikipedia article using Wikiminer (Milne et al., 2012), e.g. Thermosetting polymer (article id = 321827), Thermoplastic (article id = 182444), Elastomer (article id = 842224).
4. Process every group using the taxonomy of (Ponzetto et al., 2007); the ancestors obtained in the example include Materials science, Elastomers, Polymer physics and Polymer physics, Polymer chemistry.
5. Infer isA relationships in those groups whose nodes share a common ancestor.
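Steps 4 and 5 can be sketched as an intersection test over each sibling group, assuming the taxonomy ancestors of every linked article have already been collected. The `ANCESTORS` dictionary below is hypothetical toy data, not WikiTaxonomy output. When all siblings share at least one ancestor, the default partOf links to their parent are relabelled as isA.

```python
def infer_isa(
    parent: str, children: list[str], ancestors: dict[str, set[str]]
) -> list[tuple[str, str, str]]:
    """Relabel a sibling group's partOf links as isA when the siblings share an ancestor."""
    if len(children) < 2:
        # A lone child offers no sibling evidence; keep the default partOf.
        return [(child, "partOf", parent) for child in children]
    common = set.intersection(*(ancestors.get(child, set()) for child in children))
    relation = "isA" if common else "partOf"
    return [(child, relation, parent) for child in children]


ANCESTORS = {  # hypothetical taxonomy ancestors of each linked article
    "Thermosettings": {"Polymer chemistry", "Polymer physics"},
    "Thermoplastics": {"Polymer physics", "Plastics"},
    "Elastomers": {"Materials science", "Polymer physics"},
    "Solid solutions": {"Solid state chemistry"},
    "Types of phase diagram": {"Thermodynamics"},
}
print(infer_isa("Families of polymeric materials", ["Thermosettings", "Thermoplastics", "Elastomers"], ANCESTORS))
```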
Evaluation
Gold Standard
• 57 document outlines in English from different domains.
• Human instructors defined the optimal output (LDOs).
• Each LDO is restricted to the topics of the outline.
Results
• Heuristic Analysis
                partOf   isA     Total
Precision (%)   84.12    78.95   83.85
Recall (%)      98.66    21.20   83.85

• Heuristic Analysis + Wikipedia Enhanced Process
                partOf   isA     Total
Precision (%)   89.19    77.30   87.70
Recall (%)      96.49    50.53   87.70
Identification of Pedagogical Relationships: LiReWi
[Figure: LiReWi pipeline. Electronic Textbook, Topics and Knowledge Bases → Mapping → Candidate Relationship Extraction → Combination & Filtering.]
Mapping
[Figure: mapping the topic Syntax (Wikipedia id = 3206060, WordNet id = ?) to WordNet. Two existing mappings, Fernando's Mappings and BabelNet Mappings, return different WordNet ids for Wikipedia id 3206060 (8436203 and 6176322). The Comparer detects the mismatch (!=), and Page Rank Disambiguation over a Disambiguation Context (e.g. Java, Programming, …) chooses between the candidate synsets. Final id: the mapped WordNet id returned is 6176322.]
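The combination logic can be sketched as follows. This is an illustrative assumption, not LiReWi's code: `mapping_a` and `mapping_b` are hypothetical excerpts standing in for the two existing mappings, and a fixed score table (`CONTEXT_SCORES`) replaces the PageRank-based disambiguation over the context.

```python
from typing import Callable, Optional


def resolve_wordnet_id(
    wiki_id: int,
    mappings: list[dict[int, int]],
    context_score: Callable[[int], float],
) -> Optional[int]:
    """Map a Wikipedia article id to one WordNet synset id using several mappings."""
    candidates = {mapping[wiki_id] for mapping in mappings if wiki_id in mapping}
    if not candidates:
        return None
    if len(candidates) == 1:
        return next(iter(candidates))
    # The mappings disagree: keep the synset that best fits the disambiguation context.
    return max(sorted(candidates), key=context_score)


mapping_a = {3206060: 8436203}  # hypothetical excerpt of one existing mapping
mapping_b = {3206060: 6176322}  # hypothetical excerpt of another
CONTEXT_SCORES = {6176322: 0.8, 8436203: 0.3}  # stand-in for PageRank over the context


def score(synset: int) -> float:
    return CONTEXT_SCORES.get(synset, 0.0)


print(resolve_wordnet_id(3206060, [mapping_a, mapping_b], score))
```

Disambiguation only runs when the mappings disagree, so agreeing entries are accepted without the cost of a graph computation.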
Candidate Relationship Extraction
[Figure: candidate relationship extractors (WordNet Extractor, WibiExtractor, WikiRelations Extractor, WikiTaxonomy Extractor, Shallow Parsing Grammar Extractor, SequentialExtractor) use NLP data to produce Candidate Relationships of types isA, partOf, prerequisite and pedagogicallyClose.]
Candidate Relationship Extraction
Path Based Extractors:
[Figure: isA paths between topics, e.g. Mars, Rocky planet and Planet; a direct isA link (path length = 1) yields confidence = 1, while a two-step isA path (path length = 2) yields confidence = 0.9.]
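A sketch of a path-based extractor as a breadth-first search over an isA graph. The toy graph is hypothetical, and the linear decay of 0.1 per extra step is an assumption that merely fits the two values shown (confidence 1 at path length 1, 0.9 at path length 2).

```python
from collections import deque


def isa_paths(
    graph: dict[str, list[str]], source: str, max_length: int = 3, decay: float = 0.1
) -> dict[str, tuple[int, float]]:
    """Reach every isA ancestor of `source`; confidence shrinks with the path length."""
    found: dict[str, tuple[int, float]] = {}
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_length:
            continue
        for parent in graph.get(node, []):
            if parent not in seen:
                seen.add(parent)
                length = depth + 1
                found[parent] = (length, round(1.0 - decay * (length - 1), 2))
                queue.append((parent, length))
    return found


ISA_GRAPH = {"Mars": ["Rocky planet"], "Rocky planet": ["Planet"]}  # toy knowledge base
print(isa_paths(ISA_GRAPH, "Mars"))
```

Breadth-first order guarantees that each ancestor is recorded with its shortest path, and therefore its highest confidence.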
Candidate Relationship Extraction
• WikiRelations: a set of tuples that state the relationships between Wikipedia categories.
[Figure: WikiRelations tuples such as (T Tauri, Star, isA), (Radiation, Radio waves, partOf), (Light, Electromagnetic radiation, partOf), (T Tauri star, Star, isA), (007 license to kill, video games, isA), …. Each topic is mapped to its categories (Topic: Light → Cat1: Light, Cat2: …; Topic: Electromagnetic radiation → Cat1: Electromagnetic radiation), and matching tuples yield candidates such as Light partOf Electromagnetic radiation (Confidence = 0.7).]
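Lifting category-level tuples to topic-level candidates can be sketched as a join on categories. The topic-to-category assignments below are hypothetical, the tuples are taken from the example, and attaching a fixed confidence of 0.7 to every WikiRelations candidate is an assumption based on the single value shown.

```python
def wikirelations_candidates(
    topic_categories: dict[str, set[str]],
    tuples: list[tuple[str, str, str]],
    confidence: float = 0.7,
) -> set[tuple[str, str, str, float]]:
    """Turn (category, category, relation) tuples into topic-level candidate relationships."""
    topics_by_category: dict[str, set[str]] = {}
    for topic, categories in topic_categories.items():
        for category in categories:
            topics_by_category.setdefault(category, set()).add(topic)

    candidates = set()
    for source_cat, target_cat, relation in tuples:
        for source in topics_by_category.get(source_cat, set()):
            for target in topics_by_category.get(target_cat, set()):
                if source != target:
                    candidates.add((source, relation, target, confidence))
    return candidates


TUPLES = [
    ("T Tauri star", "Star", "isA"),
    ("Radiation", "Radio waves", "partOf"),
    ("Light", "Electromagnetic radiation", "partOf"),
]
TOPIC_CATEGORIES = {  # hypothetical topic -> category assignments
    "Light": {"Light"},
    "Electromagnetic radiation": {"Electromagnetic radiation"},
    "Moon": {"Moons"},
}
print(wikirelations_candidates(TOPIC_CATEGORIES, TUPLES))
```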
Candidate Relationship Extraction
• Extractor based on the rules defined in (Larrañaga, 2012).
[Figure: Textbook and Topics (Solar System, Earth, Planet, Mars, …) → Find Mentions → Sentences with mentions (e.g. "Earth is part of the Solar System.") → Constraint Grammar applied to POS tags, with rules such as Topic + [*]+ part of + [det] + Topic → Relationships (e.g. Earth partOf Solar System).]
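As a rough approximation of the rule Topic + [*]+ part of + [det] + Topic, the sketch below uses a regular expression over raw text instead of a Constraint Grammar over POS tags: `[*]` becomes a lazy wildcard and `[det]` an optional English article. `part_of_relations` is a hypothetical helper.

```python
import re


def part_of_relations(sentence: str, topics: list[str]) -> list[tuple[str, str, str]]:
    """Find 'Topic ... part of [det] Topic' in a sentence and return partOf triples."""
    canonical = {topic.lower(): topic for topic in topics}
    # Longest names first so that multi-word topics win over shorter overlapping ones.
    names = "|".join(re.escape(t) for t in sorted(topics, key=len, reverse=True))
    pattern = re.compile(
        rf"\b(?P<part>{names})\b.*?\bpart of (?:the |an? )?(?P<whole>{names})\b",
        re.IGNORECASE,
    )
    return [
        (canonical[m.group("part").lower()], "partOf", canonical[m.group("whole").lower()])
        for m in pattern.finditer(sentence)
    ]


TOPICS = ["Solar System", "Earth", "Planet", "Mars"]
print(part_of_relations("Earth is part of the Solar System.", TOPICS))
```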
Candidate Relationship Extraction
[Figure: Textbook and Topics (Wavelength, Emission spectrum, Planet, Solar System, …) → Find Mentions → Sentences with mentions (e.g. "...leading to different radiated wavelengths, make up an emission spectrum. ... the emission spectrum of a particular star, the wavelength of …") → Possible candidates: Wavelength, Emission spectrum (2 times) → links in/links out looked up on Wikipedia (Emission spectrum links out to Wavelength; Wavelength links out to Emission spectrum) and Relatedness > threshold → Reasoner → Relations: Emission spectrum pedagogicallyClose Wavelength.]
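The reasoning in this example can be sketched as three checks on each pair of co-mentioned topics: enough co-mentions, mutual Wikipedia links, and relatedness above a threshold. The `min_mentions` and `threshold` values, the link table and the constant relatedness function are hypothetical; in LiReWi these would come from Wikipedia and Wikiminer, and the naive substring matching stands in for proper mention detection.

```python
from itertools import combinations
from typing import Callable


def co_mentions(sentences: list[str], topics: list[str]) -> dict[tuple[str, str], int]:
    """Count, for every topic pair, the sentences that mention both topics."""
    counts: dict[tuple[str, str], int] = {}
    for sentence in sentences:
        text = sentence.lower()
        present = sorted(t for t in topics if t.lower() in text)  # naive substring mentions
        for pair in combinations(present, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def pedagogically_close(
    sentences: list[str],
    topics: list[str],
    links_out: dict[str, set[str]],
    relatedness: Callable[[str, str], float],
    min_mentions: int = 2,
    threshold: float = 0.5,
) -> list[tuple[str, str, str]]:
    relations = []
    for (a, b), count in co_mentions(sentences, topics).items():
        mutual = b in links_out.get(a, set()) and a in links_out.get(b, set())
        if count >= min_mentions and mutual and relatedness(a, b) > threshold:
            relations.append((a, "pedagogicallyClose", b))
    return relations


SENTENCES = [
    "...leading to different radiated wavelengths, make up an emission spectrum.",
    "...the emission spectrum of a particular star, the wavelength of...",
]
TOPICS = ["Wavelength", "Emission spectrum", "Planet", "Solar System"]
LINKS_OUT = {"Emission spectrum": {"Wavelength"}, "Wavelength": {"Emission spectrum"}}
print(pedagogically_close(SENTENCES, TOPICS, LINKS_OUT, lambda a, b: 0.8))
```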
Candidate Relationship Extraction
[Figure: topics in sequence (Topic1, Topic2, Topic3, Topic4) with the mentions (links) counted between them: Topic3, 4 mentions; Topic4, 1 mention; Topic2, 3 mentions; Topic1, 4 mentions. Derived relationships: Topic1 is pedagogicallyClose to Topic2; Topic3 is a prerequisite of Topic4.]
Identification of Pedagogical Relationships: LiReWi
[Figure: LiReWi pipeline. Electronic Textbook, Topics and Knowledge Bases → Mapping → Candidate Relationship Extraction → Combination & Filtering → Learning Domain Ontology.]
Combination & Filtering Relationships
Candidate Relationships:
- Earth isA Planet (WordNet Ex) (Conf=1)
- Earth isA Planet (WikiRelations Ex) (Conf=0.8)
- Planet isA Earth (WikiTax Ex) (Conf=0.7)
- Earth partOf Solar System (WordNet Ex) (Conf=1)
- Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)
Confidence Combiner (relationships combined):
- Earth isA Planet (WordNet Ex, WikiRelations Ex) (Conf=1)
- Planet isA Earth (WikiTax Ex) (Conf=0.7)
- Earth partOf Solar System (WordNet Ex) (Conf=1)
- Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)
Conflict Resolver (conflict resolution) discards:
- Planet isA Earth (WikiTax Ex) (Conf=0.7)
Filter (below threshold) discards:
- Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)
Final Relationships:
- Earth isA Planet (WordNet Ex, WikiRelations Ex) (Conf=1)
- Earth partOf Solar System (WordNet Ex) (Conf=1)
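A sketch of the three stages, assuming that duplicates are combined by taking the maximum confidence (consistent with 1 and 0.8 giving 1 in the example), that an isA or partOf relationship conflicts with its inverse, and a hypothetical threshold of 0.6; the actual combination function and threshold used in LiReWi may differ.

```python
def combine_and_filter(
    candidates: list[tuple[str, str, str, str, float]], threshold: float = 0.6
) -> dict[tuple[str, str, str], tuple[set[str], float]]:
    # 1. Confidence combiner: merge duplicates, keeping every extractor and the max confidence.
    merged: dict[tuple[str, str, str], tuple[set[str], float]] = {}
    for source, relation, target, extractor, confidence in candidates:
        key = (source, relation, target)
        extractors, best = merged.get(key, (set(), 0.0))
        merged[key] = (extractors | {extractor}, max(best, confidence))

    # 2. Conflict resolver: a relationship and its inverse cannot both hold; keep the stronger.
    resolved = {}
    for (source, relation, target), (extractors, confidence) in merged.items():
        inverse = merged.get((target, relation, source))
        if relation in ("isA", "partOf") and inverse is not None and inverse[1] > confidence:
            continue
        resolved[(source, relation, target)] = (extractors, confidence)

    # 3. Filter: drop relationships whose confidence is below the threshold.
    return {key: value for key, value in resolved.items() if value[1] >= threshold}


CANDIDATES = [
    ("Earth", "isA", "Planet", "WordNet Ex", 1.0),
    ("Earth", "isA", "Planet", "WikiRelations Ex", 0.8),
    ("Planet", "isA", "Earth", "WikiTax Ex", 0.7),
    ("Earth", "partOf", "Solar System", "WordNet Ex", 1.0),
    ("Earth", "isA", "Terrestrial Planet", "WikiTax Ex", 0.5),
]
for relationship, (extractors, confidence) in combine_and_filter(CANDIDATES).items():
    print(relationship, sorted(extractors), confidence)
```

In this sketch a relationship and its inverse with equal confidence are both kept; breaking such ties would need an extra rule.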
Evaluation
Tuning up
• Introduction to Object Oriented Programming textbook.
Evaluation
• Gold Standard and Expert Validation.
• Introduction to Astronomy textbook.
• Gold Standard: four experts stated the set of relationships.
• Restricted to a subset of the main domain topics, selected according to the score given by LiTeWi.
Results
             Precision (%)   Recall (%)   F1-Score (%)   Expert Validation (%)
LiReWi       36.21           50.57        42.42          43.98
DOM-Sortze   63.27           20.74        31.24          N.A.
Gathering Multilingual Learning Objects
Introduction
In DOM-Sortze, LOs were acquired for Basque using shallow parsing.
The approach has been validated for English.
LiLoWi has been developed to move towards the elicitation of Multilingual LOs.
Adapting Learning Object elicitation to English
          Basque                                                  English
Pattern   adibidez, @topic                                        for instance, @topic
Example   Uretan, adibidez hidrogeno eta oxigeno atomoak daude.   For instance, there are hydrogen and oxygen atoms in water.
[Figure: Textbook and Topics (Wavelength, Emission spectrum, Earth, Solar System, …) → Find Mentions → Sentences with mentions (e.g. "Earth is a planet.") → Grammar → Learning Objects (e.g. "The Moon is Earth's only natural satellite").]
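A sketch of the English Example pattern applied to sentences with topic mentions. The marker regex, the `find_example_los` name and the dictionary layout of each LO are hypothetical. A Basque counterpart would need lemmatised tokens, because topics appear inflected (e.g. "Uretan", "in water"), which plain string matching cannot handle.

```python
import re

EXAMPLE_MARKER = re.compile(r"\bfor instance\b", re.IGNORECASE)  # English "for instance, @topic"


def find_example_los(sentences: list[str], topics: list[str]) -> list[dict]:
    """Return Example LOs: sentences containing the marker and at least one topic mention."""
    los = []
    for sentence in sentences:
        mentioned = [t for t in topics if re.search(rf"\b{re.escape(t)}\b", sentence, re.IGNORECASE)]
        if mentioned and EXAMPLE_MARKER.search(sentence):
            los.append({"type": "example", "lang": "en", "topics": mentioned, "text": sentence})
    return los


sentences = [
    "For instance, there are hydrogen and oxygen atoms in water.",
    "Earth is a planet.",
]
print(find_example_los(sentences, ["water", "Earth"]))
```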
Evaluation
Gold Standard and Expert Validation:
• Evaluated on Introduction to Object Oriented Programming.
• Gold Standard built by experts.
Two Aspects
• Grammar.
• Learning Objects.
Evaluation
                Definitions   Examples   Problem Statements   Principle Statements   Total
Found           164           1          12                   49                     226
Correct         138           1          7                    35                     181
Precision (%)   84.15         100        58.33                71.43                  80.09

             Recall (%)   Expert Validation (%)
DOM-Sortze   70.31        91.88
LiDom        75.93        86.79
• Grammar
• Learning Objects
LiLoWi
[Figure: LiLoWi. For each topic (Solar System, Emission spectrum, Earth, …), multilingual LOs are obtained from WordNet/Wikipedia (e.g. LO1en, LO2en, LO2es), linked as Equivalents and described by a Metadata Generator.]
• Evaluated on the Principles of Object-Oriented Programming.
• The same LDO as in the previous experiment was used.
• Expert Validation.
Two Aspects
• How much LiLoWi enhanced the LO coverage of the LDO topics.
• How many multilingual LOs are extracted.
Evaluation
Results
• Grammar + Wikipedia/WordNet
                     Definitions   References
                                   English   Spanish   Basque   French
Number of topics     46            36        9         36       12
Topic coverage (%)   56.10         43.90     10.97     43.90    14.63

• Grammar-based approach
                     Total   Definitions
Number of topics     21      19
Topic coverage (%)   25.61   19.51
1. Provision of a suitable formalism to represent Multilingual Domain Modules.
2. A method for the elicitation of multilingual terminology has been developed.
   – To our knowledge, the first term extractor based on searching patterns for educational content.
3. Relationship acquisition has been improved.
   – Extension of the outline processor to English + enhancement with Wikipedia.
   – Development of LiReWi, a module for the elicitation of pedagogical relationships for Educational Ontologies.
   – Development of a state-of-the-art mapper from Wikipedia to WordNet.
4. A method for multilingual LO generation has been developed.
   – Extension of DOM-Sortze for English.
   – Development of LiLoWi, a module for the elicitation of multilingual LOs using different knowledge bases.
Goal Achievement
Conclusions and Future Work
• Automatising the inclusion of new languages.
• Multilingual Learning Object generation using similarity and machine translation techniques.
• Concept Map-Based Learning Object Generation.
• Improvements on each module of LiDom Builder.
Future Work
Software Released
Software
• LiTeWi, released with Spanish/English support: https://github.com/Neuw84/LiTe
• Wikipedia/WordNet mapper: https://github.com/Neuw84/Wikipedia2WordNet
• Spanish stemmer: https://github.com/Neuw84/SpanishInflectorStemmer
• Training Data for Wikiminer: https://github.com/Neuw84/Wikipedia353Spanish
• LiReWi: coming soon.
Web Demo
• LiDom Builder: http://galan.ehu.es/lidom/
Publications
A Combined Approach for Eliciting Relationships for Educational Ontologies Using Several Knowledge Bases. Ángel Conde, Mikel Larrañaga, Ana Arruarte, Jon A. Elorriaga. Journal of Knowledge-Based Systems. Submitted.
LiTeWi: A Combined Term Extraction Method for Eliciting Educational Ontologies from Textbooks. Ángel Conde, Mikel Larrañaga, Ana Arruarte, Jon A. Elorriaga, Dan Roth. Journal of the Association for Information Science and Technology, 67(2), pp. 380-399, 2016.
Testing Language Independence in the Semiautomatic Construction of Educational Ontologies. Ángel Conde, Mikel Larrañaga, Ana Arruarte, Jon A. Elorriaga. 12th International Conference on Intelligent Tutoring Systems ITS 2014, Springer, Vol. 8474, pp. 545-550, 2014.
Automatic Generation of the Domain Module from Electronic Textbooks. Method and Validation. Mikel Larrañaga, Ángel Conde, Iñaki Calvo, Jon A. Elorriaga, Ana Arruarte. IEEE Transactions on Knowledge and Data Engineering, 26(1), pp. 69-82, 2014.
Automating the Authoring of Learning Material in Computer Engineering Education. Ángel Conde, Mikel Larrañaga, Iñaki Calvo, Jon A. Elorriaga, Ana Arruarte. 42nd Frontiers in Education Conference, pp. 1376-1381, 2012.