Dynamic Content Discovery, Harvesting and Delivery,from Open Corpus Sources, for Adaptive Systems
Séamus Lawless
Knowledge & Data Engineering Group, Trinity College Dublin
For more information visit: www.cs.tcd.ie/~slawless
Methodology
Personalised elearning is being heralded as one of the grand challenges of next generation learning systems, in particular, its ability to support greater effectiveness, efficiency and student empowerment. However, a key problem with such systems is their reliance on bespoke content. The challenge for adaptive systems in scalably supporting personalised elearning is its ability to discover, harvest and deliver open corpus content to adaptive content services and personalised elearning systems.
The Problem
Research Goal and ScopeThe goal of this research is to provide dynamic contextual retrieval of open corpus content for eLearning systems. There are many sources of open corpus content, the primary examples of which are the Worldwide Web and digital content repositories. Open corpus content lacks consistency in its structuring. These inconsistencies are both semantic, relating to the vocabulary used to describe the content, and syntactic, relating to the metadata standards implemented. Metadata mappings to a canonical model using a fixed vocabulary ontology will be implemented to ensure consistency and accuracy in content discovery, analysis and description. Lexical analysis and metadata tag generation will be required for some open corpus content. The aesthetic qualities and presentation of retrieved content and IP/DRM are both deemed to be outside the scope of this research.
It is proposed that a series of content requirements can be extracted from educational environments. These requirements are both technical, relating to content structure, metadata standards and taxonomies used, and semantic, relating to subject matter and pedagogical needs. A cache of candidate learning content is generated through a combination of web crawling and OAI harvesting of repositories. This content is then analysed and indexed to enable discovery. Upon creation of the content index a mapping will occur to a canonical metadata model using a fixed ontology of terms. Metadata tags will be created for content with insufficient descriptions. A query engine will take the content requirements and generate queries to run against the content index and identify suitable candidate content. Once suitable content is identified it will be harvested and delivered to a learning object generation environment for restructuring.
Status Work underway
Focused Web Crawling System operational
Initial attempts at generating an index of the content cache
Soon to commence
Initial attempt at requirements extraction from Learning Designs
Metadata canonical model & vocabulary ontology creation
The research is supported by IRCSET’s embark initiative and conducted in conjunction with the IBM Centre for Advanced Studies Student Program.
System Architecture
Content Cache
Digital Repositories
WWW
Analysis & Annotation
Ontologial Mapping
Metadata Model
Mapping
Learning Object
Generation
Query Engine
AHS / Personalised
eLearning Environment
Content Index
Content Harvesting
Open Corpus Content Service
Content Requirements
NDLR
User Input
User Model
Heritrix Web Crawler
Rainbow Text Classifier
OCCS Focused Crawling System
OAI Content Harvester
Content Indexing
Learning Design