[ieee 2008 ieee international workshop on semantic computing and applications (iwsca) - incheon,...

IEEE International Workshop on Semantic Computing and Applications, 978-0-7695-3317-9/08 $25.00 © 2008 IEEE, DOI 10.1109/IWSCA.2008.29


Page 1: [IEEE 2008 IEEE International Workshop on Semantic Computing and Applications (IWSCA) - Incheon, Korea (South) (2008.07.10-2008.07.11)] 2008 IEEE International Workshop on Semantic

Domain-based Recommendation and Retrieval of Relevant Materials in E-learning

Aijuan Dong Baoying Wang Department of Computer Science Waynesburg University Frederick, MD 21704 Waynesburg, PA 15370 [email protected] [email protected]

Abstract

A good e-learning system should deliver relevant learning materials to learners at the most appropriate time and location to facilitate their acquisition of knowledge and skills. In this paper, we propose domain-based recommendation and retrieval of relevant materials in e-learning. Because relevancy is often domain dependent, and materials that are highly related in one domain might be irrelevant in another, we group users by their domains. Based on the content, multiple sets of relevant documents are prepared, one for each chosen domain. In a search scenario, search sites are selected by domain and the search is performed on the chosen relevant sites only. To demonstrate our idea, we use the Virtual Conference on Genomics and Bioinformatics as the test bed and develop a presentation video access platform. The implementation involves techniques in image and video processing, database management, programming, and the presentation of multimedia learning materials.

1. Introduction

The emergence of the Internet and recent developments in multimedia technology have had a great impact on education. E-learning, which broadly refers to online training and learning, is becoming more and more prevalent. By 2003, 84 percent of US colleges had e-learning programs [1]. In addition, there are e-learning systems for military, medical, and corporate training [2][3]. The demand for e-learning continues to grow. eMarketer's e-Learning Report predicted that the e-learning market could reach $50 billion by the end of the decade [4]. E-learning systems enhance learning experiences and augment teachers’ work in and out of traditional classrooms [5, 6]. Working professionals also embrace e-learning programs because of their convenience and flexibility [1].

In an e-learning system, significant differences exist among learners and researchers in learning habits, personal interests, and professional backgrounds. One of the main objectives of any e-learning system is to provide personalized content that meets a user’s learning needs. The task of presenting personalized content is often framed as a recommendation task in which the system recommends pertinent materials to users [7]. In this paper, we are particularly interested in the recommendation and retrieval of relevant materials in e-learning systems. A good e-learning system should deliver relevant information to users at the most appropriate time and location. For example, if the subject of a video is the mining of large-scale gene expression databases and the user is a biologist, it would be very beneficial to list a video that shows the microarray process for gene expression data as a related link, so that interested users can get this closely related information instantly. It has been shown that relevant information can advance the understanding of complex problems and the acquisition of knowledge and skills [8]. Therefore, e-learning systems should retrieve and deliver materials in alignment with users’ personal needs to improve their learning experiences.

Historically, relevant materials have been manually prepared by content providers. Research on the automated retrieval of educationally relevant material has focused on computer simulation [9, 10], where the parameters of the simulation environment are used as input to information retrieval. Rather than using simulation environments, the work in [8] uses annotated video as an information retrieval interface. These approaches work well for traditional classrooms where learners have similar backgrounds. To deliver personalized content to users with diverse backgrounds, data mining techniques have been used in e-learning systems in recent years [11, 12]. The data mining approach uses all the available information about existing users, such as system logs, to learn user models and then uses these models for personalization.

In this paper, we explore an alternative approach to recommending and retrieving relevant materials. Rather than clustering users by their browsing history, we group users by their domains. Based on the content, multiple sets of relevant documents are prepared, one for each chosen domain. In a search scenario, search sites are selected by domain. The rationale is that relevancy is usually domain dependent: materials that are highly related in one domain might be irrelevant in another. For the same learning material, such as a conference presentation video, an image, or even a text document, biologists will be presented with biology-related documents and will search biology-related resources, while computer scientists will be presented with computer science-related materials and will search computer science-related resources. Since users from the same domain often have similar backgrounds and share similar interests, there is a potential to improve users’ learning experience.

Among the myriad types of e-learning materials, presentation videos from lectures, conferences, seminars, and corporate training are of particular interest to the research reported here. The need for specific solutions in this field comes from the popularity of e-learning systems and the vital role that presentation videos play in them.

The rest of the paper is organized as follows. In Section 2, we investigate the desired characteristics of recommender systems. Based on this study, Section 3 describes the architecture of the proposed approach. To demonstrate the idea, Section 4 presents a prototype system. Section 5 concludes the paper and proposes some future work.

2. Characteristics of a good recommender system

To design an effective recommendation system for e-learning, it is important to understand what users need, i.e., what characteristics are desired in a recommender system. The discussion below applies to any recommendation system but focuses on those in an educational context. These characteristics serve as guidelines for our framework design and platform implementation. A good recommender should:

Provide a personalized view. A good recommender system should present users with materials that match their interests. In e-commerce, products are offered based on a customer’s buying interests. Likewise, in e-learning, relevant documents should be chosen and presented to learners or researchers based on their interests and backgrounds.

Provide customized search. Advances in computing power, network bandwidth, and information storage have led to a proliferation of data. The availability of this huge and diverse data collection is not necessarily an advantage. A larger collection also means that it is more difficult to find the most relevant information among all the items related to the same subject. To help learners and researchers find what they need, search should be performed on selected, relevant resources.

Recommend materials at the appropriate time or location. In a long textual document, an in-text citation immediately follows a quoted source or a paraphrase of a source’s ideas to indicate the exact spot where a reference is relevant. Similarly, in a presentation video, as the video progresses or moves into a new segment, the list of relevant documents should be updated accordingly so that only documents related to the currently playing segment are displayed.

Support a non-disruptive viewing experience. Non-disruptive means that users have the option to either pursue or ignore relevant materials based on their learning needs. In the case of presentation videos, links to pertinent materials are presented and updated automatically as a presentation video progresses. Users have the option to either pause or keep playing the source video while they pursue relevant documents.

3. The architecture

The proposed system consists of several major components: a video segment generator, a video indexer, a relevant document generator, and a search/browse engine. First, the video segment generator takes a continuous presentation video stream and generates a sequence of segments. Then, the video indexer takes the video segments and extracts annotations from each segment. Domain ontologies are integrated in this process so that domain-specific annotations can be extracted for each domain involved. For example, if the subject of a video is the mining of large-scale gene expression databases, then both biologists and computer scientists might be interested in this video, and two sets of annotation data will be extracted. One integrates the Gene Ontology (http://www.geneontology.org/) and describes the video content from the biologists’ view. The other integrates the Data Mining Ontology [13] and describes the video content from the computer scientists’ view. The video segmentation and multi-ontology based annotation extraction processes are described in our previous paper [13].
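As a rough illustration of the indexing step, the following Python sketch matches slide text against per-domain term lists. It is only a naive stand-in under stated assumptions: the term lists, the domain keys, and the function name `extract_annotations` are hypothetical, and the actual multi-ontology extraction process is the one described in [13].

```python
import re

# Hypothetical, tiny term lists standing in for the real ontologies
# (Gene Ontology and Data Mining Ontology); not taken from the paper.
ONTOLOGY_TERMS = {
    "biology": ["gene expression", "microarray", "transcription"],
    "computer_science": ["clustering", "classification", "association rule"],
}


def extract_annotations(segment_text, ontology_terms=ONTOLOGY_TERMS):
    """Return, for each domain, the ontology terms found in the segment text."""
    text = segment_text.lower()
    annotations = {}
    for domain, terms in ontology_terms.items():
        # Whole-phrase, case-insensitive match of each term against the text.
        annotations[domain] = [
            term for term in terms
            if re.search(r"\b" + re.escape(term) + r"\b", text)
        ]
    return annotations


slide_text = "Clustering of microarray gene expression data"
result = extract_annotations(slide_text)
```

For this slide text, the same segment receives one annotation set per domain, which is the property the later steps rely on.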

After segmenting and indexing, the relevant document generator uses the annotations assigned to the video segments to search various resources and find materials pertinent to each segment. As described above, if multiple domains are involved in a presentation video, multiple sets of annotations will be generated. For each set of annotations, resources pertinent to that specific domain are searched for relevant documents. Continuing the example above, to generate relevant materials for biologists, GO-based annotations are used to search two biology-related sources, PubMed (http://www.ncbi.nlm.nih.gov/PubMed/) and the European Bioinformatics Institute (EBI) data sources (http://www.ebi.ac.uk/). In this way, the relevant document generator extracts domain-specific relevant materials: for the same presentation video, different domains have different sets of relevant materials.
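The offline generation of per-domain document sets can be sketched as a nested loop over segments, domains, and sources. This is a minimal sketch, not the system's implementation: the names `DOMAIN_SOURCES`, `generate_relevant_documents`, and `fake_search` are hypothetical, and `fake_search` only stands in for a real query against a source such as PubMed.

```python
# Hypothetical mapping from a domain to the sources searched for it.
# The source names come from the text; the structure is an assumption.
DOMAIN_SOURCES = {"biology": ["PubMed", "EBI"]}


def generate_relevant_documents(segment_annotations, domain_sources, search_fn):
    """Build {segment_id: {domain: [documents]}} from per-domain annotations.

    segment_annotations maps a segment id to {domain: [annotation terms]}.
    search_fn(source, terms) returns a list of documents for one source.
    """
    relevant = {}
    for segment_id, by_domain in segment_annotations.items():
        relevant[segment_id] = {}
        for domain, terms in by_domain.items():
            documents = []
            # Only sources registered for this domain are queried.
            for source in domain_sources.get(domain, []):
                documents.extend(search_fn(source, terms))
            relevant[segment_id][domain] = documents
    return relevant


def fake_search(source, terms):
    """Stand-in for a real search call; returns one labeled hit per term."""
    return [f"{source}: {term}" for term in terms]


relevant = generate_relevant_documents(
    {"seg-1": {"biology": ["microarray"]}}, DOMAIN_SOURCES, fake_search
)
```

Keeping the search call behind a function argument lets the same loop serve every domain, with only the source list changing.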

Video segmenting, video indexing, and relevant material generation are performed offline. For online browsing, the user chooses a domain, and the domain-specific view is then displayed. For online search, two kinds of search are provided: internal search and external search. Internal search works on an internal data collection, such as all presentation videos from a conference, and has been described in our previous work [13]. External search is performed on publicly available data sources, and the search sites are restricted to those pertinent to the selected domain. For example, for biologists, PubMed will be searched; for computer scientists, CiteSeer (http://citeseer.ist.psu.edu/?form=citesearch) will be searched. By restricting search to sites pertinent to the chosen domain, there is a potential to increase retrieval relevancy. By integrating user profiles, the chance of presenting materials of greater interest to users can be increased.
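One simple way to picture the domain-restricted external search is to expand a user query into one site-restricted query per pertinent site. The sketch below assumes a search backend that understands the common `site:` restriction; that backend, the `DOMAIN_SITES` dictionary, and the function name are assumptions for illustration, while the host names are taken from the URLs mentioned in the text.

```python
# Hypothetical domain-to-host mapping built from the URLs named in the text.
DOMAIN_SITES = {
    "biology": ["www.ncbi.nlm.nih.gov", "www.ebi.ac.uk"],
    "computer_science": ["citeseer.ist.psu.edu"],
}


def build_restricted_queries(query, domain, domain_sites=DOMAIN_SITES):
    """Return one site-restricted query string per site of the chosen domain."""
    if domain not in domain_sites:
        raise ValueError(f"unknown domain: {domain}")
    return [f"{query} site:{host}" for host in domain_sites[domain]]


queries = build_restricted_queries("gene expression clustering", "biology")
```

The user's choice of domain is the only input that changes which sites are queried; the query text itself is passed through unchanged.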

4. The prototype system

To demonstrate domain-based recommendation and retrieval of relevant materials, we use the Virtual Conference on Genomics and Bioinformatics (VCGB) (http://www.ndsu.edu/virtual-genomics/conference_2003.htm) as our test bed and build a domain-based presentation video access platform. In the following subsections, we first describe the data preparation steps, i.e., video segmentation, video indexing, and relevant material generation, and then elaborate on the design and development of the browse and search interfaces.

Figure 1: Domain-based recommendation and retrieval of relevant documents. [Diagram not reproduced. Offline data preparation: video production data, segmenting video, indexing video (with ontology), annotation data sets 1 to n, generating relevant materials, relevant data sets. Online access: choosing domain, user profile, search/browse engine.]

4.1. Data preparation

Three types of multimedia objects are utilized in video segmentation: presenter video streams, slide video streams, and PowerPoint slide files. For each VCGB presentation, the slide video stream is first segmented using Matlab (http://www.mathworks.com/) scripts based on the global color histogram. Since the presenter video stream and the slide video stream share the same presentation timeline, the presenter video stream is segmented as well based on this temporal relationship. After that, the text on each PowerPoint slide is extracted using Shyam’s Toolbox (http://skp.mvps.org/toolbox/), and domain-specific annotations are then extracted from the slide text. Due to the nature of the conference, two domains, biology and computer science, are involved in our system. Therefore, two domain ontologies are employed: the Gene Ontology (GO) and the Data Mining Ontology (DMO), which generate two sets of annotation data, GO-based annotations and DMO-based annotations. The details of the annotation extraction process can be found in [13]. Since the key frame of each video segment matches one PowerPoint slide, the extracted annotations can be seen as semantic annotations of the corresponding video segments. Besides the domain-specific annotations, other annotations, such as presentation titles, presentation durations, video segment start and end times, and video segment key frames, are all stored in a database.
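The histogram-based segmentation of the slide video can be sketched as follows. The original scripts were written in Matlab and are not shown in the paper, so this pure-Python version is an illustrative re-sketch under assumptions: frames are given as lists of (r, g, b) pixels, the distance is a sum of absolute bin differences, and the bin count and threshold values are arbitrary choices.

```python
def color_histogram(frame, bins=8):
    """Normalized global RGB histogram of a frame given as (r, g, b) pixels."""
    hist = [0.0] * (3 * bins)
    step = 256 // bins
    for pixel in frame:
        for channel, value in enumerate(pixel):
            hist[channel * bins + min(value // step, bins - 1)] += 1
    total = float(len(frame)) or 1.0  # avoid division by zero on empty frames
    return [count / total for count in hist]


def histogram_distance(h1, h2):
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_boundaries(frames, threshold=0.5, bins=8):
    """Return indices of frames that start a new segment (assumed threshold)."""
    if not frames:
        return []
    boundaries = [0]
    previous = color_histogram(frames[0], bins)
    for index in range(1, len(frames)):
        current = color_histogram(frames[index], bins)
        if histogram_distance(previous, current) > threshold:
            boundaries.append(index)
        previous = current
    return boundaries


def boundaries_to_times(boundaries, fps):
    """Convert frame indices to start times in seconds on the shared timeline."""
    return [index / fps for index in boundaries]


white = [(255, 255, 255)] * 4
blue = [(0, 0, 255)] * 4
frames = [white, white, blue, blue, white]
boundaries = detect_boundaries(frames)
start_times = boundaries_to_times(boundaries, fps=1)
```

Because the presenter stream and the slide stream share one timeline, the start times computed from the slide stream can be reused to cut the presenter stream at the same points.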

With domain-specific annotations, two sets of relevant documents are extracted. For each video segment, GO-based annotation data are used to search Google web sources and the PubMed literature database, which generates the biology-related relevant document set, while DMO-based annotation data are used to search Google web sources and the CiteSeer literature database, which produces the computer science-related relevant materials.
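Given the stored segment start times and the per-domain document sets, selecting the documents for the current playback position reduces to a sorted lookup. The following sketch uses made-up start times, view names, and document labels; only the idea of indexing documents by segment and by domain comes from the text.

```python
from bisect import bisect_right

# Hypothetical segment start times in seconds, sorted in ascending order.
SEGMENT_STARTS = [0.0, 95.0, 240.0]

# Hypothetical per-view document lists, one list per segment.
RELEVANT_DOCS = {
    "biologist": [["bio-doc-1"], ["bio-doc-2"], ["bio-doc-3"]],
    "computer_scientist": [["cs-doc-1"], ["cs-doc-2"], ["cs-doc-3"]],
}


def documents_for(playback_time, view, starts=SEGMENT_STARTS, docs=RELEVANT_DOCS):
    """Return the documents of the segment playing at playback_time for a view."""
    if not starts or playback_time < starts[0]:
        return []
    # The playing segment is the last one whose start time is <= playback_time.
    index = bisect_right(starts, playback_time) - 1
    return docs[view][index]


current_docs = documents_for(100.0, "biologist")
```

Switching the view changes only the `view` key, so the same playback position yields biology-related or computer science-related documents without touching the segmentation data.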

4.2. The browsing interface

In order to explore the delivery of relevant information in the context of watching a presentation video, an interface is designed that links the presenter video, PowerPoint slides and relevant materials, accessible using RealPlayer from World Wide Web browsers. A screen shot of this interface is given in Figure 2.

Figure 2: Domain-based browsing and searching interface (screenshot not reproduced; panes are labeled A through F)

The technologies employed here include the Synchronized Multimedia Integration Language (SMIL 2.0) [14], RealPix [15], and RealText [15]. SMIL is a W3C recommendation designed for web-based multimedia presentations that combine audio, video, text, and graphics in real time. It has full support for timing, linking, layout, and animation and is suitable for generating integrated multimedia learning materials. RealPix and RealText from RealNetworks provide additional formatting and transition effects for graphics and text beyond what SMIL markup can achieve.
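To give a feel for how SMIL combines layout and timing, the sketch below generates a minimal SMIL 2.0 document with two regions played in parallel. It is not the platform's actual presentation file: the region ids, sizes, file names, and the helper `build_smil` are hypothetical, and only standard SMIL 2.0 elements (`layout`, `root-layout`, `region`, `par`, `video`, `ref`) are used.

```python
import xml.etree.ElementTree as ET

# Namespace of the SMIL 2.0 language profile.
SMIL_NS = "http://www.w3.org/2001/SMIL20/Language"


def build_smil(presenter_src, slides_src):
    """Return a minimal SMIL 2.0 document that plays two streams in parallel."""
    smil = ET.Element("smil", {"xmlns": SMIL_NS})
    head = ET.SubElement(smil, "head")
    layout = ET.SubElement(head, "layout")
    ET.SubElement(layout, "root-layout", {"width": "960", "height": "480"})
    # Hypothetical regions for the presenter video and the slide show.
    ET.SubElement(layout, "region", {
        "id": "presenter", "left": "0", "top": "0",
        "width": "320", "height": "240",
    })
    ET.SubElement(layout, "region", {
        "id": "slides", "left": "320", "top": "0",
        "width": "640", "height": "480",
    })
    body = ET.SubElement(smil, "body")
    # Children of <par> are rendered at the same time on a shared timeline.
    par = ET.SubElement(body, "par")
    ET.SubElement(par, "video", {"src": presenter_src, "region": "presenter"})
    ET.SubElement(par, "ref", {"src": slides_src, "region": "slides"})
    return ET.tostring(smil, encoding="unicode")


# Hypothetical file names for a presenter stream and a slide show.
document = build_smil("presenter.rm", "slides.rp")
```

The `par` container is what keeps the presenter video and the slide show synchronized, since both media elements start on the same timeline.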

A. Title pane: displays the presentation title and presenter name.

B. Slide pane: displays PowerPoint slides so that users can read the key points of a presentation. RealPix is used to create slideshows from extracted slide images that stream in with the corresponding presenter video.

C. Index pane: provides random access points using PowerPoint slide titles. Slide titles are used as hyperlinks. When a title is clicked, the presenter video segment for that particular slide is played and the corresponding slide is also shown in the Slide pane. RealText creates timed text and is used here to synchronize the Slide pane and the Presenter pane.

D. Presenter pane: displays the presenter and enhances the sense of interaction.

E. Switchboard pane: allows users to switch among different views and initiate searches. Two views are provided in our experiment: Biologist View and Computer Scientist View. If the Biologist View is chosen, relevant documents for biologists are automatically displayed for the video segment that is currently playing. If the Computer Scientist View is chosen, relevant documents for computer scientists are presented instead. Users can pause or continue watching the video while browsing related documents. In addition, two types of search are supported: Search the VCGB, which is an internal search, and Search, which is an external search. When these search links are clicked, the search is performed in the Browser pane. The Switchboard pane is realized using SMIL timing and linking features.

F. Browser pane: supports any content playable in Internet Explorer. Relevant information, such as an HTML page or an audio or video clip, can be played in this pane. Moreover, users can search and perform other activities in this pane, whether or not they relate to the current presentation.

4.3. The search

To help learners and researchers find what they need from publicly available data sources, the external search is performed on restricted, relevant resources. When the Search link is clicked, a different set of resources or web sites is searched depending on the view that the user is currently in. For example, if a user is a biologist, only biology-related resources are searched. Figure 3 shows a screenshot of an external search result. Due to limited space, the top RealPlayer section, including the Title pane, the Index pane, the Slide pane, and the Switchboard pane, is not shown.

Figure 3: The domain-based external search


In this particular example, three bioinformatics-related web sites are searched: Bio-computing.org, the European Bioinformatics Institute (EMBL-EBI), and PubMed of the U.S. National Library of Medicine. Custom labels associate search results with each web site. Search results may be expanded or collapsed. By restricting search to pertinent sites, there is a potential to increase the relevancy of search results.

5. Conclusions and future work

In this paper, we propose domain-based recommendation and retrieval of relevant materials in e-learning. Instead of grouping users by their browsing histories, we group users by domains. For the same learning material, multiple sets of relevant materials are prepared, one for each chosen domain. To increase the degree of retrieval relevancy, search sites are chosen by domain and search is restricted to those relevant sites only. To demonstrate our idea, we implement a presentation video access platform, using the Virtual Conference on Genomics and Bioinformatics as our test bed.

In the future, we will carry out usage studies to investigate the effectiveness of the platform. In addition, as mentioned in Section 1, data mining techniques have been used in education for recommending learning materials. We are interested in integrating domain-based recommendation and retrieval with the data mining approach to further enhance the learning experience.

6. References

[1] S. Kariya, “Online education expands and evolves,” IEEE Spectrum, 40(5), 2003, pp. 49-51.

[2] T. Smith, A. Ruocco and B. Jansen, “Digital video in education,” Proc. of the Thirtieth SIGCSE Technical Symposium on Computer Science Education, pp. 122-126.

[3] J. Fan, H. Luo and A. K. Elmagarmid, “Concept-oriented indexing of video databases: Toward semantic sensitive retrieval and browsing,” IEEE Trans. on Image Processing, July 2004, pp. 974-992.

[4] eMarketer, “E-Learning Gains Momentum,” from http://www.emarketer.com/Article.aspx?id=1002352.

[5] G. D. Abowd, J. A. Brotherton and J. Bhalodai, “Classroom 2000: A system for capturing and accessing multimedia classroom experiences,” CHI '98 Demonstration Paper, pp. 20-21.

[6] J. Flachsbart, D. Franklin and K. Hammond, “Improving human computer interaction in a classroom environment using computer vision,” Proc. of the 5th International Conference on Intelligent User Interfaces, pp. 86-93.

[7] B. Mobasher, “Data Mining for Personalization,” in The Adaptive Web: Methods and Strategies of Web Personalization, P. Brusilovsky, A. Kobsa, W. Nejdl (eds.), Springer-Verlag, Berlin Heidelberg, 2007, pp. 1-46.

[8] A. S. Gordon, “Using Annotated Video as an Information Retrieval Interface,” International Conference on Intelligent User Interfaces, New Orleans, LA, 2000, pp. 133-140.

[9] K. Forbus and P. Whalley, “Using qualitative physics to build articulate software for thermodynamics education,” Proceedings of AAAI-94, Seattle, WA, 1994.

[10] T. Murray, “Authoring intelligent tutoring systems: An analysis of the state of the art,” International Journal of Artificial Intelligence in Education, 10, pp. 100-133.

[11] C. Romero and S. Ventura, “Educational Data Mining: A Survey from 1995 to 2005,” Expert Systems with Applications, Elsevier, 33(1), 2007, pp. 135-146.

[12] C. Romero, S. Ventura, J. A. Delgado and P. D. Bra, “Personalized Links Recommendation Based on Data Mining in Adaptive Educational Hypermedia Systems,” in Creating New Learning Experiences on a Global Scale, Springer Berlin / Heidelberg, 2007, pp. 292-306.

[13] A. Dong and H. Li, “Multi-ontology Based Multimedia Annotation for Domain-specific Information Retrieval,” The IEEE International Workshop on Multimedia Technology and Ubiquitous Computing, Taichung, Taiwan, June 2006, pp. 158-165.

[14] SMIL, from http://www.w3.org/AudioVideo/, 2007.

[15] RealNetworks Production Guide, from http://service.real.com/help/library/guides/realone/ProductionGuide/HTML/, 2007.