[acm press the 2011 workshop - palo alto, california (2011.02.13-2011.02.13)] proceedings of the...

OSUSUME: Cross-lingual Recommender System for Research Papers

Kiyoko Uchiyama National Institute of Informatics

2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, JAPAN [email protected]

Akiko Aizawa National Institute of Informatics

2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, JAPAN [email protected]

Hidetsugu Nanba Hiroshima City University

3-4-1 Ozuka-higashi, Asaminami-ku, Hiroshima [email protected]

Takeshi Sagara Picolab, co., LTD

1-4-6 Kita-Aoyama, Minato-ku, Tokyo, JAPAN [email protected]

ABSTRACT In this paper, we introduce a cross-lingual recommender system for research papers that is based on multiple facets and operates in Japanese. The system is the first Japanese research paper recommender system, and it recommends international papers simply from Japanese keywords. Academic search engine users range from undergraduate students to researchers. Because these users have different backgrounds and preferences, they look for different kinds of papers. To accommodate these varied preferences, the system introduces eight viewpoints: internationality, similarity, state of the art, serendipity, contextual analysis (target and method), essentiality and authority. The recommendation results of our system were evaluated by a questionnaire survey. Based on the survey results, we discuss the relevance of the proposed viewpoints and the precision of the system.

Author Keywords Recommender System in Japanese, Multiple Facets, Cross-Lingual Paper Search

ACM Classification Keywords H.3.1 Content Analysis and Indexing; I.2.7 Natural Language Processing

INTRODUCTION A large number of scholarly papers are published, and the number grows every year, so it is crucial to be able to find appropriate articles effectively. Academic search engine users range from undergraduate students who have not yet decided on a research topic to researchers moving into a new field, and the purpose of retrieval differs from person to person. Experienced researchers (experts) need to find similar papers or new approaches and methods related to their own research. Students (novices), on the other hand, explore interesting research themes and topics and need to quickly find reliable, authoritative papers as well as basic, essential ones. Existing recommender systems have not sufficiently addressed this variety of search purposes.

We propose a cross-lingual recommender system for research papers based on multiple facets. Our system uses eight recommendation viewpoints: internationality, similarity, state of the art, serendipity, contextual analysis of target, contextual analysis of method, essentiality and authority.

This paper is structured as follows. The next section discusses related work, focusing on research paper recommender systems. Section 3 then explains the details of the eight viewpoints and the graphical user interface of our system. Section 4 discusses our system based on a questionnaire survey. Finally, Section 5 concludes the paper and describes possible future work.

RELATED WORK Recommender systems predict and propose items that are likely to interest their users. There are two main approaches to finding similar items: collaborative filtering and content-based filtering. Collaborative filtering relies on data collected from many users to build profiles of users with similar preferences. It requires the users' profiles and their ratings or implicit behavior (read/unread, viewed/not viewed, etc.) on items in order to find users who share similar rating or behavior patterns. Content-based filtering, in contrast, recommends items to a user if they are similar in content to items the user has preferred in the past; it relies on features of the items themselves.
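To make the contrast concrete, the following is a minimal sketch of user-based collaborative filtering over implicit feedback (a set of viewed papers per user). It is not taken from any of the cited systems; the data layout, the set-based cosine similarity and the neighbourhood size `k` are illustrative assumptions.

```python
from math import sqrt

# Assumed layout: user -> set of paper IDs the user has viewed (implicit feedback).
Interactions = dict[str, set[str]]


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity of two binary item vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend_cf(user: str, data: Interactions, k: int = 2) -> list[str]:
    """Score items the user has not seen by the summed similarity of the
    k most similar users who viewed them (user-based collaborative filtering)."""
    seen = data[user]
    neighbours = sorted(
        ((cosine(seen, items), other) for other, items in data.items() if other != user),
        reverse=True,
    )[:k]
    scores: dict[str, float] = {}
    for sim, other in neighbours:
        if sim <= 0:
            continue  # users with no overlap contribute nothing
        for item in data[other] - seen:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=lambda item: (-scores[item], item))


data = {
    "alice": {"p1", "p2"},
    "bob": {"p1", "p2", "p3"},
    "carol": {"p4"},
    "dave": set(),  # a new user with no history
}
print(recommend_cf("alice", data))  # ['p3']
print(recommend_cf("dave", data))   # []
```

The empty result for `dave` illustrates the cold-start problem described next: a user with no history has no neighbours from whom to borrow preferences.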

The user's previous ratings are an important factor in the collaborative filtering approach. The “cold-start problem” occurs for new users with no previous preferences: the lack of ratings or behavior information prevents the system from drawing any inferences about the users' needs and preferences. CiteULike¹ is a free service that helps users store, organize and share scholarly papers. Users register the references for their research directly and add priorities, tags and comments to the papers. Bogers and van den Bosch [1] proposed a method that uses the user's reference library to generate recommendations. Through a temporal analysis of CiteULike, they found that “the cold-start problem” disappeared after the first two years. McNee et al. [2] explored collaborative filtering over the citation web, using the connections between papers to create the ratings matrix.

1 http://www.citeulike.org/faq/faq.adp

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CaRR 2011, February 13, 2011, Stanford, CA, USA. Copyright © 2011 ACM 978-1-4503-0625-6/11/02... $10.00

Collaborative filtering can also draw on information from users' profiles. Sugiyama and Kan [3] constructed a model that considers a researcher's past work when recommending scholarly papers to that researcher. They distinguish between junior researchers, who have published few papers, and senior researchers, who have multiple publications.

Hybrid methods that combine collaborative and content-based filtering have recently been proposed to cover the shortcomings of each approach. Several techniques combining content-based and collaborative filtering algorithms have been implemented and tested for recommending research papers [4] and for building research reading lists [5].

Our system employs a hybrid approach: as the content-based component, it calculates the similarity between papers based on the input keywords, and as the collaborative component, it uses a user logging function based on the user's past publications.

USER INTERFACE OSUSUME is designed to support users through a user-friendly GUI. The procedure for obtaining recommendations is shown in Figure 1, and the search result views of each step are combined in Figure 2. The numbers in Figure 2 correspond to the steps of the procedure listed in Figure 1.

Target Resources We used 4 million bibliographic records in CiNii (NII Scholarly and Academic Information Navigator). CiNii is a database service for searching information on academic articles published in the journals of academic societies or in university research bulletins in Japan. The titles, authors, abstracts and keywords in the bibliographic data were processed by the Japanese part-of-speech and morphological analyzer MeCab² with the UniDic³ dictionary. Nouns are extracted from the segmented text and indexed as keywords.

2 http://mecab.sourceforge.net/
3 http://mecab/sourceforge.net/
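The indexing step can be sketched as follows. To keep the example self-contained it does not call MeCab; it assumes the morphological analysis has already produced (surface form, part-of-speech) pairs, with UniDic POS strings whose first field is 名詞 for nouns. The record IDs and sample tokens are hypothetical.

```python
from collections import defaultdict

# Assumed analyzer output per token: (surface form, UniDic part-of-speech string).
Token = tuple[str, str]


def extract_nouns(tokens: list[Token]) -> list[str]:
    """Keep tokens whose UniDic part of speech begins with 名詞 (noun)."""
    return [surface for surface, pos in tokens if pos.startswith("名詞")]


def build_index(records: dict[str, list[Token]]) -> dict[str, set[str]]:
    """Inverted index: noun keyword -> IDs of the records containing it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for record_id, tokens in records.items():
        for noun in extract_nouns(tokens):
            index[noun].add(record_id)
    return dict(index)


records = {
    "r1": [("論文", "名詞-普通名詞-一般"), ("の", "助詞-格助詞"), ("推薦", "名詞-普通名詞-サ変可能")],
    "r2": [("推薦", "名詞-普通名詞-サ変可能"), ("システム", "名詞-普通名詞-一般")],
}
index = build_index(records)
print(sorted(index["推薦"]))  # ['r1', 'r2']
```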

Selecting a seed paper First, the user selects a seed paper manually in order to receive recommendations. Based on the seed paper, our system searches for appropriate papers and builds a recommendation list for each viewpoint. We implemented two methods for selecting a seed paper: an author-based method and a keyword-based method. Together they form the hybrid approach, with the author-based method serving as collaborative filtering and the keyword-based method as content-based filtering.

When users enter their own name or the name of another researcher, the system outputs three lists: (1) the author's past published papers, (2) papers similar to the author's published papers, and (3) researchers conducting similar research. To generate the list of similar papers, keyword vectors are constructed from characteristic words in the author's past publications. The similarity between papers is then calculated, and the system displays five papers randomly chosen from the top 100 similar papers. We choose randomly from the top 100 instead of showing the top 5 in order to present a wide variety of papers. Similarly, the system shows five researchers who have published papers similar to the author's papers, again using keyword vectors as above.
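A minimal sketch of the author-based step: sum keyword counts from the author's past papers into a profile vector, rank candidate papers by cosine similarity, and sample five from the top 100. The plain term counts (rather than a particular weighting), the data layout and the `rng` parameter are assumptions for illustration; excluding the author's own papers from `candidates` is left to the caller.

```python
import random
from collections import Counter
from math import sqrt
from typing import Optional


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def author_profile(past_papers: list[list[str]]) -> Counter:
    """Aggregate the characteristic words of an author's past papers."""
    profile: Counter = Counter()
    for keywords in past_papers:
        profile.update(keywords)
    return profile


def sample_similar(profile: Counter, candidates: dict[str, list[str]],
                   top_n: int = 100, show: int = 5,
                   rng: Optional[random.Random] = None) -> list[str]:
    """Rank candidates by similarity to the profile, then pick `show` papers
    at random from the `top_n` best to diversify the displayed list."""
    rng = rng or random.Random()
    ranked = sorted(candidates,
                    key=lambda pid: cosine(profile, Counter(candidates[pid])),
                    reverse=True)
    pool = ranked[:top_n]
    return rng.sample(pool, min(show, len(pool)))


profile = author_profile([["recommender", "paper"], ["recommender", "japanese"]])
candidates = {f"p{i}": (["recommender"] if i < 3 else ["chemistry"]) for i in range(10)}
print(sample_similar(profile, candidates, top_n=3, show=2, rng=random.Random(0)))
```

Sampling from a larger pool trades some precision for variety, which is the stated reason for not simply showing the top five.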

Users who have published at most one paper, such as undergraduate students, can instead enter keywords of interest directly. The list of papers similar to the keywords (4) is displayed in order of similarity. Users then select a seed paper from the papers listed by either operation. Once a seed paper is selected, the system builds a recommendation list for each of the eight viewpoints.

MULTIPLE FACETS Before defining the eight viewpoints, we distinguish two types of target users: novices and experts. Novices, such as undergraduate students with insufficient knowledge of a research field, tend to look for state-of-the-art, essential and authority papers in order to grasp the outline of a research topic. Experts, such as researchers with an extensive list of publications, search for detailed information such as similar methods or targets in their research field. Experts usually search within their own area, but there may also be interesting papers in other fields. With these types of target users in mind, we prepared the eight viewpoints discussed below.

Figure 1: Operation procedure

International Paper The titles, abstracts and keywords of Japanese scientific papers are given in both Japanese and English. We use the English keywords to search for international papers. The recommendation procedure is as follows: (a) the user inputs keywords in Japanese, (b) the system searches for Japanese papers that contain the keywords, (c) it extracts the English keywords from the matched Japanese papers, and (d) it finds English papers with those English keywords and displays the result.
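Steps (b) to (d) can be sketched as below. The data classes, the requirement that a Japanese paper contain all query keywords, the cap of five English keywords and the overlap-count ranking are our assumptions; the text does not say how the English keywords are weighted or how the English results are ordered.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class JaPaper:
    ja_keywords: set[str]  # Japanese keywords of a CiNii record
    en_keywords: set[str]  # English keywords of the same record


@dataclass
class EnPaper:
    title: str
    keywords: set[str]


def cross_lingual_search(query_ja: set[str], ja_papers: list[JaPaper],
                         en_papers: list[EnPaper], max_terms: int = 5) -> list[str]:
    # (b) Japanese papers containing every query keyword (assumed AND semantics).
    matched = [p for p in ja_papers if query_ja <= p.ja_keywords]
    # (c) Collect their English keywords, keeping the most frequent ones.
    counts = Counter(t for p in matched for t in p.en_keywords)
    en_terms = {t for t, _ in counts.most_common(max_terms)}
    # (d) English papers sharing any of these keywords, ranked by overlap size.
    hits = [(len(en_terms & p.keywords), p.title) for p in en_papers if en_terms & p.keywords]
    return [title for _, title in sorted(hits, key=lambda h: (-h[0], h[1]))]


ja = [JaPaper({"推薦", "論文"}, {"recommender system", "scholarly paper"}),
      JaPaper({"化学"}, {"chemistry"})]
en = [EnPaper("A", {"recommender system"}),
      EnPaper("B", {"chemistry"}),
      EnPaper("C", {"scholarly paper", "recommender system"})]
print(cross_lingual_search({"推薦", "論文"}, ja, en))  # ['C', 'A']
```

The Japanese records act as a bridge: the user never types an English word, yet the English search runs on author-supplied English keywords.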

International research papers in English are stored in NII-REO (NII Repository of Electronic Journals and Online Publications), which integrates the contents of electronic journals under licensing agreements with participating publishers. The international paper recommendation helps users get an overview of international trends simply by entering Japanese keywords. This viewpoint is useful for finding English papers, especially for novices who do not know the appropriate English keywords.

Similar paper Similarity is measured as follows: (a) extract characteristic words, namely the nouns in UniDic, from the title and abstract, (b) build a TF-IDF vector from the characteristic words, (c) calculate the cosine similarity between vectors, and (d) display on screen five of the top 200 similar papers in order of score. Researchers should always keep track of work and trends similar to their own. This viewpoint lets users effectively find papers that match their preferences or resemble their past work.
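Steps (a) to (d) can be sketched with sparse dictionaries, assuming the characteristic nouns have already been extracted. The weighting uses raw term frequency times log(N/df), one common TF-IDF variant; the paper does not specify which variant it uses. The sample documents are hypothetical.

```python
import math
from collections import Counter

Vector = dict[str, float]


def tfidf_vectors(docs: dict[str, list[str]]) -> dict[str, Vector]:
    """(b) Weight each noun by term frequency times log(N / document frequency)."""
    n = len(docs)
    df = Counter(t for terms in docs.values() for t in set(terms))
    return {
        doc_id: {t: tf * math.log(n / df[t]) for t, tf in Counter(terms).items()}
        for doc_id, terms in docs.items()
    }


def cosine(u: Vector, v: Vector) -> float:
    """(c) Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similar_papers(seed_id: str, docs: dict[str, list[str]], top_n: int = 200) -> list[str]:
    """Rank all other papers by similarity to the seed and keep the top_n."""
    vecs = tfidf_vectors(docs)
    scored = sorted(
        ((cosine(vecs[seed_id], vec), pid) for pid, vec in vecs.items() if pid != seed_id),
        key=lambda s: (-s[0], s[1]),
    )
    return [pid for _, pid in scored[:top_n]]


docs = {
    "seed": ["recommender", "paper", "japanese"],
    "a": ["recommender", "paper"],
    "b": ["recommender", "chemistry"],
    "c": ["protein", "chemistry"],
}
top = similar_papers("seed", docs)
print(top[:5])  # (d) the five shown on screen: ['a', 'b', 'c']
```

With this variant, a noun that occurs in every record gets weight zero and no longer affects similarity. The top-200 list returned here is also the natural input for the viewpoints that follow.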

State-of-the-Art Paper The top 200 similar papers are re-sorted by publication date, most recent first. This viewpoint can surface hot topics or the latest techniques related to the user's work or keywords.
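Because this viewpoint only reorders an existing list, the sketch is short. It assumes a mapping from paper ID to publication year and keeps the similarity order among papers published in the same year; both are our assumptions.

```python
def state_of_the_art(similar_ids: list[str], year: dict[str, int], show: int = 5) -> list[str]:
    """Re-sort the top similar papers newest first.

    Python's sort is stable (also with reverse=True), so papers from the
    same year keep their similarity order from `similar_ids`.
    """
    return sorted(similar_ids, key=lambda pid: year[pid], reverse=True)[:show]


print(state_of_the_art(["a", "b", "c"], {"a": 2005, "b": 2010, "c": 2010}))  # ['b', 'c', 'a']
```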

Serendipitous Paper The serendipitous viewpoint finds similar papers in fields other than the user's own. We define a user's field as the set of journals in which the user and the user's co-authors have published in the past. This viewpoint can find interesting papers in journals the user does not necessarily know, making it possible to discover, as serendipitous papers, novel methods or targets that have been studied in other fields.
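The field definition and the filter can be sketched as follows. The `Paper` record, the author names and the journal names are hypothetical, and we read “co-authors” as anyone who has co-authored at least one paper with the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    authors: frozenset[str]
    journal: str


def user_field(user: str, papers: dict[str, Paper]) -> set[str]:
    """Journals in which the user or any of the user's co-authors has published."""
    circle = {user}
    for p in papers.values():
        if user in p.authors:
            circle |= p.authors
    return {p.journal for p in papers.values() if p.authors & circle}


def serendipitous(similar_ids: list[str], papers: dict[str, Paper],
                  field: set[str], show: int = 5) -> list[str]:
    """Keep similar papers whose journal lies outside the user's field."""
    return [pid for pid in similar_ids if papers[pid].journal not in field][:show]


papers = {
    "u1": Paper(frozenset({"alice", "bob"}), "J-IR"),
    "c1": Paper(frozenset({"bob"}), "J-NLP"),
    "x1": Paper(frozenset({"carol"}), "J-Bio"),
    "x2": Paper(frozenset({"dave"}), "J-IR"),
}
field = user_field("alice", papers)  # field == {"J-IR", "J-NLP"}
print(serendipitous(["x2", "c1", "x1"], papers, field))  # ['x1']
```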

Same Target Paper The same target viewpoint shows papers with the same target or purpose as a given seed paper. We use the technical trend analysis method proposed in [6] and prepare 23 key phrases (TARGET clues), such as “towards” and “for”. Technical terms that appear immediately after a TARGET clue in the title of the seed paper are extracted as target terms, and the system finds papers with the same target terms in their titles. This viewpoint can effectively find papers focusing on the same target.

Figure 2: Summary of the recommendation result view

Same Method Paper The same method viewpoint searches for papers that use the same method or approach. Following the same procedure as for TARGET clues, we prepared 37 phrases as METHOD clues, such as “is based on”, “using” and “by”, and use them to extract method terms from the titles. Because titles carry valuable information about research topics, analyzing title structure allows an accurate search focused on the user's interests.
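The clue-based extraction shared by the same target and same method viewpoints can be sketched on English titles as follows. Only the five clue phrases quoted above are included, not the full lists of 23 TARGET and 37 METHOD clues. We also assume that a term runs from the clue to the next known clue, punctuation mark or end of the title; the method of [6] may delimit terms differently, and Japanese titles, whose clue expressions follow rather than precede the term, would need a mirrored rule.

```python
import re

TARGET_CLUES = ["towards", "for"]              # examples only; the system uses 23
METHOD_CLUES = ["is based on", "using", "by"]  # examples only; the system uses 37


def clue_pattern(clues: list[str]) -> re.Pattern[str]:
    """Match any clue as whole words, trying longer phrases first."""
    alternation = "|".join(re.escape(c) for c in sorted(clues, key=len, reverse=True))
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


ANY_CLUE = clue_pattern(TARGET_CLUES + METHOD_CLUES)


def extract_terms(title: str, clues: list[str]) -> list[str]:
    """Return the phrase that follows each occurrence of a clue in the title."""
    terms = []
    for m in clue_pattern(clues).finditer(title):
        rest = title[m.end():]
        stop = ANY_CLUE.search(rest)  # the term ends where the next clue begins
        phrase = rest[:stop.start()] if stop else rest
        phrase = re.split(r"[:;,.?]", phrase)[0].strip().lower()
        if phrase:
            terms.append(phrase)
    return terms


def same_clue_papers(seed_title: str, titles: dict[str, str], clues: list[str]) -> list[str]:
    """IDs of papers whose titles share a clue-extracted term with the seed title."""
    seed_terms = set(extract_terms(seed_title, clues))
    return [pid for pid, t in titles.items() if seed_terms & set(extract_terms(t, clues))]


seed = "Cross-lingual Search for Research Papers using Keyword Translation"
titles = {
    "p1": "Query Expansion for Research Papers",
    "p2": "Ranking for Patents",
    "p3": "Topic Models using Keyword Translation",
}
print(extract_terms(seed, TARGET_CLUES))             # ['research papers']
print(same_clue_papers(seed, titles, METHOD_CLUES))  # ['p3']
```

Passing `TARGET_CLUES` instead of `METHOD_CLUES` to `same_clue_papers` gives the same target viewpoint; for this seed it returns `['p1']`.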

Basic and Essential Paper Users with insufficient knowledge of a field not only fail to understand the meaning of technical terms but also cannot identify which terms are essential or basic to the field. Novices need to know the priority and importance of the technical terms in the target field. The compound words in the title and abstract are therefore classified into several vocabulary levels, such as general terms, essential and basic terms, and difficult terms. This viewpoint presents users with introductory papers that contain several essential terms, in priority order.
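The paper does not describe how terms are assigned to levels, so the sketch below assumes a precomputed lexicon that maps each compound word to a level and simply ranks candidate papers by how many distinct essential terms they contain. The lexicon, the level names and the ranking rule are hypothetical.

```python
from enum import Enum


class Level(Enum):
    GENERAL = "general"
    ESSENTIAL = "essential"  # essential and basic terms
    DIFFICULT = "difficult"


def essential_papers(similar_ids: list[str], compounds: dict[str, list[str]],
                     lexicon: dict[str, Level], show: int = 5) -> list[str]:
    """Rank candidates by the number of distinct essential terms in their
    title and abstract; papers with none are dropped."""
    def score(pid: str) -> int:
        return sum(1 for term in set(compounds[pid]) if lexicon.get(term) is Level.ESSENTIAL)

    ranked = sorted(similar_ids, key=score, reverse=True)  # stable: ties keep similarity order
    return [pid for pid in ranked if score(pid) > 0][:show]


lexicon = {
    "recommender system": Level.ESSENTIAL,
    "collaborative filtering": Level.ESSENTIAL,
    "matrix factorization": Level.DIFFICULT,
    "paper": Level.GENERAL,
}
compounds = {
    "a": ["paper", "matrix factorization"],
    "b": ["recommender system", "collaborative filtering"],
    "c": ["recommender system", "paper"],
}
print(essential_papers(["a", "b", "c"], compounds, lexicon))  # ['b', 'c']
```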

Authority Paper This viewpoint recommends authority papers, i.e., papers cited by the similar papers. Papers cited by several papers are regarded as important and necessary in a field. Authority papers thus reflect the preferences of other researchers and point to influential work.
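A sketch of this viewpoint counts how often each paper is cited within the reference lists of the similar papers. The `references` mapping and the threshold of two citations, our reading of “cited by several papers”, are assumptions.

```python
from collections import Counter


def authority_papers(similar_ids: list[str], references: dict[str, list[str]],
                     min_citations: int = 2, show: int = 5) -> list[str]:
    """Papers cited by at least `min_citations` of the similar papers, most cited first."""
    counts = Counter(cited for pid in similar_ids for cited in set(references.get(pid, [])))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pid for pid, n in ranked if n >= min_citations][:show]


references = {"a": ["x", "y"], "b": ["x", "z"], "c": ["x", "y"]}
print(authority_papers(["a", "b", "c"], references))  # ['x', 'y']
```

Counting each citing paper at most once (via `set`) keeps a single paper that repeats a reference from inflating its count.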

EVALUATION Objectively evaluating a recommender system is difficult, so as a preliminary evaluation we conducted a questionnaire survey of novices. Sixteen graduate students of Keio University used our system by entering keywords of interest. Each selected one paper from the output results and evaluated the papers recommended from each viewpoint. Two questions were asked of each participant: (1) Choose two preferred viewpoints and describe the reasons. (2) Evaluate the recommendation results on a 5-point scale.

Table 1: Ranking of favorite viewpoints

Rank | Viewpoint | Count | Reason: the viewpoint is good for …
1 | State-of-the-art | 8 | grasping hot topics and trends in a target field
2 | International | 7 | obtaining international papers simply from Japanese keywords
3 | Serendipitous | 6 | identifying the variety of research topics across all fields
4 | Essential | 5 | finding papers for novices without any knowledge of the field
5 | Similar | 4 | guaranteeing that the most closely related papers are output
6 | Authority | 2 | learning the basic and influential papers in a target field

Table 1 shows the students' favorite viewpoints and the main reasons for their choices. Notably, the international viewpoint was highly appreciated. Japanese graduate students often find it difficult to search for international papers because of limited English proficiency, and this viewpoint helps them overcome the language barrier. The target and method viewpoints were not among the students' preferences, probably because novices cannot yet focus on a specific target or method.

The recommendation results received an average score of 3.6 on the 5-point scale, indicating that our system was evaluated positively.

CONCLUSION In this paper, we introduced OSUSUME, a research paper recommender system in Japanese. Our system recommends papers from eight viewpoints designed for different types of users and their search motivations. In the questionnaire, graduate students rated our system highly, especially its state-of-the-art, international and serendipitous viewpoints. In future work, we will focus on increasing the precision of the recommendation methods and introduce new methods based on natural language processing techniques. In addition, to solve the cold-start problem completely, we will extend our system to support selecting multiple seed papers.

REFERENCES [1] T. Bogers and A. van den Bosch. Recommending scientific articles using CiteULike. In Proceedings of the 2008 ACM conference on Recommender systems, pages 287-290, Lausanne, Switzerland, 2008.

[2] S. M. McNee, I. Albert, D. Cosley, P. Gopalkrishnan, S. K. Lam, A. M. Rashid, J. A. Konstan, and J. Riedl. On the Recommending of Citations for Research Papers. In Proceedings of the 2002 ACM conference on Computer supported cooperative work, pages 116-125, 2002.

[3] K. Sugiyama and M.-Y. Kan. Scholarly Paper Recommendation via User's Recent Research Interests. In Proceedings of the 10th annual joint conference on Digital Libraries, pages 29-38, Gold Coast, Australia, 2010.

[4] R. Torres, S. M. McNee, M. Abel, J. A. Konstan, and J. Riedl. Enhancing Digital Libraries with TechLens+. In Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries, pages 228-236, Tucson, AZ, USA, 2004.

[5] M. D. Ekstrand, P. Kannan, J. A. Stemper, J. T. Butler, J. A. Konstan, and J. T. Riedl. Automatically Building Research Reading Lists. In Proceedings of the fourth ACM conference on Recommender systems, pages 159-166, Barcelona, Spain, 2010.

[6] T. Kondo, H. Nanba, T. Takezawa, and M. Okumura. Trend Analysis by Analyzing Research Papers' Titles. In Proceedings of the 4th Language & Technology conference, pages 234-238, Poznan, Poland, 2009.
