Domain analysis and information retrieval through the construction of heliocentric maps based on ISI-JCR category cocitation

Intelligent Database Systems Lab, National Yunlin University of Science and Technology

Advisor: Dr. Hsu
Reporter: Chun Kai Chen
Author: Félix de Moya-Anegón and Benjamín Vargas-Quesada
Information Processing and Management 41 (2005) 1520–1533



Page 2: Outline

Motivation
Objective
Introduction
Methodology
Experimental Results
Conclusions
Personal Opinion

Page 3: Motivation

Scientific information is spread out over disciplines which, to the outside observer, may seem to have little in common

The representation of scientific information in ways easier for the human mind to embrace is nothing new
─ "to make visible to the mind that which is not visible to the eye" and "to create a mental image of something that is not obvious (e.g. an abstraction)" are two definitions of the word visualization
─ both definitions point to the intrinsic need to represent information in a non-traditional manner

Page 4: Objective

The objective of this paper is to
─ present a methodology for the visual representation and analysis of major scientific domains
─ show that these representations can, moreover, be used as interfaces for information retrieval

Page 5: Introduction

Moya-Anegón et al. (2004)
─ reviewed the relevant literature of the past four decades in information visualization
─ proposed the use of class and subject category cocitation as a technique for the analysis and visualization of large domains

The present paper puts forth the construction of heliocentric maps
─ these maps make manifest the relationships among categories and the flux of information within and among them
─ they also make it possible to show the documents hidden behind each category and the links that unite them

Page 6: Category cocitation

Cocitation is a widely used and generally accepted technique for obtaining relational information about documents belonging to a domain
─ this relational information can be used to build maps that represent, with a high degree of fidelity, the structure of the domain that the documents comprise
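
As a minimal sketch of the cocitation idea, the Python snippet below counts how often two items appear together in the reference list of the same source document. The function name `cocitation_counts` and the three sample reference lists are hypothetical and only serve the illustration; they are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for every pair of items, the source documents that cite both."""
    counts = Counter()
    for refs in reference_lists:
        # Drop duplicates so a pair is counted at most once per citing document.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical reference lists of three source documents.
sources = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C"],
]
pairs = cocitation_counts(sources)
print(pairs[("A", "B")], pairs[("B", "C")], pairs[("A", "C")])
```

The resulting pair counts are the raw relational information from which a map of the domain can be built.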

Page 7: Source of data

Downloaded from the Web of Science
─ the Science Citation Index-Expanded (SCI-E), the Social Science Citation Index (SSCI), and the Arts & Humanities Citation Index (A&HCI)
─ records from the year 2002 whose Address field included "Spain", "France" or "England"
─ the database contained a total of 159,794 documents (articles, biographical items, book reviews, corrections, editorial materials, letters, meeting abstracts, news items and reviews) from 6584 journals

Page 8: 4. Methodology: latent cocitation

The adoption of the ISI-JCR classification as the unit of measurement and cocitation implies latent cocitation
─ the ISI-JCR may assign different categories to one single journal
─ for example, Information Processing & Management (IPM) belongs to the category Information Science & Library Science, and also to Computer Science-Information Systems
─ this produces an error of accumulation in computing cocitation
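
The following Python sketch shows one plausible reading of the accumulation error; it is an assumption, not the authors' exact procedure. Every cited journal is expanded into all of its categories and all occurrences are paired up, so a single source document can add several counts to the same category pair. The journal-to-category table and "Journal X" are hypothetical.

```python
from collections import Counter
from itertools import combinations

ISLS = "Information Science & Library Science"
CSIS = "Computer Science-Information Systems"

# Hypothetical journal-to-category table; IPM carries two categories, as on the slide.
JOURNAL_CATEGORIES = {
    "IPM": [ISLS, CSIS],
    "Journal X": [ISLS],
}


def naive_category_cocitation(cited_journals):
    """Expand each cited journal into its categories and pair up all occurrences."""
    occurrences = [cat for journal in cited_journals
                   for cat in JOURNAL_CATEGORIES[journal]]
    counts = Counter()
    for a, b in combinations(occurrences, 2):
        if a != b:
            counts[tuple(sorted((a, b)))] += 1
    return counts


# One source document citing two IPM papers and one paper from Journal X.
counts = naive_category_cocitation(["IPM", "IPM", "Journal X"])
pair_count = counts[tuple(sorted((ISLS, CSIS)))]
print(pair_count)
```

Under this naive counting the single document contributes six counts to the same pair of categories, which is the kind of inflation the grouping step is meant to avoid.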

Page 9: 4. Methodology: non-latent category cocitation

Eliminating this cocitation latency
─ group the categories cited by each one of the source documents, and calculate cocitation on the basis of that grouping
─ this non-latent form of cocitation is the one we will use to generate the heliocentric maps

Multidisciplinary Sciences
─ when work in a field such as Genetics is published in one of these journals, it is not reflected in the map of its domain, but rather is labeled as "multidisciplinary"
─ replace the category Multidisciplinary Sciences with the category that is most cited
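
The two steps on this slide can be sketched in Python as follows. The grouping is assumed to mean "the set of distinct categories cited by a source document", and "most cited" is assumed to mean the category that occurs most often among the document's own references; both are interpretations. The function names and the category label "Cell Biology" are hypothetical.

```python
from collections import Counter
from itertools import combinations

MULTI = "Multidisciplinary Sciences"


def non_latent_cocitation(documents):
    """Each source document contributes every pair of distinct cited categories once."""
    counts = Counter()
    for cited_categories in documents:
        group = sorted(set(cited_categories))  # grouping step: one entry per category
        for pair in combinations(group, 2):
            counts[pair] += 1
    return counts


def reassign_multidisciplinary(own_category, cited_categories):
    """Swap Multidisciplinary Sciences for the category cited most often (assumption)."""
    if own_category != MULTI:
        return own_category
    tally = Counter(c for c in cited_categories if c != MULTI)
    return tally.most_common(1)[0][0] if tally else MULTI


# Hypothetical cited categories of two source documents.
documents = [
    ["Genetics", "Genetics", "Cell Biology"],
    ["Genetics", "Cell Biology"],
]
counts = non_latent_cocitation(documents)
new_category = reassign_multidisciplinary(MULTI, documents[0])
print(counts[("Cell Biology", "Genetics")], new_category)
```

Repeated citations to the same category inside one document no longer accumulate, because the pair is counted once per document.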

Page 10: 4. Methodology: normalization

Another obstacle to overcome is
─ the normalization of the citation indexes throughout the field of disciplines included in the SCI, SSCI and A&HCI
─ this problem has already been dealt with by Small and Garfield (1985)
─ Normalized Cocitation Measurement (NCM)

$$
NCM(ij) = \frac{Cc(ij)}{\sqrt{c(i) \cdot c(j)}}
$$

where Cc is the cocitation and c is the citation
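
A small numerical sketch of the normalization, assuming the measure is the cosine-style ratio NCM(ij) = Cc(ij) / sqrt(c(i) · c(j)), with Cc(ij) the cocitation of categories i and j and c(i), c(j) their citation counts. The function name `ncm` and the sample numbers are hypothetical.

```python
import math


def ncm(cocitation_ij, citations_i, citations_j):
    """Normalized cocitation: Cc(ij) divided by sqrt(c(i) * c(j))."""
    if citations_i <= 0 or citations_j <= 0:
        return 0.0  # no citations, so no meaningful normalized value
    return cocitation_ij / math.sqrt(citations_i * citations_j)


# Hypothetical counts: 12 cocitations, 16 and 36 citations.
value = ncm(12, 16, 36)  # 12 / sqrt(576) = 12 / 24
print(value)
```

Dividing by the geometric mean of the two citation counts keeps heavily cited categories from dominating the map purely because of their size.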

Page 11: 4.1. Rendering the information (1/3)

In generating these graphs
─ we used the algorithm of Kamada and Kawai (1989)
─ it automatically generates non-directed graphs on a plane
─ it is guided by esthetic criteria:
─ it minimizes the number of crossed links
─ it reflects the symmetries of the graph
─ it distributes the nodes in a uniform manner over the available space
─ it makes all the links homogeneous with regard to length
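
The sketch below computes only the spring energy that the Kamada and Kawai algorithm minimizes, E = sum over node pairs of 0.5 · k_ij · (|p_i − p_j| − l_ij)², with l_ij proportional to the graph distance d_ij and k_ij = 1 / d_ij². The minimization step itself is omitted, the graph is assumed to be connected and unweighted, and the node names and coordinates are hypothetical.

```python
import math
from itertools import combinations


def shortest_path_lengths(nodes, edges):
    """All-pairs graph distances (Floyd-Warshall) for an undirected, unweighted graph."""
    dist = {(a, b): (0 if a == b else math.inf) for a in nodes for b in nodes}
    for a, b in edges:
        dist[a, b] = dist[b, a] = 1
    for k in nodes:
        for a in nodes:
            for b in nodes:
                if dist[a, k] + dist[k, b] < dist[a, b]:
                    dist[a, b] = dist[a, k] + dist[k, b]
    return dist


def kamada_kawai_energy(positions, nodes, edges, unit_length=1.0):
    """Spring energy of a layout; lower values mean a better drawing."""
    dist = shortest_path_lengths(nodes, edges)
    energy = 0.0
    for a, b in combinations(nodes, 2):
        d = dist[a, b]
        ideal = unit_length * d   # l_ij: desired distance on the plane
        stiffness = 1.0 / d ** 2  # k_ij: pairs that are close in the graph pull harder
        (xa, ya), (xb, yb) = positions[a], positions[b]
        actual = math.hypot(xa - xb, ya - yb)
        energy += 0.5 * stiffness * (actual - ideal) ** 2
    return energy


# A helios with two planets, drawn once spread out and once crowded.
nodes = ["helios", "planet 1", "planet 2"]
edges = [("helios", "planet 1"), ("helios", "planet 2")]
spread = {"helios": (0, 0), "planet 1": (-1, 0), "planet 2": (1, 0)}
crowded = {"helios": (0, 0), "planet 1": (1, 0), "planet 2": (1, 0.1)}
e_spread = kamada_kawai_energy(spread, nodes, edges)
e_crowded = kamada_kawai_energy(crowded, nodes, edges)
print(e_spread, e_crowded)
```

The spread-out drawing matches every ideal distance and therefore has zero energy, while the crowded one is penalized for placing the two planets almost on top of each other.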

Page 12: 4.1. Rendering the information (2/3)

Unlike Kamada and Kawai on this point, we preferred to interpret the cocitation values of the planets with respect to the central category as similarities

─ emphasize the distance among planets

─ a maximum value for cocitation is established as 1

─ the rest of the values are made proportional with reference to this maximum
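
A minimal sketch of the rescaling step: the largest cocitation value becomes 1 and every other value is divided by that maximum. The function name and the planet values are hypothetical, and the input is assumed to be non-empty.

```python
def scale_to_unit_maximum(cocitation_by_planet):
    """Divide every value by the largest one so that the maximum becomes 1."""
    peak = max(cocitation_by_planet.values())
    if peak <= 0:
        return {name: 0.0 for name in cocitation_by_planet}
    return {name: value / peak for name, value in cocitation_by_planet.items()}


# Hypothetical cocitation values of three planets with respect to the central category.
planets = {"Planet A": 40, "Planet B": 20, "Planet C": 10}
scaled = scale_to_unit_maximum(planets)
print(scaled)
```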

Page 13: 4.1. Rendering the information (3/3)

The resulting map is exported to Scalable Vector Graphics (SVG) format
─ this allows us to zoom in or move vertically or horizontally over the maps
─ in turn, the code is subjected to a series of modifications

First, the nodes of each map are tagged with the names corresponding to each one of the ISI-JCR categories.

Then, for each map, the size of these categories is made proportional to the number of documents produced in them. In this way categories with only minor scientific production are made perfectly visible.

Third, the hyperlinks needed in the links and in the central category are inserted to allow the retrieval of information associated with them.
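
The three modifications can be sketched as a small Python function that writes SVG by hand. This is an illustration under assumptions, not the authors' code: node positions are taken as given, the radius (rather than the area) is made proportional to the document count, and the URLs, coordinates and counts are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def render_svg(nodes, links, helios, width=400, height=400, max_radius=30.0):
    """nodes: name -> {"x", "y", "documents"[, "url"]}; links: (source, target, url)."""
    most_documents = max(node["documents"] for node in nodes.values())
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'xmlns:xlink="http://www.w3.org/1999/xlink" '
        f'width="{width}" height="{height}">'
    ]
    # Links are drawn first so the circles sit on top; each link is a hyperlink.
    for source, target, url in links:
        a, b = nodes[source], nodes[target]
        line = (f'<line x1="{a["x"]}" y1="{a["y"]}" '
                f'x2="{b["x"]}" y2="{b["y"]}" stroke="gray"/>')
        parts.append(f'<a xlink:href={quoteattr(url)}>{line}</a>')
    for name, node in nodes.items():
        # Node size is proportional to the number of documents in the category.
        radius = max_radius * node["documents"] / most_documents
        circle = (f'<circle cx="{node["x"]}" cy="{node["y"]}" '
                  f'r="{radius:.1f}" fill="orange"/>')
        # Every node is tagged with the name of its category.
        label = (f'<text x="{node["x"]}" y="{node["y"]}" '
                 f'text-anchor="middle">{escape(name)}</text>')
        if name == helios:
            # Only the central category carries a hyperlink of its own.
            parts.append(f'<a xlink:href={quoteattr(node["url"])}>{circle}{label}</a>')
        else:
            parts.append(circle + label)
    parts.append("</svg>")
    return "\n".join(parts)


center = "Information Science & Library Science"
planet = "Computer Science-Information Systems"
nodes = {
    center: {"x": 200, "y": 200, "documents": 50, "url": "documents?category=1"},
    planet: {"x": 300, "y": 200, "documents": 25},
}
links = [(center, planet, "documents?link=1-2")]
svg = render_svg(nodes, links, helios=center)
print(svg)
```

Escaping the category names matters here because several ISI-JCR names contain an ampersand, which is not allowed unescaped in SVG text.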

Page 14: 4.2. Information retrieval

Each heliocentric map includes, in the helios and in the links with its planets, hyperlinks that make it possible for us to click into a relational database

There are two means of retrieving and accessing this information
─ the first is tied to the heliocentric category itself
─ the second is an ordering of the documents, by relevance of cocitation, in view of the orbits existing between the heliocentric category and its planets
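
The two retrieval paths can be sketched with an in-memory SQLite database. The schema, the sample rows and the relevance score (number of references to the helios category multiplied by the number of references to the planet category) are all hypothetical assumptions made for the illustration; the slides do not describe the actual database.

```python
import sqlite3

ISLS = "Information Science & Library Science"
CSIS = "Computer Science-Information Systems"


def build_demo_database():
    """Create a tiny hypothetical database of documents and their cited categories."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT, category TEXT);
        CREATE TABLE citation (document_id INTEGER, cited_category TEXT);
    """)
    conn.executemany("INSERT INTO document VALUES (?, ?, ?)", [
        (1, "Doc A", ISLS), (2, "Doc B", ISLS), (3, "Doc C", ISLS), (4, "Doc D", "Genetics"),
    ])
    conn.executemany("INSERT INTO citation VALUES (?, ?)", [
        (1, ISLS), (1, ISLS), (1, CSIS),
        (2, ISLS), (2, CSIS),
        (3, ISLS),
        (4, CSIS),
    ])
    return conn


def documents_of_category(conn, helios):
    """First path: documents filed under the heliocentric category itself."""
    rows = conn.execute(
        "SELECT title FROM document WHERE category = ? ORDER BY title", (helios,))
    return [title for (title,) in rows]


def documents_of_link(conn, helios, planet):
    """Second path: documents citing both categories, strongest cocitation first."""
    rows = conn.execute("""
        SELECT d.title,
               SUM(c.cited_category = :helios) * SUM(c.cited_category = :planet) AS strength
        FROM document AS d JOIN citation AS c ON c.document_id = d.id
        GROUP BY d.id
        HAVING SUM(c.cited_category = :helios) > 0
           AND SUM(c.cited_category = :planet) > 0
        ORDER BY strength DESC, d.title
    """, {"helios": helios, "planet": planet})
    return rows.fetchall()


conn = build_demo_database()
by_category = documents_of_category(conn, ISLS)
by_link = documents_of_link(conn, ISLS, CSIS)
print(by_category, by_link)
```

In a map, the hyperlink on the helios would trigger the first query and the hyperlink on a link would trigger the second.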

Page 15: 5. Results

To facilitate the understanding of results for the reader
─ in the first place, we give a general analysis of the Spanish domain, using as an example several heliocentric maps of that domain
─ we then compare the domains of Spain, France and England, also on a general level, by looking at some of the more characteristic or unusual heliocentric maps produced

Page 16: 5.1. Analysis of a domain

Fig. 3. with a threshold value equal to the mean.

Fig. 4. with no cutoff point.
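
The difference between the two figures can be illustrated with a short pruning sketch. It assumes that a threshold "equal to the mean" keeps only the links whose value is at least the mean of all link values; the function name and the link values are hypothetical.

```python
from statistics import mean


def prune_links(links, threshold=None):
    """Keep links whose value reaches the threshold; None keeps every link."""
    if threshold is None:
        return dict(links)
    return {pair: value for pair, value in links.items() if value >= threshold}


# Hypothetical scaled cocitation values between a helios and three planets.
links = {
    ("helios", "planet 1"): 1.0,
    ("helios", "planet 2"): 0.5,
    ("helios", "planet 3"): 0.3,
}
with_mean_cutoff = prune_links(links, threshold=mean(links.values()))
without_cutoff = prune_links(links)
print(len(with_mean_cutoff), len(without_cutoff))
```

Raising or lowering the threshold trades readability of the map against the number of relationships it shows.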

Page 17

Fig. 2. Heliocentric map of Information Science & Library Science in Spain.

Fig. 5. Spanish documents under the category Library Science and Information Science

Page 18

Fig. 6. Documents associated with the link between Library Science and Information Science and Computer Science & Information Systems

Fig. 7. Heliocentric map of Computer Science-Information Systems in Spain.

Page 19: 5.2. Comparison of domains

Fig. 8. Heliocentric maps of Astronomy & Astrophysics.

Fig. 9. Heliocentric maps of Physics-Particles & Fields

Page 20

Fig. 10. Heliocentric maps of Psychology.

Fig. 11. Heliocentric maps of Sport Sciences.

Fig. 12. Heliocentric maps of Tropical Medicine.

Page 21

Fig. 13. Heliocentric maps of Law.

Page 22: Conclusion

We are well aware of the fact that our reliance on the ISI-JCR classification as an element of cocitation entails some bias and limitations

It is reasonable to
─ propose this methodology as perfectly valid for the representation and analysis of large domains of knowledge or information from a social point of view
─ use the renderings as interfaces for information retrieval
─ adjust the cutoff values used in the construction of the maps depending on the user's objective

Furthermore

─ the research efforts reflected in our maps are not distributed uniformly over disciplines or over countries

─ the time period we analyze here is too short to show the evolution of research in a country

Page 23: Personal Opinion

Advantage
─ proposes a new technique for schematic visualization applied to the analysis of large scientific domains

Disadvantage