IDENTIFYING HOT BRAZILIAN SCIENCE AND TECHNOLOGY: TECH MINING METHODS FOR RELATING SOURCES OF KNOWLEDGE AND EMERGING RESEARCH AREAS EU-SPRI CONFERENCE, 12-13 June 2012 Hannes Toivanen VTT Technical Research Centre of Finland Email. [email protected]


IDENTIFYING HOT BRAZILIAN SCIENCE AND TECHNOLOGY: TECH MINING METHODS FOR RELATING SOURCES OF KNOWLEDGE AND EMERGING RESEARCH AREAS

EU-SPRI CONFERENCE, 12-13 June 2012

Hannes Toivanen

VTT Technical Research Centre of Finland

Email. [email protected]

Page 2

Knowledge dynamics and domestic capabilities

What role do domestic capabilities play for countries trying to move towards a knowledge economy?

National systems of innovation framework

Are countries focussing on strategic areas of science and technology in research?

Can we distinguish between, or measure, the geographic locations of knowledge creation?

The objective is to clarify methods for identifying emerging trends within fields and countries with enhanced accuracy:

(1) To identify “hot” research fields within Brazilian research;

(2) To assess to what degree different “hot fields” rely on Brazilian knowledge bases vs. foreign ones.

Page 3

Data

XML data from Thomson Reuters: Institute for Scientific Information (ISI) papers with at least one Brazilian research address from the years 2005-2009, forming the SOURCE data set

Total number of papers: 152,031. For this paper, only articles and proceedings were included, totalling 127,826.

In addition:

CITING: all papers making citations to SOURCE papers (283,131 records)

Unique identifier linking CITED, SOURCE and CITING papers

Note: citation references that are not ISI-indexed are not included in the data

Data delivery in May 2010, which is the cut-off date for accumulating citations

When linking CITING papers, we included all data that was classified as “hot”.

To classify research fields, we use ISI Subject Categories that are grouped into OECD Minor Fields

Reliability issues with the ISI Subject Categories (Leydesdorff & Rafols 2008; Boyack et al. 2005)

Data processed with VantagePoint software

Page 4

Measuring “domestic” and “foreign” contributions

Typically papers are counted as a “whole”, regardless of how many people or institutions are listed as authors (Total records) -> what is the amount of noise in authorships?

We distinguish between Brazilian and foreign institutional authorship (Research country)

Fractional domestic count (FDC) vs. fractional other count

Relative shares of instances of research addresses = institutional share of authorship

Institutional authorship:

Each author gives at least one research address

Only completely identical research addresses are indexed as one (e.g. a different department or street of the same organization is indexed as 2 separate addresses)

Issues in estimating institutional authorship:

Multiple authors from one address are counted as only one

One author with multiple addresses is counted multiple times

It is a proxy: reliability and accuracy are subject to validation
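The fractional domestic count described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Address` record, the function name `fractional_domestic_count`, and the sample addresses are hypothetical and do not come from the original data or from VantagePoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Hypothetical record for one research address listed on a paper."""
    text: str      # full address string as indexed
    country: str   # research country of the address


def fractional_domestic_count(addresses, domestic="Brazil"):
    """Share of a paper's distinct research addresses located in `domestic`."""
    unique = set(addresses)  # completely identical addresses are indexed as one
    if not unique:
        raise ValueError("paper has no research addresses")
    domestic_n = sum(1 for a in unique if a.country == domestic)
    return domestic_n / len(unique)


# Invented example paper with four listed addresses.
paper = [
    Address("Univ Sao Paulo, Inst Fis", "Brazil"),
    Address("Univ Sao Paulo, Inst Fis", "Brazil"),   # same address again: counted once
    Address("Univ Sao Paulo, Inst Quim", "Brazil"),  # other department: separate address
    Address("CERN", "Switzerland"),
]

fdc = fractional_domestic_count(paper)  # 2 of 3 distinct addresses are Brazilian
foc = 1 - fdc                           # fractional other (foreign) count
```

The sketch reproduces the proxy's known weaknesses on purpose: two authors sharing one address collapse into a single instance, while one author listing several addresses would be counted several times.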

Page 5

Defining “hot papers” and “hot fields”

“Hot papers” have a quick impact on research, measured by the number of citations

Can be self-citation or “genuine” citation

Citation as a “relationship” between papers

Citations received within a narrow time window

Times cited / Share of citations (to select “hot papers”)

The share of citations received +/- 1 year from the publication date out of the total citations accumulated by the total national output

E.g. for Brazilian papers from 2005, we identify all papers that have received citations from papers (only ISI-indexed) published in 2004, 2005, and 2006.

The 10% most cited papers among all articles with at least one author from Brazil
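As a rough sketch of the selection rule above, the snippet below counts citations received within +/- 1 year of publication and keeps the top 10% of papers. The function names, the tie-breaking by paper id, the rounding-up of the 10% cut, and the sample data are all assumptions made for illustration; the slides do not specify these details.

```python
import math


def window_citations(pub_year, citing_years, half_width=1):
    """Number of citations from papers published within +/- half_width years."""
    return sum(1 for y in citing_years if abs(y - pub_year) <= half_width)


def select_hot_papers(papers, top_share=0.10):
    """papers maps paper id -> (publication year, list of citing-paper years).

    Returns the ids of the top `top_share` papers by in-window citations,
    plus the in-window citation count of every paper.
    """
    counts = {pid: window_citations(year, citing)
              for pid, (year, citing) in papers.items()}
    n_hot = max(1, math.ceil(top_share * len(counts)))  # assumption: round up
    ranked = sorted(counts, key=lambda pid: (-counts[pid], pid))  # ties by id
    return ranked[:n_hot], counts


# Invented data: ten 2005 papers; paper p<i> gets i citations in 2005
# and three late citations in 2008 that fall outside the window.
papers = {f"p{i}": (2005, [2005] * i + [2008] * 3) for i in range(10)}

hot, counts = select_hot_papers(papers)
hot_share = sum(counts[p] for p in hot) / sum(counts.values())
```

Here `hot_share` is the share of all in-window citations that the selected papers account for, which is the quantity the slides use to describe how concentrated early citations are.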

“Hot fields” (to classify “hot papers”)

Because we include journal articles and conference proceedings, we use the ISI Subject Categories, which total over 249 different fields (2012) (the alternative would be journal fields)

These are consolidated into 39 different OECD Minor Fields
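The consolidation step can be pictured as a lookup from subject category to minor field, followed by a count per field. The mapping below is a tiny illustrative excerpt assumed for this sketch, not the concordance actually used in the study, and the helper names are hypothetical.

```python
from collections import Counter

# Assumed, illustrative excerpt of a subject-category -> minor-field mapping.
CATEGORY_TO_MINOR_FIELD = {
    "Astronomy & Astrophysics": "Physical sciences, astronomy",
    "Physics, Nuclear": "Physical sciences, astronomy",
    "Chemistry, Organic": "Chemical sciences",
    "Materials Science, Ceramics": "Materials engineering",
}


def minor_fields(subject_categories):
    """Distinct minor fields for one paper; unmapped categories are skipped."""
    return {CATEGORY_TO_MINOR_FIELD[c]
            for c in subject_categories
            if c in CATEGORY_TO_MINOR_FIELD}


def count_by_field(papers):
    """Count papers per minor field; a paper counts once per field it touches."""
    totals = Counter()
    for categories in papers:
        totals.update(minor_fields(categories))
    return totals


# Invented papers, each given as its list of subject categories.
sample = [
    ["Astronomy & Astrophysics", "Physics, Nuclear"],     # one field, counted once
    ["Chemistry, Organic"],
    ["Materials Science, Ceramics", "Chemistry, Organic"],  # two fields
]
field_totals = count_by_field(sample)
```

Using a set per paper avoids double counting when two of a paper's subject categories fall into the same minor field; whether the original analysis handled such cases this way is not stated in the slides.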

Page 6

Defining Brazilian ”Hot papers” 2005, 2007, 2009

The 10% of all annual articles and conference proceedings receiving the most citations +/- 1 year from the publication date

Account for about 40% of all citations received in this period

Less than 5% of all papers

Page 7

Top-20 Brazilian OECD Minor fields 05-07-09 (total records)

Rank by 2005 totals

Page 8

Top-20 Brazilian “Hot Paper” fields 05-09 – Total Records

Ranked by 2005 “hot paper” total records; this ranking is used from here onwards (number of total records in 2009)

”Hot fields” are more concentrated than overall research

Page 9

Average share of ”hot papers” 05-07-09. Total records and fractional count shares.

Total record count overestimates “hotness”

”Hotness” revealed by fractional domestic count

Page 10

Average share of ”total record” and ”BR FDC” ”hot papers” 05-07-09 from all citations received by ”hot papers”

Total record count overestimates “hotness”

”Hotness” revealed by fractional domestic count

Page 11

Results and conclusions

Fractional domestic count reveals a lot of “noise” in the total record count of national “hotness”

With FDC, the volume and rank of “hotness” of papers and fields change substantially

Comparison of total records vs. fractional domestic count “hotness”:

Physical sciences, astronomy declines from 1st to 5th (papers)

Chemical sciences advances from 5th to 1st (papers) and from 5th to 2nd (citations)

Emerging Brazilian “hot fields” become visible (papers and citations): Materials engineering, Environmental engineering, Other agricultural science, Mathematics, Other engineering and technologies

Total record count is fine when “hotness” is measured in global science

Regional strategic (systems of innovation) “hotness” requires assessment of “localness” in total volume and citations
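To make the comparison between the two counting schemes concrete, the sketch below ranks fields by their share of “hot papers” under total record count and under fractional domestic count. The per-paper FDC values are invented purely for illustration and are not results from the study; the function names are hypothetical.

```python
from collections import defaultdict


def field_shares(hot_papers, fractional):
    """Share of hot-paper volume per field.

    hot_papers is a list of (field, fdc) pairs. With fractional=False each
    paper counts as 1 (total records); with fractional=True it counts as
    its fractional domestic count.
    """
    totals = defaultdict(float)
    for field, fdc in hot_papers:
        totals[field] += fdc if fractional else 1.0
    grand_total = sum(totals.values())
    return {field: value / grand_total for field, value in totals.items()}


def rank(shares):
    """Rank fields from largest share (rank 1) downwards; ties by name."""
    ordered = sorted(shares, key=lambda f: (-shares[f], f))
    return {field: position + 1 for position, field in enumerate(ordered)}


# Invented hot papers: many physics papers with a small Brazilian share of
# addresses, fewer chemistry papers written mostly at Brazilian addresses.
hot_papers = [
    ("Physical sciences, astronomy", 0.1),
    ("Physical sciences, astronomy", 0.2),
    ("Physical sciences, astronomy", 0.1),
    ("Chemical sciences", 1.0),
    ("Chemical sciences", 0.8),
]

total_rank = rank(field_shares(hot_papers, fractional=False))
fdc_rank = rank(field_shares(hot_papers, fractional=True))
```

In this toy data the physics field leads on total records but drops behind chemistry once each paper is weighted by its domestic share, which mirrors the direction of the rank changes reported above without reproducing their magnitude.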