
Mapping science using Bibexcel and Pajek

By Olle Persson

Relations

• Units of analysis
  - document level
  - aggregated level: authors, universities, countries, journals …

• Citation-based relations
  - direct citations
  - shared references
  - co-citations

• Co-occurrences
  - co-authorships
  - co-words

Citation-based relations between documents

[Figure: citation links among documents A, B, C and D]

A cites C = direct citation

A and C both cite B = bibliographic coupling

A and C are both cited by D = co-citation
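To make the three relations concrete, here is a minimal Python sketch on a toy reading of the diagram (A cites B and C, C cites B, D cites A and C). The `cites` dictionary and the function names are illustrative assumptions, not part of Bibexcel.

```python
from itertools import combinations

# Toy data: each citing document maps to the set of documents it cites.
cites = {
    "A": {"B", "C"},
    "C": {"B"},
    "D": {"A", "C"},
}


def direct_citations(cites):
    """All (citing, cited) pairs."""
    return {(src, dst) for src, refs in cites.items() for dst in refs}


def bibliographic_coupling(cites):
    """Number of shared references for each pair of citing documents."""
    out = {}
    for x, y in combinations(sorted(cites), 2):
        shared = len(cites[x] & cites[y])
        if shared:
            out[(x, y)] = shared
    return out


def co_citation(cites):
    """Number of documents that cite both members of each pair."""
    out = {}
    for refs in cites.values():
        for x, y in combinations(sorted(refs), 2):
            out[(x, y)] = out.get((x, y), 0) + 1
    return out


print(bibliographic_coupling(cites))  # A and C share reference B
print(co_citation(cites))             # A and C are co-cited by D
```

Note the direction of counting: coupling compares two reference lists, while co-citation counts how often two items appear together in the same reference list.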

Similarity measures

• Frequencies (raw counts)
  - n of direct citations
  - n of co-occurrences
  - n of shared references

• Normalized measures
  - Salton’s index
  - Jaccard’s index
  - Pearson’s correlation
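The normalized measures can be computed from raw counts. A minimal sketch, assuming `c_ij` is the number of documents in which items i and j co-occur and `c_i`, `c_j` are the items' own occurrence counts. Pearson's r is shown as a plain correlation between two numeric vectors; author co-citation studies often apply it to rows of the co-citation matrix. The function names are hypothetical.

```python
from math import sqrt


def salton(c_ij, c_i, c_j):
    """Salton's cosine: co-occurrences over the geometric mean of the two counts."""
    return c_ij / sqrt(c_i * c_j)


def jaccard(c_ij, c_i, c_j):
    """Jaccard: co-occurrences over the size of the union."""
    return c_ij / (c_i + c_j - c_ij)


def pearson(x, y):
    """Pearson's r between two equal-length numeric vectors (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Two references cited 40 and 90 times, co-cited 30 times.
print(round(salton(30, 40, 90), 3))   # 30 / 60 = 0.5
print(round(jaccard(30, 40, 90), 3))  # 30 / 100 = 0.3
```

Both normalizations dampen the advantage that highly cited items have in raw co-citation counts.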

Mapping science

1. Preparing data
2. Calculating measures
3. Making maps

It is good if you have some experience with Pajek. You will learn the basics of Bibexcel in this tutorial!

You will need this material

1. A set of data: http://www8.umu.se/inforsk/esss/cocit569.tx2

2. Bibexcel software: http://www8.umu.se/inforsk/Bibexcel/bibexcel.exe

3. Pajek: http://vlado.fmf.uni-lj.si/pub/networks/pajek/

4. Reading material, 1st chapter in: http://www8.umu.se/inforsk/Bibexcel/ollepersson60.pdf

Preparing data

Topic=(co-citation* OR cocitation*) Databases=SCI-EXPANDED, SSCI, A&HCI Timespan=All Years. Update 2011-03-04

1. Convert to Dialog format

1. We have already searched Web of Science and downloaded 569 records on co-citation analysis.

2. We have already replaced line feeds with carriage returns in the downloaded file using Bibexcel: Edit doc-file/Replace line feed with carriage return

3. The file to be used is cocit569.tx2.

4. Put Bibexcel.exe in c:\Bibexcel and cocit569.tx2 in c:\Bibexcel\Data

5. Start bibexcel.exe. Next we have to convert the file to the Dialog format that Bibexcel is designed for.

You can open Bibexcel and follow all the steps in this presentation!

Select the cocit569.tx2 file and run Misc/Convert to Dialog format/Convert from Web of Science

Select cocit569.doc and press View file

In the file: | = end of field, ; = separates units, || = end of record, and each field starts with a two-letter field tag.

2. Extracting data from the CD field (cited documents)

Put the tag (CD) here. Units are separated by semicolons. Let’s start!
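The extraction step can be pictured with a small Python sketch. It assumes a hypothetical, simplified record layout modeled on the annotations above (tag + "- " + value, "|" ends a field, ";" separates units, "||" ends a record); Bibexcel's real .doc layout may differ, and the sample records are made up.

```python
# Hypothetical, simplified record layout (illustration only):
#   field = two-letter tag + "- " + value, terminated by "|"
#   units inside a field are separated by ";"
#   a record is terminated by "||"
SAMPLE = (
    "TI- Example paper one|"
    "CD- SMALL H, 1973, J AM SOC INFORM SCI, V24, P265; "
    "GARFIELD E, 1955, SCIENCE, V122, P108||"
    "TI- Example paper two|"
    "CD- WHITE HD, 1981, J AM SOC INFORM SCI, V32, P163||"
)


def parse_records(text):
    """Split the text into records, and each record into a {tag: value} dict."""
    records = []
    for raw in text.split("||"):
        if not raw.strip():
            continue  # skip the empty tail after the last "||"
        fields = {}
        for field in raw.split("|"):
            field = field.strip()
            if field:
                tag, _, value = field.partition("- ")
                fields[tag] = value
        records.append(fields)
    return records


def extract_units(records, tag):
    """Yield (document number, unit) pairs for one field; documents numbered from 1."""
    for docnr, record in enumerate(records, start=1):
        for unit in record.get(tag, "").split(";"):
            unit = unit.strip()
            if unit:
                yield docnr, unit


for docnr, ref in extract_units(parse_records(SAMPLE), "CD"):
    print(docnr, ref)
```

Keeping the document number next to each unit is what later makes it possible to count co-occurrences within the same reference list.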

cocit569.out has the cited documents

This is the reference list of doc nr 1.

3. Refining the out-file

To improve data quality, the Edit out-files menu has several options. For example, you may wish to reduce variation by keeping only the first initial in author names. Select cocit569.out and run Edit out-files/Keep only author’s first initial

Look at cocit569.1st and you can see that EOM SB is changed to EOM S
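The idea behind this option fits in a few lines of Python. A minimal sketch, assuming the cited author is everything before the first comma and the initials are the last word of that part; `keep_first_initial` is a hypothetical name, not a Bibexcel function.

```python
def keep_first_initial(ref):
    """Keep only the first initial of the cited author, e.g. 'EOM SB' -> 'EOM S'.

    Assumes the author is everything before the first comma and that the
    initials are the last whitespace-separated token of that part.
    """
    author, sep, rest = ref.partition(",")
    parts = author.split()
    if len(parts) >= 2:
        parts[-1] = parts[-1][0]
    return " ".join(parts) + sep + rest


print(keep_first_initial("WHITE HD, 1981, J AM SOC INFORM SCI, V32, P163"))
# WHITE H, 1981, J AM SOC INFORM SCI, V32, P163
```

Multi-word surnames such as "VAN RAAN AFJ" keep all surname words and lose only the extra initials.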


Let’s improve a little bit more: Select cocit569.1st and run Edit out-files/Convert Upper lower Case/Good for Cited reference strings

Look at cocit569.low. I think this looks much nicer compared to the out-file!
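Bibexcel's exact case rules are not described in this tutorial, so the Python sketch below uses a hypothetical rule only to show the kind of conversion involved: surname words and multi-letter words are capitalized, while initials, years, and volume/page tokens are left as they are.

```python
def mixed_case(ref):
    """Hypothetical upper/lower case conversion for a cited reference string."""
    fields = [f.strip() for f in ref.split(",")]
    # Author field: capitalize the surname words, keep the initials (last token).
    author = fields[0].split()
    if len(author) > 1:
        author = [w.capitalize() for w in author[:-1]] + [author[-1]]
    else:
        author = [w.capitalize() for w in author]
    out = [" ".join(author)]
    # Other fields: capitalize purely alphabetic words longer than one letter.
    for f in fields[1:]:
        words = [w.capitalize() if w.isalpha() and len(w) > 1 else w for w in f.split()]
        out.append(" ".join(words))
    return ", ".join(out)


print(mixed_case("SMALL H, 1973, J AM SOC INFORM SCI, V24, P265"))
# Small H, 1973, J Am Soc Inform Sci, V24, P265
```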

Calculating data

1. Looking at frequencies

Select cocit569.low.

Tick here, choose Whole string, and press Start!

Look at cocit569.cit, which has the cited references in decreasing frequency! For anyone familiar with co-citation research, the top 3 papers shouldn’t come as a surprise.
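Counting with Whole string amounts to tallying identical strings and sorting by decreasing frequency. A minimal Python sketch over (document number, reference) pairs; counting each reference at most once per citing document is an assumption made here, and Bibexcel's own options may behave differently.

```python
from collections import Counter


def whole_string_frequencies(pairs, once_per_doc=True):
    """Count identical unit strings, most frequent first (ties alphabetical)."""
    if once_per_doc:
        pairs = set(pairs)  # a document contributes at most 1 per unit
    counts = Counter(unit for _, unit in pairs)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


pairs = [(1, "Small H, 1973"), (1, "Garfield E, 1955"),
         (2, "Small H, 1973"), (3, "Small H, 1973"), (3, "White H, 1981")]
for unit, n in whole_string_frequencies(pairs):
    print(n, unit)
```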

2. Making co-citations

Select the cocit569.cit file and press View file. In the list, mark the cited references down to frequency = 30, then press Copy, then Clear, and then Paste. These are the references for which you want co-citations.

Select the cocit569.low file and run Analyze/Co-occurrence/Make pairs via listbox. Answer No to the next question, and OK to the question after that!

The cocit569.coc file has the co-citation frequencies. We will use that file for mapping!
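The selection-plus-pairing step can be sketched in Python: keep references at or above a frequency threshold, then count, for each pair of kept references, how many citing documents list both. The function names and the tiny sample are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def select_by_frequency(freqs, min_freq):
    """References whose citation count is at least min_freq (e.g. 30)."""
    return {unit for unit, n in freqs if n >= min_freq}


def cocitation_pairs(pairs, selected):
    """Count documents citing both members of each pair of selected references."""
    refs_by_doc = defaultdict(set)
    for docnr, ref in pairs:
        if ref in selected:
            refs_by_doc[docnr].add(ref)
    counts = defaultdict(int)
    for refs in refs_by_doc.values():
        for a, b in combinations(sorted(refs), 2):
            counts[(a, b)] += 1
    return dict(counts)


selected = select_by_frequency([("A", 30), ("B", 31), ("C", 12)], 30)
pairs = [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (2, "B"), (3, "B"), (3, "C")]
print(cocitation_pairs(pairs, selected))
```

Restricting to frequently cited references first keeps the number of pairs manageable, since pairs grow roughly with the square of the list length.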

Select cocit569.coc and run Mapping/Create net-file for Pajek… Be sure to answer No to the question about directed arcs, since co-citation links have no direction.

The cocit569.net file can be opened in Pajek, Netdraw, MapEquation, etc. for drawing maps.
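For readers curious about what the net-file contains, here is a minimal Python sketch that writes a Pajek network from weighted pairs: a `*Vertices` list of numbered, quoted labels, then `*Edges` (undirected) or `*Arcs` (directed) lines of the form `from to value`. The function name is hypothetical, and labels containing double quotes are not handled.

```python
def to_pajek_net(weighted_pairs, directed=False):
    """Return Pajek .net text for {(label_a, label_b): value} pairs."""
    labels = sorted({x for pair in weighted_pairs for x in pair})
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{i} "{label}"' for label, i in index.items()]
    lines.append("*Arcs" if directed else "*Edges")
    for (a, b), w in sorted(weighted_pairs.items()):
        lines.append(f"{index[a]} {index[b]} {w}")
    return "\n".join(lines) + "\n"


print(to_pajek_net({("A", "B"): 2, ("A", "C"): 1}))
```

Answering No to directed arcs in Bibexcel corresponds to the `*Edges` section here.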

Mapping with Pajek

Open the cocit569.net file in Pajek, and then run Draw/Draw.

This is the first layout, with randomly placed nodes. In the upper left, choose Layout/Energy/Kamada-Kawai/Separate components or just press Ctrl-K.

The Kamada-Kawai layout is better, but there are perhaps still too many lines in the graph, since almost every node is connected to all the others.

To reduce complexity, minimize the Draw window and then run Net/Transform/Remove/Lines with Value/lower than/, put 10 in the box, and answer Yes to Make new network.

After that, run Draw/Draw again!
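Removing lines with a value lower than 10 simply drops weak links. A minimal Python sketch of the same filter on a {pair: value} dictionary (hypothetical name); like the Pajek step, it only removes lines, so any vertex left without lines would still be part of the network.

```python
def remove_lines_below(weighted_pairs, threshold):
    """Drop lines whose value is lower than threshold; keep the rest unchanged."""
    return {pair: w for pair, w in weighted_pairs.items() if w >= threshold}


links = {("A", "B"): 25, ("A", "C"): 9, ("B", "C"): 10}
print(remove_lines_below(links, 10))
```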

This map has more structure. We find older papers to the left and newer ones to the right. You can press Ctrl-K several times to see what happens.

Making vectors

Making circles on nodes based on citation frequencies: go to Bibexcel, select cocit569.cit, and then run Mapping/Create vec-file. Below you can see that cocit569.vec has been created.

Go back to Pajek. Open the Vector file cocit569.vec and then run Draw/Draw-Vector.
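A Pajek vector file holds one value per vertex, listed in the same order as the vertices in the .net file, which is why the two files must match. A minimal Python sketch (hypothetical function name; vertices without a known frequency get 0 here, an arbitrary choice):

```python
def to_pajek_vec(labels_in_net_order, frequencies):
    """One value per vertex, in the same order as the *Vertices list of the .net file."""
    lines = [f"*Vertices {len(labels_in_net_order)}"]
    lines += [str(frequencies.get(label, 0)) for label in labels_in_net_order]
    return "\n".join(lines) + "\n"


print(to_pajek_vec(["A", "B", "C"], {"A": 40, "C": 90}))
```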

Now you can see that the circle sizes correspond to the number of citations.

Making partitions

If you wish, you can use Bibexcel to create a clu-file that indicates the publication year, or decade, of the cited documents.

1. Select cocit569.cit and run Edit out-file/Extract publication year from references.

2. You will get a file named cocit569.dpy.

3. Select cocit569.dpy and run Mapping/Create clu-file.

4. You will get a file named cocit569.clu.

5. Go to Pajek and open cocit569.clu as Partition.

6. Run Draw/Draw-Partition-Vector and then, in the Draw window, Layers/In y direction.
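The two Bibexcel steps can be imitated in Python: read the year from a cited reference string, then write one integer class per vertex in net-file order, which is the layout of a Pajek partition file. A minimal sketch assuming the year is the second comma-separated field, as in Web of Science cited references; the function names are hypothetical and 0 marks an unknown year.

```python
import re


def publication_year(ref):
    """Return the year from the second comma-separated field, or None."""
    fields = [f.strip() for f in ref.split(",")]
    if len(fields) > 1 and re.fullmatch(r"\d{4}", fields[1]):
        return int(fields[1])
    return None


def to_pajek_clu(labels_in_net_order, by_decade=False):
    """One partition class per vertex: the year (or decade); 0 = unknown."""
    lines = [f"*Vertices {len(labels_in_net_order)}"]
    for label in labels_in_net_order:
        year = publication_year(label) or 0
        lines.append(str(year // 10 * 10 if by_decade else year))
    return "\n".join(lines) + "\n"


print(to_pajek_clu(["Small H, 1973, J Am Soc Inform Sci, V24, P265",
                    "Garfield E, 1955, Science, V122, P108"], by_decade=True))
```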

Makes sense?

Using Options/Lines/Different Widths and GreyScale and Options/Size/Of lines = 0.25

This could be a chronological reading list for reviewers and students.

Bibexcel makes so many files…

1. cocit569.tx2: text file where LF was replaced by CR
2. cocit569.doc: converted to Dialog format
3. cocit569.out: out-file based on the CD field
4. cocit569.1st: keep only author’s first initial
5. cocit569.low: converted to upper and lower case
6. cocit569.cit: frequencies
7. cocit569.coc: co-occurrences
8. cocit569.net: net-file to be opened in Pajek
9. cocit569.vec: vec-file to be opened as Vectors in Pajek
10. cocit569.clu: clu-file to be opened as Partitions in Pajek
11. cocit569.vel: vertices for the net-file, for use by Bibexcel

… but better to have them than not!

All-author co-citation analysis using Scopus records

“It’s always better not to limit the analysis to the first cited author, as in WoS”

1. Get scopuscocit.ris from http://www8.umu.se/inforsk/esss/scopuscocit.ris
2. Select scopuscocit.ris and run Edit doc-file/Replace line feed with carriage return
3. Select scopuscocit.tx2 and run Misc/Convert to Dialog format/Convert from Scopus RIS format
4. Select scopuscocit.doc, put CD in Old tag, choose “Any ; separated field” and press Prep
5. Select scopuscocit.out and run Edit out-file/Scopus tools/Extract all authors from Scopus references
6. Select scopuscocit.sco and run Edit out-file/Decompress outfile
7. Select scopuscocit.nnu, choose Whole string, mark Remove duplicates and Make new out-file, and then press Start
8. Select scopuscocit.oux, mark Sort descending and press Start
9. Select scopuscocit.cit and press View file, select units down to frequency = 30, and be sure only these are in The List
10. Select scopuscocit.oux and run Analyze/Co-occurrences/Make pairs via list box
11. Select the scopuscocit.coc file and then run Mapping/Create net-file for Pajek…
12. Select scopuscocit.cit and run Mapping/Create vec-file
13. Go to Pajek and open scopuscocit.net as Network and scopuscocit.vec as Vectors
14. Run Draw/Draw-Vector…
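Why all-author counting changes the map can be shown with a small Python sketch. It assumes the author names of each cited reference have already been extracted into lists (which step 5 does inside Bibexcel); the author names below are made up. Each citing document contributes a set of cited authors, mirroring the Remove duplicates option, and author co-citations are counted over those sets.

```python
from collections import Counter
from itertools import combinations


def author_units(refs_by_doc, all_authors=True):
    """Map each citing doc to its set of cited authors (duplicates removed)."""
    out = {}
    for docnr, refs in refs_by_doc.items():
        authors = set()
        for author_list in refs:
            authors.update(author_list if all_authors else author_list[:1])
        out[docnr] = authors
    return out


def author_cocitations(units_by_doc):
    """Number of citing documents in which each pair of authors is cited together."""
    counts = Counter()
    for authors in units_by_doc.values():
        counts.update(combinations(sorted(authors), 2))
    return counts


# Made-up author lists of cited references, per citing document.
refs_by_doc = {
    1: [["Alpha A", "Beta B"], ["Gamma G"]],
    2: [["Alpha A", "Beta B"], ["Gamma G", "Beta B"]],
}
print(author_units(refs_by_doc, all_authors=False))  # Beta B never appears
print(author_cocitations(author_units(refs_by_doc)))
```

A second author who never appears first, like Beta B here, is invisible to a first-author analysis.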

Draw-Vector

To reduce complexity, minimize the Draw window and then run Net/Transform/Remove/Lines with Value/lower than/, put 10 in the box, and answer Yes to Make new network.

After that, run Draw/Draw-Vector and then press Ctrl-K.

Griffith BC would probably not show up in a first-author analysis.

Webometrics

Go back and fix this variant!

For vector graphics quality: in the Draw window, run Export/2D/SVG/General and save as allauthormap.htm.

Get Inkscape free from http://inkscape.org/download/ and open allauthormap.htm, edit it, and export it to PNG format.

Analyzing direct citations on Web of Science records

1. Select cocit569.low and run Analyze/Citations among docs/Make citation links. This will make cocit569.lin, which has the citing docnr in the first column and the cited docnr in the second column.

2. Of course, you need to label the doc numbers. Select cocit569.ddc and double-click in the box at “Type new file name here”; the path to cocit569.ddc should appear.

3. Select cocit569.lin and run Add data classify/Add labels to docnr-docnr pairs. Answer No to the questions about swapping, self-related pairs, overlapping sets, and about writing doc numbers in addition to labels.

4. Select cocit569.add and then run Mapping/Create net-file for Pajek, and answer Yes for directed graphs!

5. Open cocit569.net in Pajek and run Draw/Draw.

6. You will need to reduce complexity: run Net/Transform/Reduction/Degree/Input and set value=15. Then draw!

7. If you would like to have different circle sizes: minimize the Draw window and then run Net/Vector/Summing up values of lines/Input. A Vector is created that has the number of inlinks to each node. Then run Draw/Draw-Vector…
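Making citation links means recognizing, among each record's cited references, those that point to other records in the same data set. A minimal Python sketch, assuming every record has a label in the same normalized form as the cited reference strings, and that an exact string match counts as a hit (real matching is usually fuzzier). The function names and sample labels are hypothetical.

```python
from collections import Counter


def citation_links(doc_labels, refs_by_doc):
    """(citing docnr, cited docnr) pairs where a cited reference matches a record label."""
    docnr_by_label = {label: nr for nr, label in doc_labels.items()}
    links = set()
    for citing, refs in refs_by_doc.items():
        for ref in refs:
            cited = docnr_by_label.get(ref)
            if cited is not None and cited != citing:
                links.add((citing, cited))
    return links


def in_links(links):
    """Number of inlinks per document (each link counted with value 1)."""
    return Counter(cited for _, cited in links)


doc_labels = {1: "Alpha A, 1990, J X, V1, P1",
              2: "Beta B, 1995, J Y, V2, P2",
              3: "Gamma G, 2000, J Z, V3, P3"}
refs_by_doc = {2: ["Alpha A, 1990, J X, V1, P1"],
               3: ["Alpha A, 1990, J X, V1, P1", "Beta B, 1995, J Y, V2, P2",
                   "Other O, 1980, J Q, V9, P9"]}
links = citation_links(doc_labels, refs_by_doc)
print(sorted(links), in_links(links))
```

References to documents outside the data set (like "Other O" here) simply produce no link.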

Analyzing using Weighted Direct Citations (WDC)

We can add the number of shared outlinks and inlinks to each direct citation, to give each direct citation a different strength.

1. Select cocit569.lin and run Analyze/Citations among docs/Weighted Direct Citations (WDC). The cocit569.wdc file has the WDC values for each docnr-docnr pair.

2. Again, you need to label the doc numbers. Select cocit569.ddc and double-click in the box at “Type new file name here”; the path to cocit569.ddc should appear.

3. Select cocit569.wdc and run Add data classify/Add labels to freq-docnr-docnr/making freq-label-label. Answer No to the questions about swapping, self-related pairs, and overlapping sets.

4. Select the cocit569.cdd file and run Edit out-file/Sort numeric/Descending by first column, and you will see which are the strongest links by the WDC measure.

5. Select cocit569.cdd and run Mapping/Create net-file for Pajek, and answer Yes for directed arcs!

6. In Pajek, use Net/Transform/Remove/Lines with Values/Lower than=10!

7. Then run Draw/Draw and you will see one big network component, several smaller ones, and quite many isolates. You can zoom in to the bigger one by pressing the right mouse button and drawing.

8. If you go back to the Pajek main window and run Net/Components/Weak and type size=20, you will get 1 component, and then with Operations/Extract from network/Partition=1 you will get a new network with the big component. Then draw that network!
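The slide describes WDC as adding shared outlinks and inlinks to each direct citation. The Python sketch below reads that as 1 (the citation itself) plus the number of shared references plus the number of shared citers, counted within the link set only. This is an interpretation for illustration; Bibexcel's exact formula may differ.

```python
from collections import defaultdict


def weighted_direct_citations(links):
    """WDC per (citing, cited) link, read here as 1 + shared outlinks + shared inlinks."""
    out_links, in_links = defaultdict(set), defaultdict(set)
    for citing, cited in links:
        out_links[citing].add(cited)
        in_links[cited].add(citing)
    wdc = {}
    for a, b in links:
        shared_out = len(out_links[a] & out_links[b])  # documents both a and b cite
        shared_in = len(in_links[a] & in_links[b])     # documents citing both a and b
        wdc[(a, b)] = 1 + shared_out + shared_in
    return wdc


links = {(2, 1), (3, 1), (3, 2)}
wdc = weighted_direct_citations(links)
# Strongest links first, like sorting cocit569.cdd descending by the first column.
print(sorted(wdc.items(), key=lambda kv: -kv[1]))
```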

…further improvement by saving the major component and adding new partitions and vectors

1. Be sure to mark the main component (with 63 nodes).
2. Then run File/Network/Save and overwrite cocit569.net.
3. In Bibexcel, select cocit569.net and run Mapping/Create vel-file from net-file.
4. Select the cocit569.ddc file and run Edit out-file/Extract publication year from references.
5. Select cocit569.dpy and run Mapping/Create clu-file.
6. Open cocit569.clu as Partition in Pajek and then run Draw/Draw-Partition and then Layers/In y direction.
7. If you would like to have different circle sizes: minimize the Draw window and then run Net/Vector/Summing up values of lines/Input. A Vector is created that has the sum of WDC values of inlinks to each node. Then run Draw/Draw-Partition-Vector…
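Weak components are the groups of nodes that stay connected when arc directions are ignored. A minimal Python sketch using union-find, plus a helper that keeps only the largest component if it reaches a minimum size (like the size=20 setting above). The function names are hypothetical, and every link endpoint must appear in `nodes`.

```python
def weak_components(nodes, links):
    """Connected components ignoring arc direction, largest first (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=len, reverse=True)


def largest_component(nodes, links, min_size=1):
    """Nodes and links of the largest weak component, or empty sets if it is too small."""
    comps = [c for c in weak_components(nodes, links) if len(c) >= min_size]
    if not comps:
        return set(), set()
    keep = comps[0]
    return keep, {(a, b) for a, b in links if a in keep and b in keep}


nodes = {1, 2, 3, 4, 5, 6}
links = {(1, 2), (2, 3), (4, 5)}
print(largest_component(nodes, links, min_size=3))
```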

…reduce direct citations by citation year lag

1. Select cocit569.cdd and run Analyze/Calculate year lags in pairs, and answer Yes to add year lag values, which will come in column 1. Column 2 has a normalization (col. 3 divided by col. 3), col. 3 has the WDC value, col. 4 the citing doc and col. 5 the cited doc.

2. Select cocit569.lag and, to get year lags of 0-2 years, put 2 in the Max number box and then run Edit out-files/Delete values high frequencies.

3. Select cocit569.max, put 3/4/5 in The Box and run Edit out-file/Select columns.

4. Now cocit569.col has WDC values only for links with a year lag of at most 2 years!

5. Select cocit569.col and run Mapping/Create net-file for Pajek.

6. Go to Pajek and open the net-file and the vec-file! Remove lines with values less than 5, then run Net/Components/Weak (min 20), then extract and save the major component to the file cocit569.net.

7. In Bibexcel, select cocit569.cdd, put 1/3 in The Box and run Edit out-files/Select columns. Then select cocit569.col and make frequencies with Whole string; cocit569.cit will then have the number of times each paper is cited.

8. In Bibexcel, select cocit569.net and run Mapping/Create vel-file from net-file, and then select cocit569.cit and run Mapping/Create vec-file.

9. Back in Pajek, open the vec-file, and then run Draw/Draw-Vector.
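The year-lag filter keeps only links where the citing paper appeared at most a few years after the cited one. A minimal Python sketch, assuming the lag is citing year minus cited year and that negative lags are dropped (a choice made here, not something the tutorial specifies); the names and sample years are hypothetical.

```python
def filter_by_lag(wdc, year_of, max_lag):
    """Keep (citing, cited) links whose year lag lies between 0 and max_lag."""
    return {
        (a, b): w for (a, b), w in wdc.items()
        if 0 <= year_of[a] - year_of[b] <= max_lag
    }


wdc = {(2, 1): 2, (3, 1): 1, (3, 2): 2}
year_of = {1: 1990, 2: 1991, 3: 1995}
print(filter_by_lag(wdc, year_of, max_lag=2))
```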

…also, you can reduce co-citations by citation year lag

1. Select cocit569.coc and run Analyze/Calculate year lags in pairs and answer Yes to add year lag values.

2. Select cocit569.lag and, to get year lags of 0-5 years, put 5 in the Max number box and then run Edit out-files/Delete values high frequencies.

3. Select cocit569.max, put 1/4/5 in The Box and run Edit out-file/Select columns.

4. Now cocit569.col has co-citation values only for pairs with a year lag of at most 5 years!

5. Select cocit569.col and run Mapping/Create net-file for Pajek.

6. Also select cocit569.cit and run Mapping/Create vec-file.

7. Go to Pajek and open the net-file and the vec-file!

The same graph as the previous one, but now ordered in year layers and edited using Inkscape

The End
