Metadata Usage Tendencies in Latin American Electronic Journals

Metadata Usage Tendencies in Latin American Electronic Journals. Rolando Coto-Solano, Helena Francke, Saray Córdoba-González. ELPUB 2009, Milan, Italy.

Uploaded by rolando-coto on 13-Jun-2015


DESCRIPTION

Presented at ELPUB 2009 and written with Helena Francke (University of Borås) and Saray Córdoba (University of Costa Rica).

TRANSCRIPT

Page 1: Metadata Usage Tendencies in Latin American Electronic Journals

Metadata Usage Tendencies in Latin American Electronic Journals

Rolando Coto-Solano, Helena Francke, Saray Córdoba-González

ELPUB 2009, Milan, Italy

Page 2

Justification (1/2)

- To fully exploit the advantages that open access (OA) brings, journals need to be both visible and accessible.

- In practice, this means:
  - being included in journal indexes;
  - being retrievable by data harvesters (such as OAI harvesters);
  - being retrievable by search engines (such as current vector-based search engines and future semantic web answer engines).

- To help with this process, the main information about an article (its metadata) can be encoded according to certain standards, ranging from full DTD-based markup and Dublin Core down to, at a minimum, marking the <description> of a web page.
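At the lighter end of that range, the sketch below builds the <head> metadata for a single article page: a descriptive <title>, the plain HTML description and keywords meta tags, and Dublin Core meta tags announced with a schema.DC link, following the DCMI convention for embedding DC in HTML. The function name and the choice of fields are hypothetical; the slides do not prescribe any particular layout.

```python
from html import escape

DC_SCHEMA = "http://purl.org/dc/elements/1.1/"

def article_head(title: str, authors: list[str], abstract: str,
                 keywords: list[str], language: str) -> str:
    """Return <head> content with basic HTML and Dublin Core metadata.

    Hypothetical helper: the fields mirror the descriptors discussed in the
    slides (title, author, abstract, keywords, language).
    """
    lines = [
        f"<title>{escape(title)}</title>",
        # Plain HTML metadata, read by general-purpose search engines.
        f'<meta name="description" content="{escape(abstract)}">',
        f'<meta name="keywords" content="{escape(", ".join(keywords))}">',
        # Declare the DC prefix once, then emit one meta tag per DC element value.
        f'<link rel="schema.DC" href="{DC_SCHEMA}">',
        f'<meta name="DC.title" content="{escape(title)}">',
    ]
    lines += [f'<meta name="DC.creator" content="{escape(a)}">' for a in authors]
    lines += [f'<meta name="DC.subject" content="{escape(k)}">' for k in keywords]
    lines.append(f'<meta name="DC.description" content="{escape(abstract)}">')
    lines.append(f'<meta name="DC.language" content="{escape(language)}">')
    return "\n".join(lines)
```

html.escape also escapes double quotes by default, so titles or abstracts containing quotation marks cannot break out of the content attribute.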

Page 3

Justification (2/2)

- The metadata (such as article title, keywords, abstract) represent the minimal desirable information that we would like the editors to encode so that their articles are easy to describe and more likely to be retrieved.

We focused on three questions:

- How often do these data appear in Latin American journals? How often do these journals really include information such as abstracts (or abstracts in more than one language, for example)?

- How often are the metadata encoded in a way that would help retrievability? (DTD, any form of XML, Dublin Core, HTML metadata)

- How often do the titles of the web pages include sufficient information about the content of the page?
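For the second question, one concrete form of encoding that helps retrievability is exposing records over OAI-PMH, where a harvester sends `verb=ListRecords&metadataPrefix=oai_dc` and receives unqualified Dublin Core. The sketch below builds such a request URL for a hypothetical base URL and extracts the DC fields from a response; the sample record is invented and simplified (no <header>), and network access is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url: str) -> str:
    """Build an OAI-PMH ListRecords request asking for Dublin Core records."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})

def dc_records(response_xml: str) -> list[dict[str, list[str]]]:
    """Map each harvested record to {DC element name: [values]}."""
    root = ET.fromstring(response_xml)
    records = []
    for dc in root.iterfind(".//oai:record/oai:metadata/oai_dc:dc", NS):
        fields: dict[str, list[str]] = {}
        for elem in dc:
            # Drop the "{namespace}" prefix, keeping e.g. "title" or "subject".
            name = elem.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((elem.text or "").strip())
        records.append(fields)
    return records

# Invented, simplified response used only for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Un artículo de ejemplo</dc:title>
      <dc:subject>metadatos</dc:subject>
      <dc:subject>revistas electrónicas</dc:subject>
      <dc:language>es</dc:language>
    </oai_dc:dc>
  </metadata></record></ListRecords>
</OAI-PMH>"""
```

A journal whose articles only exist as untagged PDFs has nothing to put in such records, which is why the presence of abstracts and keywords matters even before any markup standard is chosen.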

Page 4

Contents

1. Methodology

2. How many journals have metadata?

3. Preparedness for DTD/XML marking (How much of the data that could be marked actually exists?)

4. What about multilingualism?

5. Most used metatags

6. Actual output formats: (X)HTML vs. PDF

7. The most visible tag: <title>

8. Conclusions

Page 5

Methodology (1/2)

- We randomly chose a sample of 167 journals from the LATINDEX database, belonging to 12 different countries and territories (Argentina, Brazil, Chile, Colombia, Costa Rica, Cuba, Ecuador, Mexico, Peru, Puerto Rico, Uruguay and Venezuela).

- After excluding journals that did not have their own website (i.e., were available only within a journal portal) or were not fully peer-reviewed journals (bulletins, for example), we were left with a sample of 123 journals.

Page 6

Methodology (2/2)

- We examined four “levels”:
  - Cover of the website (the main entry page)
  - Table of contents
  - Article presentation page (title and abstract of the article)
  - Article full-text page

- On each of these levels, we examined the available metatags and the format of the page. We also examined the contents of the <title> tag.

- We examined the first article of the latest issue of each journal and checked whether it included a title, an abstract, keywords, and author information, and whether these were presented in more than one language.
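The slides do not say whether this inspection was done by hand or with tools. As a sketch of how the metatag and <title> check could be automated for one page, the following uses Python's standard html.parser to collect every <meta name=… content=…> pair and the <title> text. Fetching pages and deciding which of the four levels a page belongs to are left out, and lower-casing tag names is our own assumption.

```python
from html.parser import HTMLParser

class MetadataExtractor(HTMLParser):
    """Collect <meta name/content> pairs and the <title> text of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, list[str]] = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name"):
            # Lower-case names so "DC.Title" and "dc.title" are counted together;
            # http-equiv tags (no name attribute) are skipped.
            name = attributes["name"].strip().lower()
            self.meta.setdefault(name, []).append(attributes.get("content") or "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract(html_text: str) -> tuple[str, dict[str, list[str]]]:
    """Return (title text, {meta name: [content values]}) for one HTML page."""
    parser = MetadataExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.meta
```

Self-closing XHTML tags such as <meta ... /> are handled too, because HTMLParser routes them through handle_starttag.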

Page 7

How many journals have metadata (on any level)

Total journals: 123 (100%)

Have any metatags: 107 (87%)

Have any non-automatic metatags: 55 (45%)

Have DC metatags: 16 (13%)
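The percentages in these tables appear to be counts divided by the relevant total and rounded to the nearest whole percent (for example, 107/123 ≈ 86.99%, reported as 87%). A small check, using integer arithmetic for round-half-up because Python's built-in round() rounds exact halves to even:

```python
def share(count: int, total: int) -> int:
    """Percentage of `total` represented by `count`, rounded half up."""
    # floor(100 * count / total + 0.5), computed exactly with integers
    return (200 * count + total) // (2 * total)

TOTAL = 123
counts = {
    "Have any metatags": 107,
    "Have any non-automatic metatags": 55,
    "Have DC metatags": 16,
}
for label, n in counts.items():
    print(f"{label}: {n} ({share(n, TOTAL)}%)")
```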

Page 8

Journals in Costa Rica (66%) and Argentina (35%) use DC tags significantly more often than journals from the other countries in the sample (p < 0.05). Brazilian journals also show relatively high DC usage (17%).

This might be due to training provided to editors by scientific institutions in these countries (more on this in the Conclusions section).

Page 9

Presence of basic descriptors in the articles (title, abstract, keywords, author affiliation)

Total articles: 123 (100%)

Has a title: 123 (100%)

Has author affiliation: 105 (85%)

Has an abstract: 103 (84%)

Has keywords: 95 (77%)

Has title marked as metadata: 17 (14%)

Has author affiliation marked as metadata: 5 (4%)

Has abstract marked as metadata: 9 (7%)

Has keywords marked as metadata: 8 (7%)

Page 10

A higher proportion of engineering and medical sciences journals use keywords than journals in other areas.

Medical sciences journals include abstracts significantly more often than journals in other areas (96% of the medical journals use abstracts).

Journals in Arts and Humanities use significantly fewer abstracts and keywords than journals in other areas (73% use abstracts and 64% use keywords).

An interesting finding is that, contrary to what might be expected, journals in the Exact and Natural Sciences and in the Social Sciences do not differ significantly in their use of abstracts and keywords (for example, 70% of Exact and Natural Sciences journals and 71% of Social Sciences journals use keywords). This will have to be verified in further studies.

Page 11

Presence of basic descriptors (title, abstract, keywords, author affiliation)

Total articles: 123 (100%)

Has title in English: 54 (44%)

Has abstract in English: 88 (72%)

Has keywords in English: 87 (71%)

13% of the articles were written in English.

Page 12

Most used metatags

Cover (n = 55): keywords (58%), description (58%), author (27%), robots (26%)

Table of contents (n = 42): keywords (60%), description (57%), robots (31%), author (26%)

Article presentation page (n = 20): DC.Language (50%), DC.Title (50%), DC.Description (45%), DC.Type (45%)

Full text page in HTML (n = 16): keywords (50%), description (50%), author (25%), robots (18%)

Article presentation pages stand out in their use of DC tags. A possible explanation is that editors who use article presentation pages at all (a practice that is not very common in Latin America) may also have become aware of other “good practices” from larger publishing cultures, such as the use of DC.
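Each percentage on this slide appears to be the share of pages at that level (n) on which a given tag name occurs. A minimal tally, assuming each page's metatag names have already been extracted into a set (so a page counts at most once per tag); the function name and the sample pages in the test are hypothetical:

```python
from collections import Counter

def tag_shares(pages: list[set[str]], top: int = 4) -> list[tuple[str, int]]:
    """Most frequent metatag names across pages, as whole percentages of n.

    Each page is a set of tag names, so a tag repeated on one page is
    counted once: the percentage is "share of pages that use the tag".
    """
    n = len(pages)
    counts = Counter(name for page in pages for name in page)
    # Round half up with integer arithmetic: floor(100 * k / n + 0.5).
    return [(name, (200 * k + n) // (2 * n)) for name, k in counts.most_common(top)]
```

Ties (such as keywords and description at 58% on the cover pages) come out in first-seen order, so the ranking within a tie carries no meaning.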

Page 13

Actual output formats: (X)HTML vs. PDF

About 7% are declared as XHTML. However, we found only one journal (Electronic Journal of Biotechnology) that offered access to an actual XML-marked copy.

(Systems such as SciELO and RedALyC do offer XML copies of their articles.)

(X)HTML: 33 (27%)

PDF: 105 (85%)

Both (X)HTML and PDF: 17 (14%)

Page 14

Contents of the tag <title>

Page 15

Contents of the tag <title> on the Cover level

Have a cover: 122 (100%)

Journal title: 93 (76%)

Institution name: 24 (20%)

Issue information: 2 (2%)

Two or more of the above: 9 (7%)

None of the above: 10 (8%)

Page 16

Contents of the tag <title> on the Full Text HTML page

Have a full text page: 33 (100%)

Journal title: 14 (42%)

Institution name: 12 (36%)

Issue information: 5 (15%)

Article title: 14 (42%)

Author name: 4 (12%)

Both title and author’s name: 0 (0%)

Two or more of the above: 4 (12%)

None of the above: 1 (3%)
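A breakdown like the two above can be produced by checking each <title> string for the journal title, institution name, article title, and author name. The sketch below does this with case-insensitive substring matching, which is our assumption (the slides do not describe how matches were judged); real titles would also need accent and abbreviation normalization. All names in the test are invented.

```python
def title_components(page_title: str, known: dict[str, str]) -> set[str]:
    """Return which known descriptors occur in a page's <title> text.

    `known` maps a component label (e.g. "Journal title") to its value for
    this journal or article; matching is case-insensitive substring search.
    """
    haystack = page_title.casefold()
    return {label for label, value in known.items()
            if value and value.casefold() in haystack}
```

A page then falls under "Two or more of the above" when the returned set has at least two labels, and under "None of the above" when it is empty.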

Page 17

Conclusions (1/2)

- Use of non-national languages (particularly English) was high in the metadata, but not in the text of the articles themselves: most articles (84%) were written in the national languages.

- (Spanish speakers in particular have traditionally been regarded as defensive about their language, and editors often reflect this attitude. Moreover, many institutions in the Spanish-speaking world are promoting the use of Spanish as a language of science. This debate, however, has become ideologically charged, and it is still ongoing.)

- Multilingualism was very often not taken into account when marking metadata.

- Relatively few journals in the sample used content management systems (29% used any system; 9% used OJS or an OJS derivative).
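One way to take multilingualism into account at the markup level is to repeat a DC element once per language and label each copy with HTML's global lang attribute, so that, for example, a Spanish and an English abstract stay distinguishable to harvesters. A sketch, assuming the abstracts and keyword lists are already available per language; the function name and the dictionary layout are hypothetical:

```python
from html import escape

def multilingual_dc(abstracts: dict[str, str], keywords: dict[str, list[str]]) -> str:
    """Emit language-tagged DC.description and DC.subject meta tags.

    Keys of both dicts are language codes such as "es", "pt" or "en".
    """
    tags = []
    for lang, text in abstracts.items():
        tags.append(f'<meta name="DC.description" lang="{escape(lang)}" content="{escape(text)}">')
    for lang, words in keywords.items():
        for word in words:
            tags.append(f'<meta name="DC.subject" lang="{escape(lang)}" content="{escape(word)}">')
    return "\n".join(tags)
```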

Page 18

Conclusions (2/2)

- PDF-centric publishing distracts attention from keywords and from text marking in general.

- The best solution would, of course, be DTD/XML marking. This would:
  - help editors think about key data (such as abstracts);
  - provide better data for existing vector-based search engines;
  - help index the data for future semantic web search engines [think Wolfram Alpha].

- Training editors is key: it helps implement relatively inexpensive standards (DC; titles that follow good SEO practices) and could make editors more receptive to more complex standards (DTD/XML).
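As an illustration of what DTD/XML marking involves, the following builds skeletal article front matter using element names from the JATS tag set (NISO Z39.96): article, front, article-meta, title-group, article-title, contrib-group, abstract, trans-abstract, kwd-group. It is a minimal, non-validating sketch: a real JATS document also needs journal metadata, a DOCTYPE, and more. The function and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang

def jats_front(title: str, authors: list[tuple[str, str]], abstracts: dict[str, str],
               keywords: dict[str, list[str]], lang: str) -> ET.Element:
    """Build <article><front><article-meta>... holding the basic descriptors.

    authors: (surname, given names) pairs; abstracts and keywords are keyed
    by language code; lang is the language of the article itself.
    """
    article = ET.Element("article", {XML_LANG: lang})
    meta = ET.SubElement(ET.SubElement(article, "front"), "article-meta")
    ET.SubElement(ET.SubElement(meta, "title-group"), "article-title").text = title
    contribs = ET.SubElement(meta, "contrib-group")
    for surname, given in authors:
        contrib = ET.SubElement(contribs, "contrib", {"contrib-type": "author"})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given
    # The article-language abstract comes first as <abstract>; translations
    # follow as <trans-abstract>, each labelled with xml:lang.
    for code in sorted(abstracts, key=lambda c: c != lang):
        tag = "abstract" if code == lang else "trans-abstract"
        ET.SubElement(ET.SubElement(meta, tag, {XML_LANG: code}), "p").text = abstracts[code]
    for code, words in keywords.items():
        group = ET.SubElement(meta, "kwd-group", {XML_LANG: code})
        for word in words:
            ET.SubElement(group, "kwd").text = word
    return article
```

Serializing with ET.tostring(article, encoding="unicode") yields the XML text; because every descriptor sits in its own element, an editor filling this in is prompted for exactly the abstract and keywords that the study found missing in many articles.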