Universitätsbibliothek
BASE – a powerful search engine for Open Access documents
AIMS@OA Week
25 Oct 2012
Friedrich Summann, Bielefeld University Library
Overview
BASE – the OA search engine
Harvesting OAI-PMH and its challenges
Metadata Aggregation and Data Quality
Processing Subject Repositories
Harvesting Background
BASE (Bielefeld Academic Search Engine)
• started in 2002, active since 2004
• 2,900 repositories harvested via OAI-PMH
• 2,337 repositories indexed
• 37.4 million documents included
• 3.1 million documents automatically classified
• Lucene/Solr index
• VuFind end-user GUI
Repositories: Geographical Distribution
[World map with per-region totals given in millions ("m"); the region labels are not preserved in this transcript]
BASE search features
• Truncation
• Search History
• Sorting
• Drilldown
• Linguistic Tools (Stemming, Eurovoc Thesaurus)
Repository Typology
• Institutional Repositories (35 %)
• Thesis and Dissertation Servers (11 %)
• Subject Repositories (1 %)
• Electronic Journals (21 %)
• Digital Collections (6 %)
• Others (Videos, Audios, Datasets etc.) (2 %)
BASE Interfaces
• Query REST interface
• Repository Metadata interface
• Data Delivery Interface for aggregated metadata, selected by repository or by DDC class (under construction)
Overview
BASE – the OA search engine
Harvesting OAI-PMH and its challenges
Metadata Aggregation and Data Quality
Processing Subject Repositories
My conclusion:
OAI-PMH harvesting is easy, but
putting the results together is the real challenge.
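To illustrate why the harvesting step itself is considered easy, here is a minimal Python sketch of an OAI-PMH `ListRecords` loop using only the standard library. It follows resumption tokens until the repository returns an empty one and treats the `noRecordsMatch` error as an empty result. The function names, the `oai_dc` default and the injectable `fetch` parameter are illustrative assumptions, not BASE's harvester.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_list_records_url(base_url: str, token: Optional[str] = None,
                           metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request; a resumptionToken excludes other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def http_get(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def list_records(base_url: str,
                 fetch: Callable[[str], bytes] = http_get) -> Iterator[ET.Element]:
    """Yield every <record> element, following resumptionTokens page by page."""
    token: Optional[str] = None
    while True:
        root = ET.fromstring(fetch(build_list_records_url(base_url, token)))
        error = root.find(OAI_NS + "error")
        if error is not None:
            # noRecordsMatch is a normal empty result; other codes are real errors
            if error.get("code") == "noRecordsMatch":
                return
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        list_el = root.find(OAI_NS + "ListRecords")
        if list_el is None:
            return
        yield from list_el.findall(OAI_NS + "record")
        token_el = list_el.find(OAI_NS + "resumptionToken")
        # an empty (or missing) resumptionToken marks the last page
        token = token_el.text.strip() if token_el is not None and token_el.text else None
        if not token:
            return
```

The `fetch` parameter keeps network access out of the loop, so the same code can be exercised against canned responses; all the harder problems listed on the next slide begin after this point.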
Harvesting: Challenges and pitfalls
• Repository does not respond (temporarily, or only for specific verbs)
• Results are not valid XML
• Harvesting breaks (especially with big repositories)
• Incremental harvesting does not work
• No information on deleted records (only added records)
• Variety of field contents
• Change of behavior (base URL, contents)
• Metadata point to a reference or citation only
• Link to the document is not operable
• Full-text access is restricted (not OA)
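Several of these pitfalls touch specific OAI-PMH 2.0 features: selective harvesting with the `from` argument, and `status="deleted"` on a record header. The Python sketch below shows one defensive approach, assuming a harvester that retries non-responding or non-well-formed answers and keeps a local index keyed by OAI identifier. The function names, retry count and back-off are hypothetical choices. The sketch only helps when a repository actually implements these features, which is precisely what the slide reports is often not the case.

```python
import time
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_get(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def incremental_url(base_url: str, last_harvest: str,
                    metadata_prefix: str = "oai_dc") -> str:
    """Selective harvesting: ask only for records changed since last_harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix,
              "from": last_harvest}
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch_xml(url: str, fetch: Callable[[str], bytes] = http_get,
              retries: int = 3, delay: float = 5.0) -> ET.Element:
    """Retry when the repository does not respond or sends non-well-formed XML."""
    last_error: Optional[Exception] = None
    for attempt in range(retries):
        try:
            return ET.fromstring(fetch(url))
        except (urllib.error.URLError, TimeoutError, ET.ParseError) as exc:
            last_error = exc
            if attempt + 1 < retries:
                time.sleep(delay * (attempt + 1))  # linear back-off between attempts
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error


def apply_records(root: ET.Element, index: Dict[str, ET.Element]) -> None:
    """Add or replace changed records in a local index; drop deleted ones."""
    for record in root.iter(OAI_NS + "record"):
        header = record.find(OAI_NS + "header")
        if header is None:
            continue
        identifier = header.findtext(OAI_NS + "identifier")
        if header.get("status") == "deleted":
            index.pop(identifier, None)
        else:
            index[identifier] = record
```

If a repository never reports deletions, an index maintained this way keeps stale records. A periodic full re-harvest is then the only reliable fallback.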
Overview
BASE – the OA search engine
Harvesting OAI-PMH and its challenges
Metadata Aggregation and Data Quality
Processing Subject Repositories
dc:language: Variety of Metadata Values
Analysis: European repositories, Oct. 2009: 804 different values in 4,720,585 tags
Top values:
• "en" – 1,385,175
• "eng" – 511,085
• "spa" – 345,658
• "de" – 319,937
• "en_GB" – 178,381
• "ger" – 166,587
• "eng;" – 102,678
• "FR" – 95,798
…
Rare values (examples):
• ";" – 3
• "?" – 3
• "at;deu" – 2
• "enm;eng" – 2
• "FRA" – 2
• "fr_BE" – 2
• "Andere Sprache" (German: "other language") – 2
• "cat, spa, fra, eng." – 2
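One way to reduce this variety is to normalise each raw value to a single code system before indexing. The Python sketch below assumes ISO 639-3 as the target. It splits multi-valued strings, strips stray punctuation, reduces locale forms such as `en_GB` to their primary subtag, and maps ISO 639-1 and ISO 639-2/B codes (`ger`, `fre`) through a lookup table. The table is a small, hypothetical excerpt, and unmappable values are simply dropped; this is not how BASE necessarily handles them.

```python
import re
from typing import List

# Illustrative, incomplete lookup table: ISO 639-1 and ISO 639-2/B variants
# mapped to three-letter ISO 639-3 codes.
TO_ISO639_3 = {
    "en": "eng", "eng": "eng",
    "de": "deu", "ger": "deu", "deu": "deu",
    "es": "spa", "spa": "spa",
    "fr": "fra", "fre": "fra", "fra": "fra",
    "ca": "cat", "cat": "cat",
}


def normalize_language(value: str) -> List[str]:
    """Split a raw dc:language value and map each part to an ISO 639-3 code.

    Parts that cannot be mapped (e.g. '?', 'Andere Sprache') are dropped.
    """
    codes: List[str] = []
    for part in re.split(r"[;,/]", value):
        token = part.strip().rstrip(".").lower()
        # keep only the primary language subtag of forms like en_GB or fr-BE
        token = re.split(r"[_-]", token)[0]
        code = TO_ISO639_3.get(token)
        if code and code not in codes:
            codes.append(code)
    return codes
```

For example, `"cat, spa, fra, eng."` becomes four clean codes, while `"enm;eng"` keeps only `eng` because Middle English is missing from the excerpted table. A production table would need the full code lists.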
dc:type: Variety of Metadata Values
Analysis: German repositories, Sept. 2009: 2,772 different values in 1,394,089 tags
Top values:
• Dataset – 588,525
• Artikel (article) – 192,306
• Rezension (review) – 113,924
• Text – 73,210
• Text.Thesis.Doctoral – 30,201
• Article – 29,278
• Miszelle (miscellany) – 27,060
• NonPeerReviewed – 24,688
• ResearchPaper – 16,046
• Dissertation – 15,531
…
Rare values (examples):
• Software – 7
• Kulturkarten (culture maps) – 7
• Composition – 7
• Interactive Resource – 4
• Interview – 3
• Media – 1
• content analysis – 1
• Anniversary Publication – 1
• qualitative research – 1
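The statistics above ("N different values in M tags", top values) and a first normalisation step can both be expressed compactly. The Python sketch below assumes a small, hypothetical target vocabulary. It case-folds and whitespace-normalises each raw `dc:type`, maps German and English synonyms (`Artikel`/`Article`, `Dissertation`/`Text.Thesis.Doctoral`) to one term, and treats review-status values such as `NonPeerReviewed` as "not a type". The mapping is an illustration built from the values on the slide, not BASE's actual type vocabulary.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical target vocabulary; keys are lower-cased raw dc:type values
# taken from the analysis above. Anything not listed maps to None.
TYPE_MAP = {
    "dataset": "dataset",
    "artikel": "article",
    "article": "article",
    "rezension": "review",
    "text.thesis.doctoral": "doctoral thesis",
    "dissertation": "doctoral thesis",
    "software": "software",
    "interactive resource": "interactive resource",
    "interview": "interview",
}

# Values that describe review status rather than a document type.
NOT_A_TYPE = {"nonpeerreviewed", "peerreviewed"}


def normalize_type(value: str) -> Optional[str]:
    """Map one raw dc:type value to the target vocabulary (None if unknown)."""
    key = " ".join(value.lower().split())  # case-fold and collapse whitespace
    if key in NOT_A_TYPE:
        return None
    return TYPE_MAP.get(key)


def profile(values: Iterable[str], top: int = 10) -> Tuple[int, int, List[Tuple[str, int]]]:
    """Return (distinct values, total tags, most frequent values) for raw input."""
    counts = Counter(values)
    return len(counts), sum(counts.values()), counts.most_common(top)
```

Running `profile` before and after `normalize_type` is a quick way to see how much of the long tail a given mapping table actually absorbs.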
Overview
BASE – the OA search engine
Harvesting OAI-PMH and its challenges
Metadata Aggregation and Data Quality
Processing Subject Repositories
Subject Repositories: Registries
• Disciplinary repositories (Open Access Directory): http://oad.simmons.edu/oadwiki/Disciplinary_repositories
• OpenDOAR
Subject Repositories in BASE
The big ones:
• arXiv.org (Physics)
• CERN Document Server (Physics)
• PubMed Central (Life Sciences)
• CiteSeer (Computer Science)
• E-LIS (Library Science)
• RePEc (Economics)
• EconStor (Economics)
• SSOAR (Social Sciences)
• …
The BASE Approach: Automatic Classification
Contents for Classifier Feed
• dc:description: 30 to 40 % of metadata records have a dc:description with relevant abstract information
• Document full text (if accessible)
• setSpec may contain DDC and LCC codes
• dc:subject contains a lot of subject-oriented information
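The fields listed above could be assembled into classifier input roughly as in the following Python sketch. Several things here are assumptions: the `HarvestedRecord` structure and function names are hypothetical; DDC codes are assumed to appear in `setSpec` as `ddc:` followed by three digits, a convention some repositories follow; and records that already carry such codes are assumed to serve as labelled training examples for the knowledge base. None of this is confirmed as BASE's actual pipeline.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class HarvestedRecord:
    """Hypothetical, simplified view of one harvested Dublin Core record."""
    descriptions: List[str] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)
    set_specs: List[str] = field(default_factory=list)
    fulltext: Optional[str] = None  # only present if the document is accessible


# Assumption: DDC classes appear in setSpec values like "ddc:530".
DDC_SETSPEC = re.compile(r"^ddc:(\d{3})$", re.IGNORECASE)


def ddc_from_setspecs(set_specs: List[str]) -> List[str]:
    """Extract three-digit DDC classes from setSpec values."""
    return [m.group(1) for s in set_specs if (m := DDC_SETSPEC.match(s.strip()))]


def classifier_input(rec: HarvestedRecord, max_fulltext_chars: int = 20000) -> str:
    """Concatenate dc:subject, dc:description and (truncated) full text."""
    parts = rec.subjects + rec.descriptions
    if rec.fulltext:
        parts.append(rec.fulltext[:max_fulltext_chars])
    return "\n".join(p.strip() for p in parts if p and p.strip())


def training_example(rec: HarvestedRecord) -> Optional[Tuple[str, List[str]]]:
    """Records that already carry DDC codes could serve as labelled examples."""
    labels = ddc_from_setspecs(rec.set_specs)
    return (classifier_input(rec), labels) if labels else None
```

Truncating the full text is a design choice meant to keep long documents from swamping the shorter metadata fields; the limit shown is arbitrary.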
Building the Knowledge Base
Mapping of frequently used classifications:
• LCC
• E-LIS classification
• arXiv classification
DDC codes: ~400,000 documents = 1.4 %
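A mapping of other classifications onto DDC can be sketched as a lookup with a most-specific-match rule. The concordance tables in the Python sketch below are hypothetical and deliberately coarse: a handful of arXiv archives and LCC classes mapped to three-digit DDC classes. They are not the mapping BASE used. arXiv categories such as `math.AG` are reduced to their archive, and LCC class numbers are tried from the most specific key (letters plus number) down to the first letter.

```python
import re
from typing import Optional

# Illustrative, coarse concordances (not an official mapping).
ARXIV_TO_DDC = {"math": "510", "cs": "004", "physics": "530", "hep-th": "530",
                "astro-ph": "520", "q-bio": "570"}
LCC_TO_DDC = {"QA76": "004", "QA": "510", "QB": "520", "QC": "530",
              "QD": "540", "Z": "020"}


def ddc_from_arxiv(category: str) -> Optional[str]:
    """'math.AG' -> archive 'math' -> DDC class (None if not in the table)."""
    return ARXIV_TO_DDC.get(category.strip().split(".")[0].lower())


def ddc_from_lcc(call_number: str) -> Optional[str]:
    """Map an LCC class such as 'QA76.9' to DDC, most specific key first."""
    m = re.match(r"([A-Z]{1,3})(\d*)", call_number.strip().upper())
    if not m:
        return None
    letters, digits = m.groups()
    # try letters + class number, then the letters alone, then the main class
    for key in (letters + digits, letters, letters[:1]):
        if key in LCC_TO_DDC:
            return LCC_TO_DDC[key]
    return None
```

A real concordance would need range-based LCC matching (for example QA75.5 to QA76.95 for computing), which this exact-key lookup does not capture.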
Distribution of DDC classes in harvesting results
Subject-based Browsing