Digitization Practices in India: Issues and Challenges V.N. Shukla

Upload: duonghuong

Post on 28-Jan-2017


Page 1: Digitization Practices in India: Issues and Challenges

Digitization Practices in India: Issues and Challenges

V.N. Shukla

Page 2: Digitization Practices in India: Issues and Challenges

C-DAC, NOIDA UNIT

C-DAC MISSION

• NATURAL LANGUAGE PROCESSING AND INTERFACES
• HUMAN RESOURCE DEVELOPMENT IN HITECH AREAS
• INFRASTRUCTURE AND SUPPORT SERVICES
• SPECIAL INDUSTRIAL APPLICATIONS

Page 3: Digitization Practices in India: Issues and Challenges

AREAS OF COMPETENCE: NOIDA

• Graphical Display System
• Security Systems
• Embedded System
• System Engineering and Consultancy
• NLP
• Solar Energy System
• E-Governance
• Internet on CATV & E-Commerce
• ...

Page 4: Digitization Practices in India: Issues and Challenges

Digital Library Activities: CDAC Noida

Digital Library Projects

• Mega Centre for Digital Library
• Mobile Digital Library: Dware Dware Gyan Sampada
• Digital Library at President’s House
• Digital Library at Nagari Pracharini Sabha, Varanasi
• Digital Library at Uttaranchal
• GyanNidhi: Multilingual Parallel Corpus in Indian Languages
• Digital Library at Gujarat Vidyapeeth, Ahmedabad
• Digitization of Libraries

Page 5: Digitization Practices in India: Issues and Challenges

Digital Library Mission

To organize the information and make it universally accessible and useful.

• Online Content: Billions of web pages
• Offline Content: Billions of items still unindexed

Page 6: Digitization Practices in India: Issues and Challenges

DL Initiatives

~85% of books are out of print and/or out of copyright – these books are only found in libraries

GOAL: Create a comprehensive virtual card catalog of all books in all languages, while respecting publishers’ rights

Only ~15% of books are in print

Source: Google

Page 7: Digitization Practices in India: Issues and Challenges

DL creation & processes

[Diagram with labels: Users, Traditional Libraries, Digital Libraries, Index, Metadata Search, Hyperlinks]

Page 8: Digitization Practices in India: Issues and Challenges

The value is in the middle

92% of the world's books are neither generating revenue for the copyright holder nor easily accessible to potential readers.*

A Typical Library Collection

• In-Print: ~15%
• Public Domain: Less than 20%**
• Unclear copyright status: ~65% or more
  • May be in copyright, but not for sale
  • Rights may have reverted to author
  • May be in the public domain

*Source: Covey, Denise Troll. “Global Cooperation for Global Access: The Million Book Project”
**OCLC analysis of the Google Books Library Project: http://www.dlib.org/dlib/september05/lavoie/09lavoie.html

Page 9: Digitization Practices in India: Issues and Challenges

Digital Library (DL) may be seen as “Collection of intelligent creations by human beings through their own language and culture. It also reflects cultural heritage besides providing archive and generating many research issues pertaining to Natural Language Processing”

Page 10: Digitization Practices in India: Issues and Challenges

Digital Library?

According to another definition, digital libraries are

“Organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily available for use by a defined community or set of communities”.

Sun Microsystems defines a digital library as the electronic extension of functions users typically perform and the resources they access in a traditional library.

These information resources can be translated into digital form, stored in multimedia repositories, and made available through Web-based services.

Page 11: Digitization Practices in India: Issues and Challenges

What is a Digital Library?

• A Service?
• An Architecture?
• A set of Information Resources?
• A set of tools to locate, search, retrieve information?
• Possibly the tools to create such resources and services also fall within the purview of DLs
• Digital face of traditional libraries
• Include both digital collections and traditional
• Backbone and nervous system of libraries.

Page 12: Digitization Practices in India: Issues and Challenges

Digital library vs. traditional library

• Efficient & qualitative services by collecting, organizing, storing, disseminating, retrieving and preserving the information.

• Preservation benefits besides making information retrieval & delivery more comfortable.

• Online access to historical and cultural documents whose existence is endangered due to physical decay.

Digital libraries necessarily include a strong focus on the management of digital content, just as traditional libraries have focused for long on the management of content in physical forms.

Page 13: Digitization Practices in India: Issues and Challenges

Digital Content Management

The major areas for exploitation are:

• Information retrieval
• Multimedia
• Database
• Data mining
• Data warehouse
• On-line information repositories
• Image processing, hypertext
• World Wide Web and Wide Area Information Servers (WAIS)

Most of the digital content that is being managed includes:

• Human language in various forms: character-coded electronic text, scanned images, printed or handwritten text, or human speech.

• Language technology helps in managing digital content

• Learning from past experience also helps in managing content

Page 14: Digitization Practices in India: Issues and Challenges

A few advantages of digital libraries

• Access anywhere
• Reducing delays
• Distributed storage – central access
• Better cataloguing
• Cross references to other documents
• Full text search
• Protected information source
• Wide exploration and exploitation of the information

The information explosion, the wide bandwidth data networks and the potential of Internet-based technologies - such as the Web - make digital libraries one of the important application areas of computer science.

Page 15: Digitization Practices in India: Issues and Challenges

Process of Digital Preservation

[Flowchart. Boxes: Scanned Image in TIFF format; S/w to divide even & odd pages; Batch cropping & Cleaning; OCR; Conversion to TXT/RTF/HTML; XML Meta File Creation using Dublin Core Std.; Book scanning status; Centralized Server. Decision branches: Yes: Uploading; No: Reject the Book.]
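The "divide even & odd pages" step can be sketched as follows. This is only an illustration and not the software used at C-DAC: it assumes the scanned pages are TIFF files whose names sort in page order, and the function name `split_even_odd` is hypothetical. One plausible reason for the split is that left-hand and right-hand pages often need different crop margins in the batch cropping step.

```python
from pathlib import Path


def split_even_odd(page_files):
    """Split scanned page files into odd- and even-numbered pages.

    Files are ordered by name; numbering starts at 1, so the first
    file is page 1 (odd), the second is page 2 (even), and so on.
    """
    ordered = sorted(page_files, key=lambda p: Path(p).name)
    odd_pages = ordered[0::2]   # pages 1, 3, 5, ...
    even_pages = ordered[1::2]  # pages 2, 4, 6, ...
    return odd_pages, even_pages


if __name__ == "__main__":
    # Hypothetical file names, given out of order on purpose.
    scans = ["book_0003.tif", "book_0001.tif", "book_0002.tif", "book_0004.tif"]
    odd, even = split_even_odd(scans)
    print("odd pages: ", odd)
    print("even pages:", even)
```

Each group can then be passed to the cropping and cleaning step with its own settings.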

Page 16: Digitization Practices in India: Issues and Challenges

Goals of DL

Focused on digitization technology, metadata schemes, data management techniques, and digital preservation.

• Second-generation digital library: exploring new opportunities and developing new competencies.

• Third-generation digital library: focusing instead on fully integrating digital material into the library’s collections through a modular systems architecture.

Page 17: Digitization Practices in India: Issues and Challenges

Ingredients for DLs

• Hardware: The minimum machinery to do the job
• Software: The programs for handling data
• Digital Objects: Articles, Conference Papers, Theses, …
• Basic Skills: Things one has to learn

Page 18: Digitization Practices in India: Issues and Challenges

Hardware

• A Server: You’ll need access to a web server
• A good PC
• Scanners: Flatbed – Auto feed, Back to back MF
• Book Scanner

Page 19: Digitization Practices in India: Issues and Challenges

Software

• Open Source Software (OSS): DSpace, EPrints, Fedora, GSDL, …
• Proprietary software you can’t avoid: Image Editing and Optical Character Recognition software have to be purchased

Page 20: Digitization Practices in India: Issues and Challenges

Content is King

The information content is more important than the systems used for its storage, management and retrieval

Objects should not be “locked” in specific DLs or archives

Page 21: Digitization Practices in India: Issues and Challenges

Creating DLs …

Six steps: Selecting, Acquiring, Digitization, Creation of Metadata, Organizing, Archiving, Providing Access
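The "Creation of Meta Data" step, which the preservation flowchart describes as "XML Meta File Creation using Dublin Core Std.", might look roughly like the sketch below. It uses Python's standard `xml.etree.ElementTree` module and the Dublin Core element namespace. The wrapper element name `metadata`, the function name `build_dc_record`, and the sample field values are assumptions for illustration; the slides do not say how the actual meta files were laid out.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 core Dublin Core elements (title, creator, ...).
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_record(fields):
    """Return an XML string with one Dublin Core element per dict entry.

    `fields` maps a Dublin Core element name (e.g. "title") to its value.
    The <metadata> wrapper element is an illustrative choice.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    record = build_dc_record({
        "title": "Digitization Practices in India: Issues and Challenges",
        "creator": "V.N. Shukla",
        "format": "image/tiff",
        "language": "en",
    })
    print(record)
```

Keeping the record in a plain, namespaced XML file fits the "Content is King" point: the metadata stays readable outside any one digital library system.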

Page 22: Digitization Practices in India: Issues and Challenges
Page 23: Digitization Practices in India: Issues and Challenges

Possible Delivery Formats

• Pure image formats: TIFF, JPEG
• Open encoded formats: XML, HTML, ASCII, and Unicode
• Hybrid formats: PDF, DjVu – can contain both image and text
• Proprietary formats: Microsoft Word, WordPerfect
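As a small illustration of the four format groups, the sketch below sorts files into them by extension. The extension lists and the function name `delivery_category` are assumptions made for this example; a real collection would more likely rely on recorded MIME types than on file names.

```python
from pathlib import Path

# Illustrative mapping from the slide's categories to common extensions.
FORMAT_CATEGORIES = {
    "pure image": {".tif", ".tiff", ".jpg", ".jpeg"},
    "open encoded": {".xml", ".html", ".htm", ".txt"},
    "hybrid": {".pdf", ".djvu"},
    "proprietary": {".doc", ".wpd"},
}


def delivery_category(filename):
    """Return the delivery-format category for a file name, or "unknown"."""
    ext = Path(filename).suffix.lower()
    for category, extensions in FORMAT_CATEGORIES.items():
        if ext in extensions:
            return category
    return "unknown"


if __name__ == "__main__":
    for name in ["page_001.TIF", "book.djvu", "index.html", "draft.doc"]:
        print(name, "->", delivery_category(name))
```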

Page 24: Digitization Practices in India: Issues and Challenges

Digitization: Issues

• Copyright
• Access copy and archive copy
• File size
• Storage media (CD, hard disk, …)
• File format (TIFF, JPEG, …)
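The file size issue can be made concrete with a little arithmetic. The uncompressed size of a scanned page is its width in pixels times its height in pixels times the bits per pixel, divided by 8 to get bytes. The sketch below applies this to a 6 × 9 inch page scanned at 300 dpi; the page size, resolution, and function name are illustrative assumptions, not figures from the slides.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed raster size in bytes for a scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # 6 x 9 inch page at 300 dpi is 1800 x 2700 pixels.
    for label, bits in [("bitonal (1-bit)", 1),
                        ("greyscale (8-bit)", 8),
                        ("colour (24-bit)", 24)]:
        size = uncompressed_size_bytes(6, 9, 300, bits)
        print(f"{label}: {size:,} bytes")
```

Under these assumptions a single colour page is about 14.6 million bytes before compression, which is why a large archive copy is usually paired with a smaller, compressed access copy.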

Page 25: Digitization Practices in India: Issues and Challenges

Challenges in Digitization

Building digital collections of national importance from existing texts, documents, images . . .

Creating new digital documents & linking them

Subject portals: Selecting and maintaining open source digital resources

Developing / adapting management tools for digital collections

Providing access to digital collections

Page 26: Digitization Practices in India: Issues and Challenges

Challenges..

• Integrating digital & other library collections
  • incl. integration of OPACs, subscribed e-resources and subject portals

• Establishing services for digital libraries
  • online access & offline support
  • education & training of users and librarians

• Addressing social, legal, policy issues

Page 27: Digitization Practices in India: Issues and Challenges

Challenges in Publishing

Preservation of layout

Searchability of content and metadata

Efficient image compression

Easy browsing of books

Accommodating low-bandwidth users

Multilingual text support

Multipaging

Page 28: Digitization Practices in India: Issues and Challenges

Digital Library Support in India

Funding

• Ministry of Communication & Information Technology (MIT)
• Ministry of Human Resource Development (MHRD)
• Manuscript Mission of India
• Department of Scientific & Industrial Research (DSIR-TRP)
• All India Council for Technical Education (AICTE)
• University Grants Commission (UGC)

Page 29: Digitization Practices in India: Issues and Challenges

Digital Library Initiatives in India

• Library Consortium in India
• Scholarly Science Journals
• Theses & Dissertations
• Institutional E-Print Archives
• Books (out of copyright)
• Manuscripts
• Newspapers
• Online Courseware
• Open Access at Metadata Level
• Portal and Gateway Services

Page 30: Digitization Practices in India: Issues and Challenges

Government of India

Min. of C&IT Min of Culture

INDEST-AICTE Consortium

Others

CSIR E-Journals Consortium

UGC Infonet Consortium

FORSA Consortium

National Manuscript Library

Universal Digital Library

IIM Libraries Consortium

Page 31: Digitization Practices in India: Issues and Challenges

Digital Library of India

Participating centers of DLI

• IISc
• IIIT-H
• State & City Central Library
• University of Hyderabad
• MIDC Pune University
• AKCE
• SASTRA
• ASR Melkote
• Sringeri Mutt
• Anna University
• TTD Tirupati
• IIIT-Allahabad
• CDAC Noida
• Rashtrapathi Bhavan
• PTU-1, PTU-2, PTU-3
• Goa University
• Kanchi Mutt
• IISc, IIAP, PoornaPragya
• CDAC Kolkata
• ERNET

Mega Scanning Centres at IIITH, IIITA, CDAC-Noida and Kolkata

Page 32: Digitization Practices in India: Issues and Challenges

Digital Library Initiatives in India

Some Examples

Page 33: Digitization Practices in India: Issues and Challenges

April 20, 2009 Workshop on Institutional Repositories 33

Digital Library of India

http://www.dli.ernet.in/

Page 34: Digitization Practices in India: Issues and Challenges
Page 35: Digitization Practices in India: Issues and Challenges

http://www.ias.ac.in/

Page 36: Digitization Practices in India: Issues and Challenges

http://www.insa.ac.in/

Page 37: Digitization Practices in India: Issues and Challenges

http://medind.nic.in/

Page 38: Digitization Practices in India: Issues and Challenges

Page 39: Digitization Practices in India: Issues and Challenges

Page 40: Digitization Practices in India: Issues and Challenges
Page 41: Digitization Practices in India: Issues and Challenges

Manuscripts

India has the largest collection of manuscripts in the world (approximately 5 million).

India is the repository of an astounding wealth of ancient knowledge belonging to different periods of history, going back thousands of years. Most of this knowledge, belonging to different areas of intellectual activity such as religion, philosophy, science, arts and literature, is preserved in the form of manuscripts. Composed in different Indian languages and scripts, they are preserved in materials such as birch bark, palm leaf, cloth, wood, stone and paper.

The National Manuscript Mission was launched as a five-year programme in Feb. 2003 by the Ministry of Human Resource Development, Govt. of India, to collect and conserve all the manuscripts.

Page 42: Digitization Practices in India: Issues and Challenges

http://namami.nic.in/

Page 43: Digitization Practices in India: Issues and Challenges

Archives of Indian Labour

V.V. Giri National Labour Institute

Heritage of Indian Working Class

Commissions on Labour

Oral History Collections

Trade Union Collections

Regional Collections

Strike Collections

Powered by Greenstone Digital Library

http://www.indialabourarchives.org/

Page 44: Digitization Practices in India: Issues and Challenges

Digital Libraries Benefits : Individual

Gain access to the holdings of libraries worldwide through automated catalogs. Locate both physical and digitized versions of scholarly articles and books.

Optimize searches, simultaneously search the Internet, commercial databases, and library collections.

Save search results and conduct additional processing to narrow or qualify results.

From search results, click through to access the digitized content or locate additional items of interest.

All of these capabilities are available from the desktop or other Web-enabled device such as a personal digital assistant or cellular telephone.

Page 45: Digitization Practices in India: Issues and Challenges

Conclusion

• Digital Libraries are redefining the role of libraries in society & the role of librarians & information specialists

• National level mechanism is essential to promote and coordinate open access and public domain digital library systems
  • Improve awareness of open access
  • Regular training – tools, processes, standards
  • Support setting up of working models, services
  • National Resource Centre for open access publishing

• International agencies like UNESCO, ICSU, ICSTI, CODATA need to actively promote and support developing country initiatives

Page 47: Digitization Practices in India: Issues and Challenges

Thank You