Knowledge Management & Linguistic Pluralism

Knowledge Management & Linguistic Pluralism Rajeeva Ratna Shah Secretary Government of India Ministry of Communications & Information Technology Department of Information Technology [email protected] [email protected]

Upload: fawn

Post on 03-Feb-2016


TRANSCRIPT

Page 1: Knowledge Management  & Linguistic Pluralism

Knowledge Management &

Linguistic Pluralism

Rajeeva Ratna Shah
Secretary

Government of India
Ministry of Communications & Information Technology

Department of Information Technology
[email protected]@mit.gov.in

Page 2: Knowledge Management  & Linguistic Pluralism

A CASE OF COMMUNICATION GAP

Page 3: Knowledge Management  & Linguistic Pluralism

Wing Commander to Squadron Leader

At 9 o'clock tomorrow there will be an eclipse of the Sun, something which does not occur every day. Get the men to fall out in the Lal Bahadur Shastri Marg in their uniform so that they will see this rare phenomenon, and I will explain it to them. In case of rain, we will not be able to see anything, so take the men to the gymkhana.

Page 4: Knowledge Management  & Linguistic Pluralism

Squadron Leader to Flying Officer

By order of the Wing Commander, tomorrow at 9 o'clock there will be an eclipse of the Sun. If it rains you will not be able to see it from the Lal Bahadur Shastri Marg, so then in uniform, the eclipse of the Sun will take place in the gymkhana, something that does not occur every day.

Page 5: Knowledge Management  & Linguistic Pluralism

Flying Officer to Sergeant

By order of the Wing Commander in uniform tomorrow at 9 o'clock in the morning, the inauguration of the eclipse of the Sun will take place in the gymkhana. The Wing Commander will give the order if it should rain, something which occurs every day.

Sergeant to Corporal

Tomorrow at nine the Wing Commander in uniform will eclipse the sun in the gymkhana, as it occurs every day, if it is a nice day; if it rains, then in the Lal Bahadur Shastri Marg.

Page 6: Knowledge Management  & Linguistic Pluralism

Corporal to Lance Corporal

Tomorrow at nine the eclipse of the Wing Commander in uniform will take place because of the Sun. If it rains in the gymkhana, something which does not take place every day, you will fall out in the Lal Bahadur Shastri Marg.

COMMENTS AMONG ALL IN THE UNIT

Tomorrow, if it rains, it looks as if the sun will eclipse the Wing Commander in the gymkhana. It is a shame that this does not occur every day.

Page 7: Knowledge Management  & Linguistic Pluralism

The Broadening sphere of Information Technology

Cognition

DATA, INFORMATION, KNOWLEDGE

Computation

Communication

Page 8: Knowledge Management  & Linguistic Pluralism

Old Economy: Capitalist Society (Legacy System)

• Core: Competition is the key since capital is a limited and scarce resource
• Capital diminishes with sharing
• Capital investments are one time and subject to low obsolescence

New Economy: Information Society (Knowledge Society)

• Core: Collaboration and sharing is the key since knowledge is inexhaustible
• Knowledge increases with sharing
• Knowledge investments need continuous up-gradation and have high obsolescence

Page 9: Knowledge Management  & Linguistic Pluralism

Knowledge of the 21st Century

STHULA-JAGAT (Macrocosm) and SOOKSMA-JAGAT (Microcosm)

Building Blocks & Knowledge Tools of 21st Century

• ATOMS: NANOTECH
• NEURONS: NETWORKS
• BITS: COMPUTERS
• GENES: BIOTECH

Page 10: Knowledge Management  & Linguistic Pluralism

Erosion of Knowledge base due to loss of language

Technologies – transformations in the Societies – Increase in Knowledge. BUT……..

• From an estimated 10,000 languages in 1900, the world has about 6,700 languages surviving today: 33% in Asia and 19% in the Pacific.
• Only 50 percent of the surviving languages are being taught to children.
• Half the current languages will be effectively extinct within a single generation.

Is there a gain in knowledge or a loss of knowledge?

Page 11: Knowledge Management  & Linguistic Pluralism

Sprawling digital divide

Rough sketch of the global digital divide among scripts

• Latin alphabet users: 39% of the global population enjoy 84% of access to the Internet
• Hanzi (Chinese ideograph) users in China/Japan/Korea: 22% of the global population enjoy 13% of Internet access
• Arabic script users: 9% of the population have 1.2% of Internet access
• Indic script users: 22% of the world population have just 0.3% of Internet access

Is the technology to divide or to unite?
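The gap in the figures above is easier to compare as a ratio of Internet-access share to population share. The following is a minimal sketch using only the percentages quoted on this slide; the names `SHARES` and `access_ratio` are illustrative and not part of the presentation.

```python
# Shares quoted on the slide: (percent of world population, percent of Internet access).
SHARES = {
    "Latin": (39.0, 84.0),
    "Hanzi (China/Japan/Korea)": (22.0, 13.0),
    "Arabic": (9.0, 1.2),
    "Indic": (22.0, 0.3),
}


def access_ratio(population_pct: float, access_pct: float) -> float:
    """Internet-access share divided by population share (1.0 would mean parity)."""
    return access_pct / population_pct


ratios = {script: access_ratio(*pair) for script, pair in SHARES.items()}

# Print the scripts from best to worst represented.
for script, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{script:28s} {ratio:6.3f}")

# How many times better represented Latin-script users are than Indic-script users.
latin_vs_indic = ratios["Latin"] / ratios["Indic"]
print(f"Latin / Indic: {latin_vs_indic:.0f}x")
```

On these figures, Latin-script users are over-represented by a factor of about 2, while Indic-script users hold roughly one seventieth of their proportional share.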

Page 12: Knowledge Management  & Linguistic Pluralism

Exponential Growth Trends in Computer Performance

[Chart: computer performance in MIPS (axis ticks doubling from 100 to 1,638,400) against year (1994 to 2020), with milestones labelled Giga PC, 10G PC, 100G PC and Tera PC, and trend lines labelled "Doubling every 15 months" and "Doubling every 2 years".]
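The two doubling periods named on this chart can be compared with a short calculation. This is a sketch only: it assumes a starting point of 100 MIPS in 1994 (the lowest tick and the first year on the chart's axes), which the slide does not state explicitly, and the function name `projected_mips` is illustrative.

```python
def projected_mips(start_mips: float, start_year: int, year: int,
                   doubling_months: float) -> float:
    """Performance after steady exponential growth with the given doubling period."""
    elapsed_months = (year - start_year) * 12
    return start_mips * 2 ** (elapsed_months / doubling_months)


START_MIPS = 100.0   # assumed: lowest tick on the chart's MIPS axis
START_YEAR = 1994    # assumed: first year on the chart's time axis

for year in (2004, 2014, 2020):
    slow = projected_mips(START_MIPS, START_YEAR, year, doubling_months=24)
    fast = projected_mips(START_MIPS, START_YEAR, year, doubling_months=15)
    print(f"{year}: {slow:>12,.0f} MIPS (2-year doubling)  "
          f"{fast:>14,.0f} MIPS (15-month doubling)")
```

Under the 2-year assumption, ten doublings between 1994 and 2014 give 102,400 MIPS, which matches one of the tick values on the chart.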

Page 13: Knowledge Management  & Linguistic Pluralism

Future Direction : Information Interspace

• Third wave in the ongoing evolution of the Global Information Infrastructure
• Computing technology will transform the Internet into the Interspace.
• In future, the Information Infrastructure will support semantic indexing and concept navigation across widely distributed community repositories.
• Concept Navigation will become a standard function in the Interspace

E-mail in ARPANET (1965-85)

Document Browsing in INTERNET (1985-2000)

Concept Navigation in INTERSPACE (2000-10)

Page 14: Knowledge Management  & Linguistic Pluralism

Linguistic Pluralism in India

Eighteen constitutional Indian languages & their scripts

Script (10): Language (18)

Devanagari: Sanskrit, Hindi, Marathi, Nepali, Sindhi & Konkani
Bengali: Bangla, Assamese, Manipuri
Oriya: Oriya
Gujarati: Gujarati
Gurumukhi: Punjabi
Telugu: Telugu
Kannada: Kannada
Tamil: Tamil
Malayalam: Malayalam
Urdu: Urdu, Kashir

Page 15: Knowledge Management  & Linguistic Pluralism

Language-wise World Population (population in billion)

Language      2050     1996
Chinese       1.384    1.113
Hindi/Urdu    0.556    0.316
English       0.508    0.372
Spanish       0.486    0.304
Arabic        0.482    0.201
Portuguese    0.248    0.165
Bengali       0.229    0.125
Russian       0.132    0.155
Japanese      0.108    0.123
German        0.091    0.102
Malay         0.080    0.047
French        0.076    0.070
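The projected change for each language follows directly from the two columns. A minimal sketch of that arithmetic, using only the figures from the table above (the variable and function names are illustrative):

```python
# (1996 population, projected 2050 population), both in billions, from the slide.
POPULATION_BILLIONS = {
    "Chinese": (1.113, 1.384),
    "Hindi/Urdu": (0.316, 0.556),
    "English": (0.372, 0.508),
    "Spanish": (0.304, 0.486),
    "Arabic": (0.201, 0.482),
    "Portuguese": (0.165, 0.248),
    "Bengali": (0.125, 0.229),
    "Russian": (0.155, 0.132),
    "Japanese": (0.123, 0.108),
    "German": (0.102, 0.091),
    "Malay": (0.047, 0.080),
    "French": (0.070, 0.076),
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


changes = {
    language: percent_change(p1996, p2050)
    for language, (p1996, p2050) in POPULATION_BILLIONS.items()
}

fastest = max(changes, key=changes.get)
declining = sorted(language for language, change in changes.items() if change < 0)

for language, change in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{language:12s} {change:+7.1f}%")
```

On these figures Arabic grows fastest in relative terms, and Russian, Japanese and German are the only languages in the table projected to shrink.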

Page 16: Knowledge Management  & Linguistic Pluralism

Data on Information Technology Indicators

Country   Population  PPP     IT/GDP  IT per capita  Tel. density  PC penetration  Internet hosts  Mobile phone users
          (million)           (%)     (nominal US$)  (per 100)     (per 100)       (per 10,000)    (per 100)
USA       282         34260   5.2     2792.1         66.45         62.25           3714.01         44.42
Germany   82          25010   4.1     1699.9         63.48         33.60           294.58          68.29
France    59          25000   3.8     1706.6         57.35         33.70           132.94          60.53
China     1261        3940    4.9     37.9           13.81         1.93            0.69            11.17
India     1027        2390    3.5     15.4           3.38          0.58            0.81            0.63

Source: ITU-2001 and IMF world economic review 2001.

Page 17: Knowledge Management  & Linguistic Pluralism

Language Technology Mission

Vision: Digital unite and knowledge for all.

Mission: Communicating without language barriers & moving up the knowledge chain.

Objectives:

• To develop information processing tools to facilitate human machine interaction in Indian languages and to create and access multilingual knowledge resources/content.

• To promote the use of information processing tools for language studies and research.

• To consolidate technologies thus developed for Indian languages and integrate these to develop innovative user products and services.

Page 18: Knowledge Management  & Linguistic Pluralism

Major Initiatives

1. Knowledge Resources (Parallel Corpora, Multilingual Libraries/Dictionaries, lexical resources)
2. Knowledge Tools (Portals, Language Processing Tools, Translation Memory Tools)
3. Translation Support Systems (Machine Translation, Multilingual Information Access, Cross Language Information Retrieval)
4. Human Machine Interface System (OCR, Voice Recognition Systems, Text-to-Speech System)
5. Localization (Adapting IT Tools and solutions in Indian Languages)
6. Language Technology Human Resource Development (in NLP & Computational Linguistics)
7. Standardization (ISCII, Unicode, XML, INSFOC, MPEG, Terminology, etc.)

Page 19: Knowledge Management  & Linguistic Pluralism

Industry Involvement Through CoIL-tech

To catalyze Language Technology innovation and productization in industry and to foster interaction with academia, MAIT has nucleated a consortium named Consortium on Innovation & Language Technology (COILTech) with members from industry and research organizations.

Page 20: Knowledge Management  & Linguistic Pluralism

Major Achievements of TDIL Programme of DIT

OCRs Developed
• Hindi
• Marathi
• Bangla
• Tamil
• Telugu
• Punjabi
(with 97% accuracy)

OCRs under Development
• Gujarati
• Assamese
• Oriya
• Malayalam

Spell Checkers Developed
1. Hindi
2. Marathi
3. Bangla
4. Tamil
5. Telugu
6. Punjabi
7. Malayalam
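The slide does not say how the 97% OCR accuracy was measured. One common convention is character accuracy: one minus the edit distance between the recognized text and a reference transcription, divided by the reference length. The sketch below assumes that convention; the function names and the sample strings are illustrative and are not taken from the TDIL programme.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def char_accuracy(reference: str, recognized: str) -> float:
    """Character accuracy of OCR output against a reference transcription."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, recognized)
    return max(0.0, 1.0 - errors / len(reference))


# Hypothetical example: the recognizer drops one vowel sign from a Hindi phrase.
reference = "भारत सरकार"
recognized = "भारत सरकर"
print(f"character accuracy: {char_accuracy(reference, recognized):.2%}")
```

Python compares Unicode code points here, so a dropped vowel sign counts as one error; a measure based on whole syllables or words would give a different figure.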

Page 21: Knowledge Management  & Linguistic Pluralism

Machine Aided Translation System (MAT)

• The Anglabharati MAT technology, with high accuracy, has been developed by IIT Kanpur
• Text-to-Speech integrated with the MAT system has also been demonstrated
• The on-line MAT system can be accessed on the web at: www.anglahindi.iitk.ac.in

Speech Recognition

• A Continuous Speech Recognition System for Hindi is being developed by IBM Research Lab India.

Parallel Corpora

• Development of a one-million-page Parallel Corpora (Gyan-Nidhi) for a knowledge repository has been undertaken.
• The Parallel Corpora can act as a test-bed for the OCR and EBMAT (Example Based Machine Aided Translation) systems.

Page 22: Knowledge Management  & Linguistic Pluralism

Language Technology Products in Public Domain

For widespread proliferation, a number of freely downloadable software packages are available on the TDIL web-site: http://tdil.mit.gov.in. These include fonts with keyboard drivers, an e-mail client, bilingual word processors, glossaries, corpora and classic contents.

Open Source Software

INDIX (Indian Language Interface) supports Indian languages on Linux. This will ensure affordability of IL (Indian language) software based on Linux. The open source software approach will ensure faster localization and low-cost software.

Page 23: Knowledge Management  & Linguistic Pluralism

Standardisation

• 8-bit ISCII (Indian Script Standard Code for Information Interchange) was developed in 1988; the Indic script blocks of Unicode are based on it
• DIT (Govt. of India) is a voting member of the Unicode Consortium
• Feedback on revision of Unicode 3.0 for all Indian languages has been finalised
• International Unicode Conference 2003 in India proposed
• Draft standards:
  • Display codes in the form of INSFOC (Indian Standard for Font Code) are ready
  • Indian Script to Roman Transliteration (INSROT) is ready
  • A multilingual lexical format has also been proposed
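To make the encoding discussion concrete, the sketch below inspects a Hindi word with Python's standard `unicodedata` module. It shows that each character falls in Unicode's Devanagari block (U+0900 to U+097F) and that UTF-8 spends three bytes on each, whereas an 8-bit code such as ISCII would spend one. Python ships no ISCII codec, so the ISCII byte count is an inference from the 8-bit design rather than something the code measures.

```python
import unicodedata

# The word "Hindi" in Devanagari, written with escapes so the code points are explicit.
word = "\u0939\u093f\u0928\u094d\u0926\u0940"

DEVANAGARI_BLOCK = range(0x0900, 0x0980)  # U+0900..U+097F inclusive

for ch in word:
    in_block = ord(ch) in DEVANAGARI_BLOCK
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch):28s} in Devanagari block: {in_block}")

code_points = len(word)
utf8_bytes = len(word.encode("utf-8"))
print(f"{code_points} code points, {utf8_bytes} bytes in UTF-8")
# Assumption: an 8-bit code such as ISCII would store the same six
# characters in six bytes, one per character.
```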

Page 24: Knowledge Management  & Linguistic Pluralism

Media Lab Asia Programme - Major Project Areas

TOMORROW'S TOOLS: PDS for ANMs, water, power, schools, crafts, GIS

WORLD COMPUTER: Low cost computing devices, Linux CE, Village Interfaces, Village Info Systems

BITS FOR ALL: Wi-Fi nets, DakNet

DIGITAL VILLAGE: Community Connection, Village Voice

Page 25: Knowledge Management  & Linguistic Pluralism

Low cost Computing Devices - Choice of Technologies

High-bandwidth option: IEEE 802.11b/802.11a

Typical transfer rates: 11 Mbps @ 180 m, 1 Mbps @ 500 m

Prices still falling: access point < US$180, transceiver < US$80

Peer-to-peer supported
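What the two quoted rates mean in practice can be sketched as a transfer-time estimate. The calculation uses the nominal rates from the slide and ignores protocol overhead, so real transfers would take longer; the 10 MB file size is a hypothetical value chosen for illustration.

```python
def transfer_seconds(size_megabytes: float, rate_mbps: float) -> float:
    """Seconds needed to move size_megabytes at rate_mbps, ignoring protocol overhead."""
    if rate_mbps <= 0:
        raise ValueError("rate must be positive")
    megabits = size_megabytes * 8  # 1 byte = 8 bits; decimal megabytes assumed
    return megabits / rate_mbps


# Nominal rates and ranges quoted on the slide.
LINKS = {"180 m": 11.0, "500 m": 1.0}
FILE_MB = 10.0  # hypothetical file size, not from the presentation

for distance, rate in LINKS.items():
    seconds = transfer_seconds(FILE_MB, rate)
    print(f"{FILE_MB:.0f} MB at {distance} ({rate:g} Mbps): {seconds:.1f} s")
```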

Page 26: Knowledge Management  & Linguistic Pluralism

Low Cost Computing Devices

Ruggedized terminals with medium functionality and low cost < US$100

(also has smart card port and musical keyboard)

Page 27: Knowledge Management  & Linguistic Pluralism

E-Learning

Vidya Vahini Gyan Vahini

Page 28: Knowledge Management  & Linguistic Pluralism

Proposed Setup for Vidya Vahini

[Diagram: components labelled Internet, router, server, LAN hub, LAN printer, computer lab, TV and UPS.]

Page 29: Knowledge Management  & Linguistic Pluralism

Pilot Project – Vidya Vahini

200 schools in select districts

Systems at school:

• One server (P-IV based), 256 MB memory, 40 GB hard disk drive
• Network printer
• Multimedia personal computer with web camera
• Colour TV 27”/29”
• 2 KVA UPS with 50/30 minutes power back-up
• Software (MS Office full suite, education software with multi-lingual support, course curriculum software, filtering software and school administration software)
• Internet access of 128 Kbps, to be increased gradually

Page 30: Knowledge Management  & Linguistic Pluralism

Technology Class Room

• A classroom in every school will be converted into a Technology Class Room.
• The Technology Room will have a 29” flat-screen TV connected to a PC, which will in turn be connected to the server
• Computer-aided techniques will be used to teach the basic course curriculum

Vidya Vahini Schools as Anchor Schools

• Using VSAT based Internet connectivity (8 Mbps)
• Using a transceiver, dissemination of up to 11 Mbps of bandwidth within a radius of 4 to 5 km

Page 31: Knowledge Management  & Linguistic Pluralism

Training - Teacher Empowerment

The Teacher Empowerment Programme forms the heart of “Vidya Vahini”. The programme covers training of teachers in:

• Use of Computers
• Effective Teaching Techniques
• Creating Lessons
• Building Teaching Tools
• Usage of Technology in class rooms
• Training on Educational Software

7 Computer Labs equipped with 1 Server, Printer, 10 PCs, TV and educational software tools are proposed in collaboration with industry in the Pilot Project.

Page 32: Knowledge Management  & Linguistic Pluralism

Knowledge Portal

A Knowledge Portal will be hosted, which will have:

• Education material
• Programming Tools
• Software tools for teachers
• Software tools for students
• Language tools
• Filtering Software
• CBSE Course curriculum
• Web pages of all schools
• Circulars/Notices/directives issued by the Central Board of Secondary Education and different Boards throughout the country

Students will be able to access, harness and manage knowledge through the Portal.

Page 33: Knowledge Management  & Linguistic Pluralism

Gyan Vahini

Phase I

Set up IT infrastructure and connect all Govt. funded Universities (including Deemed Universities), Engineering Colleges and Medical Colleges in the country

Phase II

Set up IT infrastructure and connect all Polytechnics, Degree and Dental Colleges across the country

Page 34: Knowledge Management  & Linguistic Pluralism

Typical Campus Wide Network for High End Institutions

[Diagram: components labelled Internet, Router cum RAS, Central Switch, Internet and LAN Servers, Existing PBX, Administrative Block, Academic Block, Various Departments, Hostel Block, Different Student Hostels and Residential Quarters, each with switches and computers. Links are labelled Fibre Optic Cable, CAT 5 cabling and 100 Mbps.]

Page 35: Knowledge Management  & Linguistic Pluralism

New Initiatives under Consideration

1. e-Content (including Digital Library)

2. Speech-to-Speech translation

3. Open source software

Page 36: Knowledge Management  & Linguistic Pluralism