indian language tools & resources


Post on 06-Oct-2015






This presentation explains some very useful tools for learning Indian languages.


Indian Language Tools & Resources

Introduction to newly added topics in Subject No. 204 for recently joined P.G.D.C.A lecturers.

Date : 26th November, 2014

Introduction

This subject is about online computer tools which are useful for Indian languages.

It also gives information about some websites which can be very useful for learning and understanding Indian languages.

Transliteration

Transliteration and translation are different words with different meanings. Transliteration means changing text from one script into another script without changing its pronunciation, i.e. Raam = Here only the written script is changed but the pronunciation remains the same; that is called transliteration.
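The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mapping tables below are a small, hypothetical Roman-to-Devanagari scheme made up for this example (doubled letters for long vowels, no virama on a word-final consonant), and the function names are not taken from any real transliteration tool.

```python
# Toy Roman -> Devanagari transliteration.
# ASSUMPTION: this tiny mapping is illustrative only and covers a few letters.
CONSONANTS = {"k": "क", "g": "ग", "t": "त", "d": "द", "n": "न", "p": "प",
              "b": "ब", "m": "म", "y": "य", "r": "र", "l": "ल", "v": "व",
              "s": "स", "h": "ह"}
# Vowels written on their own (start of word or after another vowel).
INDEPENDENT = {"a": "अ", "aa": "आ", "i": "इ", "ii": "ई",
               "u": "उ", "uu": "ऊ", "e": "ए", "o": "ओ"}
# Vowel signs attached to a preceding consonant; "a" is inherent, so no sign.
MATRAS = {"a": "", "aa": "ा", "i": "ि", "ii": "ी",
          "u": "ु", "uu": "ू", "e": "े", "o": "ो"}
VIRAMA = "्"  # suppresses the inherent "a" between two consonants


def tokenize(word):
    """Split a Roman word into vowel and consonant tokens (long vowels first)."""
    word = word.lower()
    tokens, i = [], 0
    while i < len(word):
        if word[i:i + 2] in INDEPENDENT:
            tokens.append(word[i:i + 2])
            i += 2
        elif word[i] in INDEPENDENT or word[i] in CONSONANTS:
            tokens.append(word[i])
            i += 1
        else:
            raise ValueError(f"unsupported letter: {word[i]!r}")
    return tokens


def to_devanagari(word):
    """Change the script of a word while keeping its pronunciation."""
    tokens = tokenize(word)
    out = []
    for idx, tok in enumerate(tokens):
        if tok in CONSONANTS:
            out.append(CONSONANTS[tok])
            nxt = tokens[idx + 1] if idx + 1 < len(tokens) else None
            if nxt in CONSONANTS:  # consonant cluster: drop the inherent vowel
                out.append(VIRAMA)
        else:
            prev = tokens[idx - 1] if idx > 0 else None
            out.append(MATRAS[tok] if prev in CONSONANTS else INDEPENDENT[tok])
    return "".join(out)


print(to_devanagari("Raam"))
```

Only the script changes here; no word is replaced by a word of another language, which is what separates transliteration from translation.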

Transliteration Tools

Google Transliteration.

Transliteration tool of Sanskrit studies, University of Hyderabad.

Language transliteration developed by Special Centre for Sanskrit Studies, Jawaharlal Nehru University, New Delhi.

Xlit is a transliteration tool to convert words from English to Indian languages and back, without losing the phonetic characteristics.

Dictionaries

We all know what dictionaries are, but it is tough to find Indian language dictionaries. is one of the best portals for online dictionaries.

Web-site for South Asia language dictionaries is..

is an editable online hypertext Sanskrit-English-Sanskrit dictionary containing words, phrases, and sentences of the Sanskrit language with special emphasis on spoken Sanskrit.

Another one is Cologne Digital Sanskrit Dictionaries.

Hindi to English Dictionary

www.shabdkosh.com

Word-Net

WordNet is a lexical database. English WordNet is the first WordNet and was developed by the Cognitive Science Laboratory of Princeton University under the direction of George A. Miller, with Christiane Fellbaum, Randee Tengi and several others.

Word-Net

Sanskrit Word-Net

Dhaaturatnakarah

Dhaturatnakara is a prestigious project of Rashtriya Sanskrit Sansthan (Deemed University), New Delhi, approved by Ministry of Human Resource Development, Government of India.

Cross-language information retrieval (CLIR) is a subfield of information retrieval dealing with retrieving information written in a language different from the language of the user's query.

Dictionary-based CLIR techniques
Parallel corpora based CLIR techniques
Comparable corpora based CLIR techniques
Machine translator based CLIR techniques

Corpora

In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts, nowadays usually electronically stored and processed. They are used to do statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules on a specific universe.

Digital Library of India

Digital Library of India, part of the online services of the Indian Institute of Science, Bangalore and partner in the Million Book Project, provides free access to many books in English and Indian languages.
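Returning to the dictionary-based CLIR technique listed earlier, the following is a minimal sketch of how such a system might work: the query terms are translated with a bilingual dictionary, and documents in the other language are then ranked by how many translated terms they contain. The dictionary entries, the documents and the function names are all hypothetical examples, not data from any real system.

```python
# Toy dictionary-based cross-language retrieval.
# ASSUMPTION: a tiny romanised Hindi -> English dictionary, invented for this sketch.
BILINGUAL_DICT = {
    "pustak": ["book"],
    "bhasha": ["language"],
    "sangeet": ["music"],
}

# ASSUMPTION: a made-up English document collection.
DOCUMENTS = {
    "d1": "a free book about the sanskrit language",
    "d2": "an introduction to classical music",
    "d3": "digital book collections",
}


def translate_query(query, dictionary):
    """Replace each query word with its dictionary translations (unknown words are dropped)."""
    terms = []
    for word in query.lower().split():
        terms.extend(dictionary.get(word, []))
    return terms


def retrieve(query, dictionary=BILINGUAL_DICT, documents=DOCUMENTS):
    """Rank documents by the number of translated query terms they contain."""
    terms = set(translate_query(query, dictionary))
    scores = {}
    for doc_id, text in documents.items():
        overlap = len(terms & set(text.split()))
        if overlap:
            scores[doc_id] = overlap
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


print(retrieve("pustak bhasha"))
```

The user asks in one language and gets back documents written in another, which is the defining property of CLIR; real systems add term weighting and handle words with several translations.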

Project Gutenberg

Project Gutenberg is the first and largest single collection of free electronic books, or eBooks. Michael Hart, founder of Project Gutenberg, invented eBooks in 1971 and continues to inspire the creation of eBooks and related technologies today.

Sanskrit is a website which hosts various Sanskrit books in PDF format for free download.

Sanskrit Library

Sanskrit Library is a digital library dedicated to enhancing online access to the cultural heritage of India by facilitating education and research in Sanskrit, one of the world's richest culture-bearing languages.

Language Processing

Sandhi is a common function in Sanskrit grammar which is used to join two Sanskrit words together following the Paninian sutras. Sandhi-splitter is a computational tool which splits a given word into morphologically valid segments.
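To make the idea of a sandhi-splitter concrete, here is a toy sketch. It assumes words are written in Harvard-Kyoto style ASCII (A = long a, U = long u), uses a made-up five-word lexicon, and reverses only three vowel sandhi outcomes; it is not the method of any actual Sanskrit tool, which would need the full rule set and a real lexicon.

```python
# Toy sandhi splitter: undo a few vowel sandhi rules and validate against a lexicon.
# ASSUMPTION: Harvard-Kyoto style spelling, where "A" is long a and "U" is long u.

# For each joined vowel, the (end of first word, start of second word) pairs
# that could have produced it. Only a small subset of rules is covered.
REVERSE_RULES = {
    "A": [("a", "a"), ("a", "A"), ("A", "a"), ("A", "A")],  # a/A + a/A -> A
    "e": [("a", "i"), ("A", "i")],                          # a/A + i   -> e
    "o": [("a", "u"), ("A", "u")],                          # a/A + u   -> o
}

# ASSUMPTION: a tiny illustrative lexicon of valid words.
LEXICON = {"deva", "Alaya", "sUrya", "udaya", "mahA", "indra"}


def split_sandhi(word, lexicon=LEXICON):
    """Return every (left, right) split whose two parts are both in the lexicon."""
    results = []
    for i, ch in enumerate(word):
        for left_end, right_start in REVERSE_RULES.get(ch, []):
            left = word[:i] + left_end           # restore the end of the first word
            right = right_start + word[i + 1:]   # restore the start of the second word
            if left in lexicon and right in lexicon:
                results.append((left, right))
    return results


for joined in ["devAlaya", "sUryodaya", "mahendra"]:
    print(joined, "->", split_sandhi(joined))
```

Checking both halves against a lexicon is what keeps the segments valid: many positions in a word contain a vowel that a rule could undo, but only a few of the resulting pieces are real words.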

Morphological Analyser analyses a Sanskrit word, giving its nominal stem / verbal root along with its various linguistic features such as lexical category, gender, number, case, person, etc.

Morphological Generator generates the Sanskrit word forms automatically, based on the nominal stem / verbal root along with its various linguistic features such as lexical category, gender, number, case, person, etc.

Parser

It is a computational tool which takes a Sanskrit sentence as input and produces the different semantic (kāraka) relations between the words, such as kartā, karma, karaṇa, adhikaraṇa, etc.

Machine Translation

The interpretation of a natural language to another natural language in the form of text or speech is known as translation. When the translation is performed automatically by a machine (computer), it is called machine translation.

Sampark System : Automated Translation among Indian Languages.

Sanskrit Hindi Accessor

Useful not only for translation from Sanskrit to Hindi but also for knowing the grammatical information of the sentence.

Speech Processing

Nowadays, a number of institutions all over the world are working on speech processing. Speech processing is the study of speech signals and the processing methods of these signals. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing, applied to speech signals. Aspects of speech processing include the acquisition, manipulation, storage, transfer and output of digital speech signals.
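A small sketch may help show what a "digital representation" of a signal means. It assumes a pure sine tone as a stand-in for speech, an 8000 Hz sampling rate and 16-bit integer samples; these values and the function names are choices made for this example only.

```python
import math


def sample_tone(freq_hz, duration_s, sample_rate=8000):
    """Acquire a signal digitally: measure a sine tone at regular time steps."""
    n = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


def quantize_16bit(samples):
    """Store each sample (a value in [-1.0, 1.0]) as a 16-bit signed integer."""
    return [max(-32768, min(32767, round(s * 32767))) for s in samples]


# ASSUMPTION: a 2000 Hz tone stands in for a real speech recording.
analog_like = sample_tone(2000, 0.01)   # 10 ms of signal
digital = quantize_16bit(analog_like)   # list of integers ready to store or transfer

print(len(digital), digital[:4])
```

Once the signal is a list of integers like this, the manipulation, storage and transfer steps mentioned above become ordinary operations on numbers.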

Three main applications of speech processing are as follows.

1. Speech recognition : It deals with analysis of the linguistic content of a speech signal and its conversion into a computer-readable format. Recognizing the identity of the speaker is the aim of a related task, speaker recognition.

2. Speech synthesis : The artificial synthesis of speech, which usually means computer-generated speech. Advances in this area improve the computer's usability for the visually impaired.

3. Speech compression : It is important in the telecommunications area for increasing the amount of information which can be transferred, stored, or heard, for a given set of time and space constraints.

Thank You