TRANSCRIPT
1
2009
Year of science
2
Koycho Mitev
DIGITAL LITERACY
3
INTERNATIONAL STANDARDS
Mankind has created a number of internationally coordinated systems that give a common description of objects and fields of knowledge: the international system of weights and measures, the symbols of the chemical elements written in Latin letters, musical notation, the barcode system for identifying products in shops, and many others.
4
MUSICAL NOTES
5
THE BARCODE SYSTEM
US Patent #2,612,994, issued to inventors Joseph Woodland and Bernard Silver on October 7, 1952
6
THE PERIODIC SYSTEM OF CHEMICAL ELEMENTS
Dmitrii Mendeleev
7
QUESTION
Could there be AN INTERNATIONAL STANDARD FOR DIGITAL COMMUNICATION IN MOTHER TONGUE, in which the computer itself interprets spoken and written speech?
8
You speak in your mother tongue, and someone on the other side of the world hears your voice, but in his or her own language!
9
Centenary of the birth of JOHN ATANASSOV
Acad. Sendov, http://www.aba.government.bg/bg/BGpoSveta/Kariera/021203.html
John Atanassov was deeply interested in the most important means of communication: human language and its written equivalent. Written languages can be regarded as symbolic representations of spoken ones. This representation is not unique and can therefore vary in quality. When John Atanassov was awarded the “Cyril and Methodius” medal, which he had not expected, he showed that he was very well informed about the work of the brothers Cyril and Methodius, because he was interested in different scripts. He also complained about the high rate of illiteracy in the USA, which he attributed to the imperfections of written English, and he considered the Cyrillic alphabet to be more felicitous.
This motivated him to try to create a new script that would be entirely phonetic and suitable for both people and machines. He did not fulfill this dream, although he pursued it until the end of his life.
10
SCIENTIFIC DISCOVERY: DIGITAL SCRIPT
A scientific discovery must meet three requirements simultaneously:
A. CAUSE
B. EFFECT
C. CAUSE-EFFECT RELATIONSHIP
11
CAUSES - 1
The topics of communication are the same all over the world: work, money, sport, love, business, education, culture, etc.
12
CAUSES - 2
The organs of speech are the same in all human beings, regardless of ethnic, racial, or religious group.
13
CAUSES - 3
The number 10 hides secrets!
The parts of speech in all languages are exactly 10 in number: noun, adjective, numeral, pronoun, verb, adverb, conjunction, preposition, particle, and interjection.
14
EFFECT
With the help of the digits of the decimal system, we can transform spoken and written speech from any language or dialect into any other language.
15
CAUSE-EFFECT RELATIONSHIP - 1
Undeniable scientific facts:
• The decimal system has 10 digits, 0 1 2 3 4 5 6 7 8 9, so each of the 10 parts of speech can be given one digit as the first digit of its code.
• Regardless of the language, the sentence is the basic element of speech and is characterized by unity of meaning and intonation and by communicative purpose.
16
CAUSE-EFFECT RELATIONSHIP - 2
• The digits and punctuation symbols on the computer keyboard are common to all languages in the world.
17
CAUSE-EFFECT RELATIONSHIP - 3
• The grammar of all languages consists of the same elements:
phonetics;
morphology;
syntax;
lexicology;
semantics.
18
CAUSE-EFFECT RELATIONSHIP - 4
The number of sounds that human beings use in communication is a two-digit number, so the sounds can be coded in the same way in all languages.
19
CAUSE-EFFECT RELATIONSHIP - 5
The digital representation of spoken and written speech can be transformed into binary code using John Atanassov’s invention and transmitted in real time (online) to any place in the world.
20
THE INVENTION
Patent BG 63704 – 04.10.2002
METHOD FOR COMMUNICATION IN MOTHER TONGUE
21
Koïchiro Matsuura, Director-General of UNESCO
More than half of the 6 800 languages spoken today could disappear by the end of the century. When a language dies, part of the world dies. Language is more than an ordinary tool and a means of communication; it is a fundamental element of human nature. More than 20% of all languages have no written form. In Africa, where a third of human languages are recorded, 80% of the dialects have no script and exist only in spoken form. Therefore, they are in danger of disappearing.
22
NATURE OF THE INVENTION
Spoken and written speech is recorded once in the computer's memory with the help of digits. A system of digital codes is introduced that allows equivalent words (including idioms), phrases, and the entire grammar of the particular language to be identified.
23
Digital coding of phonetic speech
• The number of sounds in speech is a two-digit number.
• The characteristics of the phonemes are coded with the help of digits: serial number; vowel or consonant; for vowels, short or long, stressed or unstressed, etc.; for consonants, the type of consonant (voiced or voiceless).
• A – 01100: 01 is the serial number; the third digit, 1, marks a vowel; the fourth digit, 0, marks a short vowel; the fifth digit marks whether the vowel is in a stressed syllable. Czech, Slovak, and other languages have long vowels (nemám), for which the fourth digit is 1. A long vowel under stress: Á – 01111.
24
Digital coding of phonetic speech
• B – 02200: 02 is the serial number; the third digit, 2, marks a consonant; the fourth digit, 0, marks a voiceless (not voiced) consonant; the fifth digit marks another characteristic of consonants, such as a long or double consonant (innocent; in Arabic, arrabia).
• Ль (сколько) in Russian – 17210: 17 is the serial number; the third digit, 2, marks a consonant; the fourth digit, 1, marks a voiced consonant; the fifth digit marks a long or double consonant.
And so on for all sounds, in the same way for all languages.
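The five-digit layout described on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Phoneme`, `decode_phoneme`, and `encode_phoneme` are hypothetical, and the fourth and fifth digits are kept as raw numbers because their meaning depends on whether the sound is a vowel or a consonant.

```python
from dataclasses import dataclass

# Digit 3 of the code: 1 = vowel, 2 = consonant (as stated on the slides).
KINDS = {"1": "vowel", "2": "consonant"}


@dataclass(frozen=True)
class Phoneme:
    serial: int   # digits 1-2: serial number of the sound
    kind: str     # digit 3: "vowel" or "consonant"
    fourth: int   # digit 4: length (vowels) or voicing (consonants)
    fifth: int    # digit 5: stress (vowels) or long/double (consonants)


def decode_phoneme(code: str) -> Phoneme:
    """Split a 5-digit phoneme code into its fields."""
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected 5 digits, got {code!r}")
    if code[2] not in KINDS:
        raise ValueError(f"third digit must be 1 or 2, got {code[2]!r}")
    return Phoneme(int(code[:2]), KINDS[code[2]], int(code[3]), int(code[4]))


def encode_phoneme(p: Phoneme) -> str:
    """Rebuild the 5-digit code from its fields."""
    kind_digit = {name: digit for digit, name in KINDS.items()}[p.kind]
    return f"{p.serial:02d}{kind_digit}{p.fourth}{p.fifth}"


print(decode_phoneme("01100"))
print(encode_phoneme(decode_phoneme("17210")))
```

Keeping every code at a fixed width is what later allows a word to be cut back into phonemes without separators.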
25
Digital coding of phonetic speech
• The digital representation of phonetic speech is a combination of the digital codes of the phonemes.
• The number of digits that identifies a sound is the same for all phonemes (for example, 5 digits).
• The digital representation of a syllable or word is the sequence of the digital codes of the phonemes that make it up. This also applies to diphthongs (ai, ou, ei, ie, etc.).
26
Phoneme Recognition
Just as a fingerprint identification system matches each print to the name of a particular person, each phoneme will have a single digital code, common to all languages, that corresponds to its graphical representation.
27
Phoneme Recognition
• The digital representation of words can be designed to take into account vowel reduction, devoicing of consonants, and other phonetic phenomena typical of dialects. These variants will also correspond to the written variants of the word so that the software can recognize the spoken language.
28
Recognition of phonemes: example
Canada – digital representation of the word:
03200 01101 14200 01101 04100 01101
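Because every phoneme code has the same width, the digit string for a word can be cut back into phonemes mechanically. The sketch below assumes the 5-digit width mentioned on the earlier slide; `split_word_code` is a hypothetical helper name, not part of the invention's own software.

```python
def split_word_code(word_code: str, width: int = 5) -> list[str]:
    """Cut a word's digit string into fixed-width phoneme codes."""
    digits = "".join(word_code.split())  # ignore any spaces in the string
    if not digits.isdigit() or len(digits) % width != 0:
        raise ValueError("word code must be a whole number of phoneme codes")
    return [digits[i:i + width] for i in range(0, len(digits), width)]


# The digit string given on the slide for the word "Canada".
canada = "0320001101142000110104100 01101"
codes = split_word_code(canada)
print(codes)
# Serial numbers (first two digits of each phoneme code).
print([int(code[:2]) for code in codes])
```

For the slide's example this yields six codes, one per letter of the word, with serial numbers 3, 1, 14, 1, 4, 1.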
29
MORPHOLOGICAL CODES
The morphological codes will show, in order, the part of speech a word belongs to, its grammatical categories, and other characteristics.
1. The first digit – PART OF SPEECH (10 parts of speech): zero (0) is for nouns, one (1) is for adjectives, five (5) is for verbs, etc.
30
MORPHOLOGICAL CODES
2. The second digit – GENDER: 0 – no gender, 1 – masculine gender, 2 – feminine gender, 3 – neuter gender;
3. The third digit – NUMBER: 0 – no number, 1 – singular, 2 – plural;
4. The fourth and fifth digits – VERB TENSE (in some languages the number of verb tenses is a two-digit number);
31
MORPHOLOGICAL CODES
5. The sixth and seventh digits – CASE (in some languages the number of cases is a two-digit number). Languages which do not have cases (such as Bulgarian and English) write 00.
6. The eighth and following digits – other grammatical categories and/or characteristics of words, such as countable/uncountable or animate/inanimate nouns.
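The digit positions listed on these slides can be read off with a small decoder. This is a minimal sketch, assuming positions 1 to 7 as listed (part of speech, gender, number, two tense digits, two case digits) and treating anything after the seventh digit as "other" characteristics; only the part-of-speech digits the slides actually name (0, 1, 5) are filled in, and the sample code `0210000` is a made-up illustration.

```python
# Only the values stated on the slides; other digits are left unnamed.
PART_OF_SPEECH = {"0": "noun", "1": "adjective", "5": "verb"}
GENDER = {"0": "no gender", "1": "masculine", "2": "feminine", "3": "neuter"}
NUMBER = {"0": "no number", "1": "singular", "2": "plural"}


def decode_morphology(code: str) -> dict:
    """Read the fields of a morphological code by digit position."""
    if len(code) < 7 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected at least 7 digits, got {code!r}")
    return {
        "part_of_speech": PART_OF_SPEECH.get(code[0], f"digit {code[0]}"),
        "gender": GENDER.get(code[1], f"digit {code[1]}"),
        "number": NUMBER.get(code[2], f"digit {code[2]}"),
        "tense": code[3:5],   # two digits; 00 where tense does not apply
        "case": code[5:7],    # two digits; 00 for languages without cases
        "other": code[7:],    # any further characteristics
    }


# Hypothetical example: a feminine singular noun with no tense and no case.
print(decode_morphology("0210000"))
```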
32
MORPHOLOGICAL CODES
• Tenses with compound verb forms (consisting of two or more words) are coded in the same way as phrases.
• The digital phonetic and morphological codes are recorded in an electronic dictionary against each word entered in it.
33
Coding of phrases, collocations, proverbs, etc.
The digital codes of the separate words are connected by underscores. In this way the program will understand that this is one semantic unit which consists of several words:
Leje_ako_z_konvy. (Slovak)
It_is_raining_cats_and_dogs.
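A short Python sketch of how a program might treat underscore-joined words as one semantic unit. It works on the written words rather than on their digital codes, purely for readability; `semantic_units` is a hypothetical name, and the handling of stray spaces after an underscore is an assumption.

```python
import re


def semantic_units(text: str) -> list[list[str]]:
    """Group words into units; underscore-joined words form one unit."""
    # Tolerate a space after an underscore, e.g. "Leje_ ako_ z_ konvy."
    text = re.sub(r"_\s+", "_", text)
    units = []
    for token in text.split():
        words = [w for w in token.strip(".,!?").split("_") if w]
        if words:
            units.append(words)
    return units


print(semantic_units("Leje_ako_z_konvy."))
print(semantic_units("Today it_is_raining_cats_and_dogs."))
```

In the second call the idiom comes back as a single six-word unit, while "Today" remains a unit of its own.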
34
Coding according to field of knowledge
Our knowledge of the world is divided into various fields, in which words have different meanings (idioms). These fields of knowledge are coded in the same manner for all languages:
01 everyday speech
02 business
03 science
etc., with room for up to 99 fields of knowledge.
35
Coding according to field of knowledge
• The digital codes of words start with the code for the field of knowledge.
• If a word has a different meaning in different fields, it is coded separately for each field of knowledge. Semantic synonymy is achieved in this way.
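As a rough illustration of a word code that starts with the field-of-knowledge code, the sketch below prefixes and strips a two-digit field code. The function names are hypothetical, only the three fields named on the slide are listed, and the word code `0110000` is a made-up placeholder.

```python
# The three fields named on the slide; up to 99 are possible.
FIELDS = {"01": "everyday speech", "02": "business", "03": "science"}


def add_field(field: str, word_code: str) -> str:
    """Put the two-digit field-of-knowledge code in front of a word code."""
    if len(field) != 2 or not (field.isascii() and field.isdigit()):
        raise ValueError(f"field code must be two digits, got {field!r}")
    return field + word_code


def split_field(code: str) -> tuple[str, str, str]:
    """Return (field code, field name, remaining word code)."""
    field, rest = code[:2], code[2:]
    return field, FIELDS.get(field, "unlisted field"), rest


coded = add_field("03", "0110000")
print(coded)
print(split_field(coded))
```

A word with different meanings in two fields would simply get two entries, one per prefix.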
36
Coding of synonyms
• Words which have synonyms can be grouped and arranged according to various aspects of their meaning, for example – everyday speech, slang, expressing of quality, etc. The digital codes of these groups are then added to the other codes.
37
Syntactic parser
• To perform a syntactic analysis of sentences, we need to design software which includes all parts of speech, their grammatical categories, and other characteristics. This program will apply the grammatical, syntactic, and other rules of each language separately. We should note that the components are the same for all languages, but they interact differently according to the grammar rules of the particular language.
38
Syntactic analysis
• Each sentence is analyzed syntactically before it is translated. To do so, the program finds the predicate centre of the sentence (the subject–verb relationship). The predicate is found first. It is a verb, and verb codes start with the digit 5. Then the program searches for the subject. It can be a pronoun or a noun which agrees with the verb in gender and number (the second and the third digit), according to the particular word order. The analysis continues until the software finds the syntactic function of each word in the sentence.
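The search for the predicate and its subject can be sketched as follows. This is a simplified illustration, not the author's parser: it ignores word order, the sample sentence and its codes are invented, and only nouns (first digit 0) are tried as subjects by default because the slides do not state the digit for pronouns.

```python
Word = tuple[str, str]  # (written form, morphological code)


def find_predicate_and_subject(words: list[Word], subject_pos=("0",)):
    """Find the first verb, then the first candidate that agrees with it."""
    # The predicate is a verb: its code starts with the digit 5.
    predicate = next((w for w in words if w[1].startswith("5")), None)
    if predicate is None:
        return None, None
    agreement = predicate[1][1:3]  # second and third digit: gender, number
    subject = next(
        (w for w in words
         if w[1][0] in subject_pos and w[1][1:3] == agreement),
        None,
    )
    return predicate, subject


# Hypothetical codes: plural noun, plural verb, singular noun.
sentence = [("dogs", "0020000"), ("chase", "5020100"), ("cat", "0010000")]
predicate, subject = find_predicate_and_subject(sentence)
print(predicate, subject)
```

Here "chase" is picked as the predicate and "dogs" as the subject, because "cat" does not agree with the verb in number.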
39
Coding of the syntactic function of words in sentences
Current machine translation algorithms apply the so-called “statistical method”. For the first time, there is now an opportunity to assign a TEMPORARY code to each word in a sentence. This temporary code shows the function of the word: subject, predicate, attribute, adverbial, etc.
40
Coding of the syntactic function of words in sentences
• The temporary codes for the syntactic functions of words will allow the digital codes of one language to be transformed “in bulk” into the respective digital codes of the other language. Then, in the second language, a new syntactic analysis is carried out, together with subject-verb agreement and government, before the sentence is rendered grammatically correct according to the word order rules.
41
Language code
This code presents the origin of the language and its dialects:
000100 Standard Bulgarian language
000101 Rhodopi dialect (from the Rhodope region)
000102 Shopski dialect (from Sofia region)
000103 and so on up to 000199
000200 Standard British English
000201 American English
000202 and so on up to 000299
42
Language code
000800 Standard French language
000801 French spoken in Quebec
000802 Second French dialect
000803 another French dialect, etc.
• Over 6800 languages (the number can be expanded up to 9999 languages) can be identified with the help of the first four digits of these codes. The remaining two digits can be used to identify up to 99 dialects in each language.
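The six-digit language code splits into a four-digit language number and a two-digit dialect number. A minimal sketch, with `parse_language_code` as a hypothetical helper name:

```python
def parse_language_code(code: str) -> tuple[int, int]:
    """Split a 6-digit code into (language number, dialect number)."""
    if len(code) != 6 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected 6 digits, got {code!r}")
    # First four digits: the language; last two digits: the dialect.
    return int(code[:4]), int(code[4:])


print(parse_language_code("000100"))  # Standard Bulgarian on the slide
print(parse_language_code("000201"))  # American English on the slide
```

Dialect number 0 stands for the standard language in the examples on the slides, which leaves 01 to 99 for dialects.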
43
Sequence in the programming of written speech
1. Saving the sentence in the computer memory
2. Connection with the database
3. Syntactic analysis
4. Morphological analysis
5. Temporary codes for the roles of the words
6. Transfer of codes
7. Morphological analysis
8. Syntactic analysis and word order
9. Writing the sentence in the other language
44
Sequence in the programming of spoken speech
1. Recording of the sentence
2. Determining the phonemes of each word
3. Determining the written form of each word
4. Morphological analysis
5. Syntactic analysis
6. Temporary codes of the words
7. Transferring to another language
8. Morphological analysis
9. Syntactic analysis
10. Agreement of words in gender, number, etc.
11. Sound representation of the sentence with the voice of the person speaking the first language
45
EXAMPLE
BG Това изобретение ще промени нашите представи за комуникация!
EN This invention will transform our ideas about communication!
GR Αυτή η εφεύρεση θα αλλάξει τις ιδέες μας επικοινωνίας!
AR هذا الاختراع سوف تغيير أفكارنا من الاتصال!
RU Это изобретение изменит наше представление о коммуникации!
DE Diese Erfindung wird unsere Vorstellungen von der Kommunikation verändern!
46
The future of this scientific discovery – free communication in mother tongue!
47
Possibilities of the technology
• After your voice has been recorded once (as if taking fingerprints), the program will allow you to speak in Bulgarian (or any other human language entered in the system), and people on the other end of the telephone line, Skype, microphone, etc. will hear you in their own language but with your voice. Phonemes missing from your language will be added by a synthesizer.
• The members of the European Parliament will speak in their mother tongue, and everyone else in the hall will hear them in their own languages.
• The delay will be the same as in live interpreting: the time the program needs to perform a syntactic analysis of one sentence.
• The program can be installed on the computers of your mobile service provider.
• You can open any Internet site written in any language and read it in your own language.
48
Vision for the development of the technology
• A pilot model for written translation between 4-5 languages;
• A pilot model for speech translation;
• Licensing by universities around the world;
• Every language and dialect will become part of the system for communication in a mother tongue after a one-time coding of grammar, words, and collocations;
• Communication between two distant languages (those for which there are no linguists who know both) will be carried out through another basic language such as English. If we do not do this now, communication will soon be through Chinese.
49
What others say about us
• India (page 24 from the text, or page 19 from 132 of PDF file): http://www.saneinetwork.net/pdf/SANEI_VI/SANEI-VI-(EcommerceandEconomicDevelopment_FPEPR).pdf
50
What others say about us
• France Press Agency http://www.bulgaria-france.net/kmitev.html
• One invention against terrorism:
http://www.democrit.com/category.php?n=330&cat=27&br=12&wh_n=news17
• China radio international http://bg.chinabroadcast.cn/64/2005/09/29/[email protected]
51
What others say about us
• BabelPort, USA: http://www.babelport.com/news/1106
• “A Great Bulgarian Invention is waiting”: http://www.novavizia.com/399.html
• Vietnam: http://www.daichung.com/110/12_tinnho.shtm
• Slovakia, “Will a brilliant Bulgarian invention change human communication?”: http://www.itnews.sk/buxus_dev/generate_page.php?page_id=37989
52
Belarusian Academy of Sciences
“...We are convinced that this project does not need only financial support. You know that UNESCO is deeply concerned about the future of the majority of the 6800 existing languages which are threatened by globalization processes. We think that this promising project will contribute to this very sensitive topic for the human civilization and in this way will obtain considerable political support”.
Sergei ABLAMEIKO
Professor, Ph.D. in Computer Sciences,
Associate member of the Belarusian Academy of Sciences
53
THANK YOU FOR YOUR ATTENTION!
Dipl. Eng. Koycho Mitev
E-mail: [email protected]
This presentation uses pictures from the Internet that are subject to copyright law. Since the document is not written for commercial purposes, the author thanks the rights holders for their understanding.
June 2009 Copyright ©