Computational and Evolutionary Aspects of Language
[Figure: language tree for the Indo-European languages]
Computational and Evolutionary Aspects of Language
By Melih Sözdinler, for CMPE 58B
Introduction
A review paper covering mathematical descriptions of language on three different levels:
* Computational: A1: Formal Language Theory; A2: Learning Theory
* Evolutionary: A3: Evolutionary Dynamics
Mainly, we will address some questions.
Introduction
Questions that come to mind:
− What is language?
− What is grammar?
− What is the difference between learning language and learning other generative systems?
− In what sense does logical necessity arise; is it genetically determined?
Throughout the presentation we will gradually cover these questions.
Before Getting Into It
Brain structure needs to be understood.
Efforts to translate written texts, as “Google Translate” and “Yahoo Babel Fish” do, are ongoing.
Before Getting Into It
Can there be a machine that takes a sentence from any language and decides:
− “This sentence belongs to language X1, and I can understand it”?
− No machine can do this.
− “Google Translate” and “Babelfish” probably memorize phrases in every language they offer a service for in order to translate.
A1: Formal Language Theory
Language is:
− The mode of communication
− A crucial part of our behaviour
− A cultural object that defines our social identity
Language has rules, roughly:
− In English: “We went to the school”
− In Turkish: “We the school went to”
There are always specific rules to generate valid and meaningful linguistic structures*.
* Natural language arises as a result of the innate facility possessed by the human intellect.
** “Linguistics” is the scientific study of natural languages.
A1: Formal Language Theory
L is a language; {S, A, B, F} are the non-terminals; for simplicity the alphabet consists of two symbols (0 and 1).
The grammar generates sentences of the form 0 1^m 0 1^n 0.
[Figure: example derivations for a finite language and an infinite language]
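As a sketch (my own illustration, not from the slides), the language of sentences 0 1^m 0 1^n 0 is regular, so a regular expression suffices to recognize it; taking m, n ≥ 0 is an assumption, since the slide does not state the bounds:

```python
import re

# The grammar on this slide generates sentences of the form 0 1^m 0 1^n 0.
# Assumption: m, n >= 0 (the slide does not state the bounds).
PATTERN = re.compile(r"01*01*0")

def generate(m: int, n: int) -> str:
    """Build the sentence 0 1^m 0 1^n 0."""
    return "0" + "1" * m + "0" + "1" * n + "0"

def in_language(sentence: str) -> bool:
    """Recognize the (regular) language with a regular expression."""
    return PATTERN.fullmatch(sentence) is not None
```

Because `in_language` needs no memory beyond the regular expression's finite states, this also previews the claim below that regular languages correspond to finite-state automata.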
A1: Formal Language Theory
There are countably infinitely many grammars; any finite list of “rewrite rules”* can be encoded by an integer.
A language generated this way is called “computable”. Computable languages can be represented by machines called “Turing machines”.
* A “rewrite rule” states that a certain string can be rewritten as another string.
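The claim that any finite list of rewrite rules can be encoded by an integer can be sketched directly (an illustrative encoding of my own, not the paper's; it assumes rules contain neither ';' nor '->'):

```python
def encode_grammar(rules):
    """Encode a finite list of rewrite rules (lhs, rhs) as one integer.
    Assumes neither side of a rule contains ';' or '->'."""
    text = ";".join(f"{lhs}->{rhs}" for lhs, rhs in rules)
    return int.from_bytes(text.encode("utf-8"), "big")

def decode_grammar(code):
    """Recover the rule list from its integer code."""
    text = code.to_bytes((code.bit_length() + 7) // 8, "big").decode("utf-8")
    return [tuple(rule.split("->", 1)) for rule in text.split(";")]
```

Since every grammar maps to a distinct integer, there can be at most countably many grammars, which is exactly the point made above.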
A1: Formal Language Theory
A1: Machines for Languages
Regular languages – finite-state automata
Context-free languages – push-down automata
Context-sensitive languages – linear-bounded automata
Phrase-structure (recursively enumerable) languages – Turing machines
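To make the hierarchy concrete, here is a sketch (my own example, not from the slides) of why a push-down device is strictly stronger than a finite-state one: the language { a^n b^n : n ≥ 0 } needs an unbounded counter (a stack reduced to its depth), which no finite-state automaton has:

```python
def is_anbn(sentence: str) -> bool:
    """Recognize { a^n b^n : n >= 0 } with one unbounded counter (a stack
    reduced to its depth) -- power a finite-state automaton lacks."""
    count = 0
    seen_b = False
    for ch in sentence:
        if ch == "a":
            if seen_b:          # an 'a' after a 'b' is out of order
                return False
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:       # more b's than a's so far
                return False
        else:
            return False        # symbol outside the alphabet
    return count == 0
```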
A1: Chomsky Hierarchy
Finite-state grammars are a subset of the context-free grammars; context-free grammars are a subset of the context-sensitive grammars; context-sensitive grammars are a subset of the phrase-structure grammars, which are Turing complete.
A1: Formal Language Theory
A1: Natural Languages (NL)
Natural languages are infinite: imagine a list containing every sentence in Turkish.
Finite-state grammars are inadequate to cover NL.
The fundamental structures of NL are trees. A tree is the derivation of a sentence within the rule system of a particular grammar.
Trees may result in ambiguity, meaning that more than one tree is associated with a given sentence.
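Ambiguity can be made concrete with a toy grammar (an illustration of mine, not from the slides): with the rules S → S S | a, the sentence “a a a” already has two derivation trees, (a a) a and a (a a):

```python
from functools import lru_cache

# Toy ambiguous grammar (an illustration, not from the slides):
#   S -> S S | a
# Counting derivation trees shows one sentence can have several parses.

@lru_cache(maxsize=None)
def parse_count(n: int) -> int:
    """Number of distinct derivation trees for the sentence 'a' * n."""
    if n == 1:
        return 1                     # the rule S -> a
    # the rule S -> S S: split the sentence at every possible point
    return sum(parse_count(k) * parse_count(n - k) for k in range(1, n))
```

The counts grow quickly (they are the Catalan numbers), so ambiguity is the rule rather than the exception in such grammars.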
A1: Formal Language Theory
A2: Learning Theory
One can define a grammar by deciding which trees should be in it.
This is where the Chomsky hierarchy meets “learning theory”.
The difference between learning and memorization: we can somehow understand and produce sentences that we have never heard or used before.
A2: The Paradox of Language Acquisition
“Environmental input”: the child constructs an internal representation of the underlying grammar.
‘Poverty of stimulus’: the environmental input does not uniquely specify the grammatical rules (Chomsky 1972).
‘The paradox of language acquisition’ is that children of the same speech community nevertheless reliably grow up to speak the same language (Wexler 1980).
A2: How Children Learn the Correct Grammar
There is a restricted set of grammars; the theory of this restricted set is “universal grammar” (UG).
Formally, UG is not just a grammar; it is a theory of a collection of grammars.
UG has recently become a more widely accepted theory; 40 years ago it was controversial:
− The idea of an innate, genetically determined UG was disputed.
− But in the mathematical approach of learning theory, UG is a logical necessity.
Note: universal grammar is made up of a set of rules that apply to most or all natural human languages.
A2: Learnability
Speaker–hearer pair:
− The speaker uses grammar G to construct sentences of language L.
− The hearer receives sentences and should be able to use grammar G to construct other sentences of L.
− From a mathematical perspective, the hearer runs an algorithm that takes a list of sentences as input and produces a language as output.
− A “text T” is an infinite list of the sentences of L, each occurring at least once.
− Text T_N consists of the first N sentences of T.
− Algorithm A identifies the correct language if A(T_N) = L for all N > M, for some finite M.
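A sketch of this identification-in-the-limit setting over a restricted class of languages (the candidate languages and the text are toy examples of mine): the learner conjectures the first candidate consistent with the text seen so far, and its conjecture stabilises after finitely many sentences.

```python
# Gold-style identification in the limit over a restricted class.
# The candidate languages are toy examples for illustration.
CANDIDATES = [
    {"S1"},             # L1
    {"S1", "S2"},       # L2
    {"S1", "S2", "S3"}, # L3
]

def learner(text_prefix):
    """Conjecture the first candidate consistent with the text seen so far."""
    seen = set(text_prefix)
    for language in CANDIDATES:
        if seen <= language:
            return language
    return None

# A text for L2 presents each of its sentences at least once; the
# learner's conjecture stabilises after finitely many sentences (N > M).
text = ["S1", "S2", "S1", "S2"]
guesses = [learner(text[:k + 1]) for k in range(len(text))]
```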
Box 1: Gold's Theorem
We are interested in which sets of languages are learnable. The key result of learning theory is Gold's theorem (Gold 1967):
− It implies that no algorithm can learn a set of languages that contains the set of regular languages.
− A super-finite set of languages contains all finite languages and at least one infinite language; Gold showed that no super-finite set is learnable from positive examples.
− If the learner conjectures that the target language is an infinite language while the actual target is a finite language contained in that infinite language, the learner will never converge onto the correct language.
Note: all finite languages are regular, and there are infinite regular languages too, for instance L = { a^n b^m | n, m ≥ 0 }; the regular languages are therefore super-finite.
Box 1: Gold's Theorem (continued)
Classical learning theory was formulated by Gold.
− Restrictive assumptions: (i) the learner has to identify the target language exactly; (ii) the learner receives only positive examples.
− Unrestrictive assumptions: (iii) the learner has access to an arbitrary number of examples; (iv) the learner is not limited by computational complexity.
Extensions:
− Statistical learning theory (Vapnik 1971, 1998): languages are treated as indicator functions (is sentence S in L? [T/F]); linguistic examples come with a distribution P, both positive and negative; P also provides a metric for distances and a notion of learnability.
Box 1: Gold's Theorem (continued)
Extensions:
− Statistical learning theory (Vapnik 1971, 1998) concludes that a set of languages is learnable if and only if it has finite VC dimension. The VC dimension is a combinatorial measure of the complexity of a set of languages. If the set of languages is completely arbitrary, it has infinite VC dimension and learning is not possible. The VC framework removes assumptions (i), (ii) and (iii).
− Valiant (1984) added the complexity issue, assumption (iv) in Gold's model. Consequently, there are sets of languages that are learnable in principle but not in polynomial time.
− Angluin (1987) recast the problem as query-based learning. Such models allow regular languages to be learned in polynomial time, but not context-free languages. The restriction is necessary because of complexity.
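For finite cases the VC dimension can be computed directly; this brute-force sketch (my own illustration, not from the paper) searches for the largest subset of sentences that the set of languages “shatters”:

```python
from itertools import combinations

def vc_dimension(universe, languages):
    """VC dimension of a finite set of languages (each a set of sentences),
    computed by brute-force search for the largest shattered subset."""
    best = 0
    for size in range(1, len(universe) + 1):
        for subset in combinations(sorted(universe), size):
            # subset is shattered if all 2^size dichotomies are realised
            dichotomies = {frozenset(set(subset) & lang) for lang in languages}
            if len(dichotomies) == 2 ** size:
                best = size
    return best
```

A completely arbitrary set of languages realises every dichotomy of every subset, which is the finite shadow of the infinite-VC-dimension case where learning fails.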
A2: Learning Finite Languages
S1, S2, S3 are three sentences, giving 8 possible languages:
− Learner A entertains all 8 possible languages.
− Learner B entertains only 2 languages, L1 = {S1, S2} and L2 = {S3}.
If the sentence S1 is given to both A and B:
− A cannot decide on the target language.
− B can draw implications: he knows that S2 is part of the language and S3 is not. B extrapolates beyond his experience.
Indeed, the ability to find the underlying rules requires a restricted search space.
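The learner A vs. learner B comparison can be played out directly (a sketch of mine using the slide's sentences): after hearing only S1, A still has four consistent candidates, while B has narrowed down to one.

```python
from itertools import combinations

SENTENCES = ["S1", "S2", "S3"]

# Learner A entertains all 2^3 = 8 possible languages over the sentences.
learner_a = [set(c) for r in range(4) for c in combinations(SENTENCES, r)]

# Learner B's restricted hypothesis space (the UG analogy from the slide).
learner_b = [{"S1", "S2"}, {"S3"}]

def consistent(hypotheses, observed):
    """Languages in the hypothesis space containing every observed sentence."""
    return [lang for lang in hypotheses if observed <= lang]

# After hearing only S1, learner A still has several candidates,
# while learner B has already converged on a single language.
a_left = consistent(learner_a, {"S1"})
b_left = consistent(learner_b, {"S1"})
```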
A2: The Necessity of Innate Expectations
The human brain is equipped with a learning algorithm AH which enables us to learn certain languages.
There exist over 6,000 languages in the world. AH can learn each of these, but it cannot learn every computable language.
Having expectations 'before the data' is what 'innate' means.
Is discovering AH biologically our job as computer scientists? No.
A2: The Necessity of Innate Expectations (continued)
There are conflicting ideas:
− AH may be language-specific or general-purpose.
− Either way, the mechanism AH operates on linguistic input and enables the child to learn the rules of languages.
− This mechanism can learn the rules of a restricted set of languages; the theory behind that set is UG.
− Greenberg (1978) and Comrie (1981) dispute individual linguistic universals, but the existence of universals cannot be denied.
− Neural networks are important tools for modelling neural mechanisms, but no neural network can learn an unrestricted set of languages.
A3: Language Evolution
− Understanding language evolution requires a theoretical framework explaining how Darwinian dynamics lead to fundamental properties of human language such as arbitrary signs, lexicons, syntax and grammar.
Basic approach:
− The basic approach is similar to evolutionary game theory. There is a population of individuals. Each individual uses a particular language. Individuals talk to each other. Successful communication results in a pay-off that contributes to fitness.
A3: Language Evolution (continued)
Cultural evolution with constant universal grammar:
− From a biological perspective, language is not only a property of an individual but also an extended phenotype* of a population.
− Eq 1 is derived by assuming UG is constant from generation to generation. The fitness of each language L_i is calculated using communicative pay-offs. The learning matrix Q and the average fitness of the population imply a measure of “linguistic coherence”. See Box 2 for details.
− Eq 1 describes selection of languages for increased communicative function and increased learnability.
* Phenotype: the outward appearance resulting from heredity.
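A minimal numerical sketch of these selection–learning dynamics, with each language's fitness coming from communicative pay-offs and children learning imperfectly through the matrix Q (the two-language pay-off matrix F, learning matrix Q, and starting point below are assumptions of mine for illustration, not values from the paper):

```python
# Two-language illustration of the selection-learning dynamics:
#   dx_j/dt = sum_i f_i x_i Q[i][j] - phi x_j
# F, Q, and the initial x are assumed values for illustration only.
F = [[1.0, 0.5], [0.5, 1.0]]   # F[i][j]: payoff of an L_i speaker meeting L_j
Q = [[0.9, 0.1], [0.1, 0.9]]   # Q[i][j]: prob. a child of an L_i speaker learns L_j
x = [0.6, 0.4]                 # fraction of the population speaking each language
dt = 0.01
n = len(x)

for _ in range(10000):
    f = [sum(F[i][j] * x[j] for j in range(n)) for i in range(n)]   # fitnesses
    phi = sum(f[i] * x[i] for i in range(n))   # average fitness ("coherence")
    dx = [sum(f[i] * x[i] * Q[i][j] for i in range(n)) - phi * x[j]
          for j in range(n)]
    x = [x[j] + dt * dx[j] for j in range(n)]
```

With reasonably accurate learning the initially more common language comes to dominate without wiping out the other entirely, which is the coherence-vs-learnability trade-off the slide describes.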
A3: Language Evolution (continued)
In Figure 4 you will see a plot of the relation between linguistic coherence and the number of candidate grammars in UG.
Remember: the average fitness of the population implies a measure of “linguistic coherence”.
Fitness depends on the cumulative communication pay-off.
A3: Evolution of Universal Grammar
− Evolution of UG requires variation in UG; such a UG is in fact neither a grammar nor universal.
− In Eq 2, each universal grammar admits n grammars, and the UGs vary from U_1 to U_M.
− U_i mutates genetically into another U_j with probability w_ij.
− The equation describes mutation and selection among M different universal grammars.
− In human evolutionary history, a succession of UGs led to the UG of currently living humans.
− At some point, a UG emerged that allowed languages of unlimited expressibility.
Outlook
Languages change, since transmission between generations is not perfect.
− Grammaticalization and creolization are possible.
Many language changes are selectively neutral.
Some questions arise:
− What is the interplay between the biological evolution of UG and the cultural evolution of language?
− What is the mechanism for adaptation among the various languages generated by a given UG?
− What are the restrictions imposed by UG?
− Can we identify genes that are crucial for linguistic or other cognitive functions? What can we say about the evolution of those genes?
Outlook (continued)
All of these questions need to be evaluated jointly by many disciplines, including linguistics, cognitive science, psychology, genetics, animal behaviour, evolutionary biology, neurobiology and computer science.