self-organization of the sound inventories: an explanation based on complex networks
Post on 17-Dec-2015
Self-Organization of the Sound Inventories: An Explanation based on Complex Networks
Overview of the Talk
• Motivation
• Approach & Objective
• Principle of Occurrence in Consonant Inventories
• Principle of Co-Occurrence in Consonant Inventories
• Findings
• Conclusions and Future Work
Sabda Brahma: Sound is Eternity
sabda-brahma su-durbodham pranendriya-mano-mayam ananta-param gambhiram durvigahyam samudra-vat
– Sound is eternal and also very difficult to comprehend. It manifests within the life air, the senses, and the mind. It is unlimited and unfathomable, just like the ocean.
• Several living organisms can produce sound
– They emit sound signals to communicate
– These signals are mapped to certain symbols (meanings) in the brain
– E.g., mating calls, danger alarms
Signals and Symbols & § ۞ ☼ ♥
Human Communication
• Human beings also produce sound signals
• Unlike other organisms, they can concatenate these sounds to produce new messages – Language
• Language is one of the primary causes/effects of human intelligence
Human Speech Sounds
• Human speech sounds are called phonemes – the smallest unit of a language
• Phonemes are characterized by certain distinctive features, such as
I. Place of articulation
II. Manner of articulation
III. Phonation
[Figure: Mermelstein’s Model]
Types of Phonemes
• Vowels: /a/, /i/, /u/
• Consonants: /p/, /t/, /k/
• Diphthongs: /ai/
Choice of Phonemes
• How does a language choose a set of phonemes in order to build its sound inventory?
• Is the process arbitrary?
• Certainly Not!
• What are the forces affecting this choice?
Forces of Choice
[Figure: a speaker utters /a/ to a listener / learner]
• Speaker: desires “ease of articulation”
• Listener / Learner: desires “perceptual contrast” / “ease of learnability”
A Linguistic System – How does it look?
The forces shaping the choice are opposing – Hence there has to be a non-trivial solution
Vowels: A (Partially) Solved Mystery
• Languages choose vowels based on maximal perceptual contrast.
• For instance, if a language has three vowels, then in more than 95% of the cases they are /a/, /i/, and /u/.
[Figure: the vowels /a/, /i/, and /u/, with each pair labeled “Maximally Distinct”]
Consonants: A puzzle
• Research: from 1929 to date
• No single satisfactory explanation of the organization of the consonant inventories
– The set of features that characterize consonants is much larger than that of vowels
– No single force is sufficient to explain this organization
– Rather a complex interplay of forces goes on in shaping these inventories
[Figure: Jigsaw]
The Approach & Objective
• We adopt a Complex Network Approach to attack the problem of consonant inventories
• We try to figure out the principle of the distribution of the occurrence of consonants over languages
• We also attempt to figure out the co-occurrence patterns (if any) that are found across the consonant inventories
Principle of Occurrence
• PlaNet – The “Phoneme-Language Network”
– A bipartite network N=(VL,VC,E)
– VL : Nodes representing languages of the world
– VC : Nodes representing consonants
– E : Set of edges which run between VL and VC
• There is an edge e ∈ E between two nodes vl ∈ VL and vc ∈ VC if the consonant c occurs in the language l.
[Figure: The Structure of PlaNet. Language nodes (L1, L2, L3, L4) on one side are linked to consonant nodes (/m/, /ŋ/, /p/, /d/, /s/, /θ/) on the other.]
Construction of PlaNet
• Data Source : UCLA Phonological Inventory Database (UPSID)
• Number of nodes in VL is 317
• Number of nodes in VC is 541
• Number of edges in E is 7022
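The bipartite construction can be sketched in a few lines of Python. This is a minimal illustration only: the four toy inventories below are hypothetical (the real network is built from UPSID), and the names `build_planet` and `degrees` are assumptions introduced here, not part of the talk.

```python
from collections import defaultdict

# Hypothetical toy inventories; the real PlaNet is built from UPSID.
inventories = {
    "L1": {"/m/", "/p/", "/s/"},
    "L2": {"/m/", "/p/", "/d/"},
    "L3": {"/m/", "/ŋ/", "/s/", "/θ/"},
    "L4": {"/p/", "/d/"},
}

def build_planet(inventories):
    """Return (V_L, V_C, E) for the bipartite phoneme-language network."""
    v_l = set(inventories)                      # language nodes
    v_c = set().union(*inventories.values())    # consonant nodes
    # One edge per (language, consonant) pair where the consonant occurs.
    edges = {(lang, cons) for lang, inv in inventories.items() for cons in inv}
    return v_l, v_c, edges

def degrees(edges):
    """Degree of every language node and of every consonant node."""
    lang_deg = defaultdict(int)
    cons_deg = defaultdict(int)
    for lang, cons in edges:
        lang_deg[lang] += 1   # inventory size of the language
        cons_deg[cons] += 1   # number of languages containing the consonant
    return dict(lang_deg), dict(cons_deg)

v_l, v_c, edges = build_planet(inventories)
lang_deg, cons_deg = degrees(edges)
print(len(v_l), len(v_c), len(edges))
```

On the real data the three printed counts would correspond to the 317 languages, 541 consonants, and 7022 edges listed above.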
Degree Distribution
• Degree of a node is defined as the number of edges connected to the node.
• Degree Distribution (DD) is the fraction of nodes, pk, having degree equal to k.
• The Cumulative Degree Distribution (CDD) is the fraction of nodes, Pk, having degree greater than or equal to k.
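Both distributions are easy to compute from a list of node degrees. The sketch below assumes the usual convention that the cumulative distribution counts nodes with degree greater than or equal to k; the function names and the sample degree list are hypothetical.

```python
from collections import Counter

def degree_distribution(degree_list):
    """p_k: fraction of nodes whose degree is exactly k."""
    n = len(degree_list)
    counts = Counter(degree_list)
    return {k: counts[k] / n for k in sorted(counts)}

def cumulative_degree_distribution(degree_list):
    """P_k: fraction of nodes whose degree is at least k (assumed convention)."""
    n = len(degree_list)
    return {
        k: sum(1 for d in degree_list if d >= k) / n
        for k in sorted(set(degree_list))
    }

# Hypothetical degrees of six nodes.
degs = [1, 1, 2, 3, 3, 3]
p = degree_distribution(degs)
P = cumulative_degree_distribution(degs)
print(p)
print(P)
```

The cumulative form is smoother than the raw distribution, which is why it is often preferred when plotting heavy-tailed degree data on log-log axes.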
Degree Distribution of PlaNet
[Figure: degree distribution of the language nodes, pk versus language inventory size (degree k)]
pk = beta(k) with α = 7.06 and β = 47.64:
$$p_k = \frac{\Gamma(54.7)}{\Gamma(7.06)\,\Gamma(47.64)}\, k^{6.06} (1-k)^{46.64}$$
kmin = 5, kmax = 173, kavg = 21
[Figure: cumulative degree distribution of the consonant nodes on log-log axes, Pk versus degree of a consonant, k]
$$P_k = k^{-0.71}$$ with an exponential cut-off
DD of the language nodes follows a β-distribution
DD of the consonant nodes follows a power-law with an exponential cut-off
Distribution of Consonants over Languages follows a power-law
Preferential Attachment: The Key to Power Law
• Power law distributions observed in
– Social Networks
– Biological Networks
– Internet Graphs
– Citation Networks
• These distributions emerge due to preferential attachment
[Figure: “rich get richer” illustration of preferential attachment, from RICH to RICHER]
Synthesis of PlaNet
Given: VL = {L1, L2, ..., L317} sorted in ascending order of their degrees, and 541 unlabeled nodes in VC.
Step 0: All nodes in VC have degree 0.
Step t+1:
Choose a language node Lj (in order) with cardinality kj (inventory size)
for c running from 1 to kj do
    Connect Lj preferentially with a consonant node Ci ∈ VC, to which it is not already connected, with probability
    $$\Pr(C_i) = \frac{d_i^{\alpha} + \epsilon}{\sum_{x \in V^*} (d_x^{\alpha} + \epsilon)}$$
where di = degree of node Ci at step t, V* = subset of VC not connected to Lj at step t, and ε is the smoothing parameter.
[Figure: The Preferential Mechanism of Synthesis. The language nodes L1, L2, L3, L4 and their consonant links are shown after step 3 and after step 4.]
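The synthesis procedure can be sketched as follows. This is an illustrative reading of the steps, not the authors' code: the function name `synthesize_planet`, the seed, and the toy sizes are assumptions, and the defaults for α and ε are simply the values reported in the simulation. Each inventory size must not exceed the number of consonant nodes.

```python
import random

def synthesize_planet(inventory_sizes, n_consonants, alpha=1.44, eps=0.5, seed=0):
    """Grow a synthetic PlaNet by preferential attachment.

    Languages are processed in ascending order of inventory size. Each one
    picks its consonants one at a time, with probability proportional to
    d_i**alpha + eps over the consonants it is not yet connected to.
    """
    rng = random.Random(seed)
    degree = [0] * n_consonants          # Step 0: every consonant has degree 0
    inventories = []
    for k_j in sorted(inventory_sizes):
        chosen = set()
        for _ in range(k_j):
            # V*: consonant nodes not yet connected to this language.
            candidates = [i for i in range(n_consonants) if i not in chosen]
            weights = [degree[i] ** alpha + eps for i in candidates]
            pick = rng.choices(candidates, weights=weights, k=1)[0]
            chosen.add(pick)
            # Updating immediately is safe: a picked node leaves V* for this language.
            degree[pick] += 1
        inventories.append(chosen)
    return inventories, degree

# Hypothetical toy run: 3 languages of sizes 2, 3, 5 over 10 consonant nodes.
inventories, degree = synthesize_planet([5, 2, 3], n_consonants=10)
print([len(inv) for inv in inventories], sum(degree))
```

The smoothing parameter ε keeps the probability of an unused consonant above zero, so nodes with degree 0 can still enter the network.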
Simulation Result
The parameters α and ε are 1.44 and 0.5 respectively.
The results are averaged over 100 runs
[Figure: Pk versus degree (k) on log-log axes for PlaNet, PlaNetsyn, and PlaNetrand]
Principle of Co-occurrence
• Consonants tend to co-occur in groups or communities
• These groups tend to be organized around a few distinctive features (based on: manner of articulation, place of articulation & phonation) – Principle of feature economy
Example (plosives): if a language has some of the consonants below in its inventory, then it will also tend to have the others.
            voiced    voiceless
bilabial    /b/       /p/
dental      /d/       /t/
How to Capture these Co-occurrences?
• PhoNet – the “Phoneme-Phoneme Network”
– A weighted network N=(VC,E)
– VC : Nodes representing consonants
– E : Set of edges which run between the nodes in VC
• There is an edge e ∈ E between two nodes vc1, vc2 ∈ VC if the consonants c1 and c2 co-occur in a language. The number of languages in which c1 and c2 co-occur defines the edge-weight of e. The number of languages in which c1 occurs defines the node-weight of vc1.
[Figure: a fragment of PhoNet showing the consonants /kw/, /k′/, /k/, and /d′/ with node and edge weights 42, 14, 38, 13, 283, 17, 50, 39]
Construction of PhoNet
• Data Source : UPSID
• Number of nodes in VC is 541
• Number of edges is 34012
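The node and edge weights follow directly from the consonant inventories. The sketch below is a minimal illustration with three hypothetical inventories; `build_phonet` is a name introduced here, and edges are keyed by unordered pairs so that (c1, c2) and (c2, c1) are the same edge.

```python
from collections import Counter
from itertools import combinations

def build_phonet(inventories):
    """Return (node_weight, edge_weight) for the consonant co-occurrence network.

    node_weight[c]               = number of languages in which c occurs
    edge_weight[frozenset(c1,c2)] = number of languages in which c1 and c2 co-occur
    """
    node_weight = Counter()
    edge_weight = Counter()
    for inv in inventories:
        node_weight.update(inv)
        for c1, c2 in combinations(sorted(inv), 2):
            edge_weight[frozenset((c1, c2))] += 1
    return node_weight, edge_weight

# Hypothetical toy inventories; the real PhoNet is built from UPSID.
toy = [{"/k/", "/kw/"}, {"/k/", "/kw/", "/k′/"}, {"/k/"}]
node_weight, edge_weight = build_phonet(toy)
print(node_weight["/k/"], edge_weight[frozenset(("/k/", "/kw/"))])
```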
[Figure: PhoNet]
Community Structures in PhoNet
• Radicchi et al. algorithm (for unweighted networks) – Counts number of triangles that an edge is a part of. Inter-community edges will have low count so remove them.
• Modification for a weighted network like PhoNet
– Look for triangles, where the weights on the edges are comparable.
– If they are comparable, the consonants in the triangle co-occur frequently; otherwise they do not.
– Measure strength S for each edge (u,v) in PhoNet, where S is
$$S = \frac{w_{uv}}{\sqrt{\sum_{i \in V_C - \{u,v\}} (w_{ui} - w_{vi})^2}}$$
if the denominator is greater than 0, else S = ∞
– Remove edges with S less than a threshold η
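The edge-strength measure and the thresholding step can be sketched as below. Two points are assumptions of this sketch rather than statements from the talk: a missing edge is treated as weight 0, and the communities are read off as the connected components that remain after weak edges are removed. The function names and the toy network are hypothetical.

```python
import math

def edge_strengths(nodes, w):
    """S for every edge; w maps frozenset({u, v}) -> weight (missing pair = 0)."""
    def weight(a, b):
        return w.get(frozenset((a, b)), 0)

    strengths = {}
    for edge, w_uv in w.items():
        u, v = tuple(edge)
        # Distance between the weight profiles of u and v over all other nodes.
        denom = math.sqrt(sum((weight(u, i) - weight(v, i)) ** 2
                              for i in nodes if i not in edge))
        strengths[edge] = w_uv / denom if denom > 0 else math.inf
    return strengths

def communities(nodes, w, eta):
    """Drop edges with S < eta; return connected components (assumed reading)."""
    kept = [tuple(e) for e, s in edge_strengths(nodes, w).items() if s >= eta]
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in kept:
        parent[find(u)] = find(v)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

# Hypothetical toy network: a tight triangle 1-2-3 and a weak link 3-4.
nodes = [1, 2, 3, 4]
w = {frozenset((1, 2)): 10, frozenset((1, 3)): 10,
     frozenset((2, 3)): 10, frozenset((3, 4)): 1}
result = communities(nodes, w, eta=1.0)
print(sorted(sorted(c) for c in result))
```

In the toy network the weak link gets a strength far below 1 because nodes 3 and 4 have very different weight profiles, so it is the only edge removed.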
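[Figure: worked example on a network of six nodes (1 to 6). The edge weights (100, 110, 101, 10, 46, 52, 45) are replaced by strengths S (11.11, 10.94, 7.14, 0.06, 3.77, 5.17, 7.5), and the edges are then filtered with η > 1.]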
Community Formation
For different values of η we get different sets of communities
[Figure: Consonant Societies! Communities obtained at η = 1.25, η = 0.72, η = 0.60, and η = 0.35]
Evaluation of the Communities: Occurrence Ratio
• Hypothesis: The communities obtained from the algorithm should be found frequently in UPSID
• We define the occurrence ratio to capture the “intensity” of occurrence:
$$O_L = \frac{M}{N - (R_{top} - 1)}$$
– N is the number of consonants in C (ranked by the ascending order of frequency of occurrence), M is the number of consonants of C that occur in a language L, and Rtop is the rank of the highest ranking consonant in L that is also present in C
– If a high-frequency consonant is present in L, a low-frequency one need not be present; but if a low-frequency one is present, then the higher-frequency ones are expected to be present as well
Computing Occurrence Ratio: An Example
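[Figure: a community C of three consonants (/k/, /kh/, /kw/) with ranks R = 1, 2, 3, compared against the inventories of three languages L1, L2, and L3; X marks a consonant of C that is absent from a language.]
L1: M = 3, N = 3, Rtop = 1, so OL = 3/3 = 1
L2: M = 2, N = 3, Rtop = 2, so OL = 2/2 = 1
L3: M = 2, N = 3, Rtop = 1, so OL = 2/3 ≈ 0.67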
Average Occurrence Ratio
• A given community C has an occurrence ratio OL in each language L in UPSID
• We average this ratio over all L as
$$O_{av} = \frac{\sum_{L \in UPSID} O_L}{L_{occur}}$$
where Loccur is the number of languages where at least one of the members of C has occurred
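The occurrence ratio and its average can be sketched as follows. The sketch reads the average as the sum of OL divided by Loccur, and it assigns OL = 0 to a language that contains no member of the community; both choices, along with the function names and the placeholder consonant labels c1, c2, c3, are assumptions made for illustration.

```python
def occurrence_ratio(ranked_community, inventory):
    """O_L = M / (N - (R_top - 1)) for one language.

    ranked_community lists the consonants of C in rank order (rank 1 first).
    """
    n = len(ranked_community)
    present_ranks = [rank for rank, c in enumerate(ranked_community, start=1)
                     if c in inventory]
    if not present_ranks:
        return 0.0                      # assumption: no member present -> 0
    m = len(present_ranks)
    r_top = min(present_ranks)          # rank of the highest ranking member in L
    return m / (n - (r_top - 1))

def average_occurrence_ratio(ranked_community, inventories):
    """O_av: sum of O_L over all languages divided by L_occur."""
    total = sum(occurrence_ratio(ranked_community, inv) for inv in inventories)
    l_occur = sum(1 for inv in inventories
                  if any(c in inv for c in ranked_community))
    return total / l_occur if l_occur else 0.0

# Hypothetical community with ranks 1, 2, 3 and four toy languages.
community = ["c1", "c2", "c3"]
languages = [{"c1", "c2", "c3"}, {"c2", "c3"}, {"c1", "c2"}, {"x"}]
ratios = [occurrence_ratio(community, inv) for inv in languages]
o_av = average_occurrence_ratio(community, languages)
print(ratios, o_av)
```

The first three toy languages reproduce the three cases of the worked example (ratios 1, 1, and 2/3); the fourth contains no member of the community and therefore does not count toward Loccur.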
Results of the Evaluation
Consonants show patterns of co-occurrence in 80% or more of the world’s languages
For η > 0.3, Oav > 0.8
The Binding Force of the Communities: Feature Economy
• Feature Entropy: The idea is borrowed from information theory
• For a community C of size N, let there be pf consonants for which a particular feature f is present and qf other consonants for which f is absent. The probability that a consonant chosen from C has f is pf /N, and the probability that it does not have f is qf /N or (1 – pf /N)
• Feature entropy can therefore be defined as
$$FE = \sum_{f \in F} \left( -\frac{p_f}{N}\log\frac{p_f}{N} - \frac{q_f}{N}\log\frac{q_f}{N} \right)$$
where F is the set of all features present in the consonants in C
• Essentially, this is the number of bits needed to transmit the entire information about C through a channel.
Computing Feature Entropy
Lower FE -> C1 economizes on the number of features
Higher FE -> C2 does not economize on the number of features
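A small computation makes the contrast concrete. The two communities below are hypothetical stand-ins for C1 and C2, each feature value (for example "voiced" and "voiceless") is treated as a separate binary feature, and the logarithm is taken to base 2 so that the result is in bits; all three choices are assumptions of this sketch.

```python
import math

def feature_entropy(community):
    """FE of a community given as {consonant: set of features}."""
    n = len(community)
    all_features = set().union(*community.values())   # F
    fe = 0.0
    for f in all_features:
        p = sum(1 for feats in community.values() if f in feats) / n  # p_f / N
        for prob in (p, 1 - p):
            if prob > 0:                # 0 * log(0) is taken as 0
                fe -= prob * math.log2(prob)
    return fe

# Hypothetical C1: four plosives built from few features (economical).
c1 = {
    "/b/": {"voiced", "bilabial", "plosive"},
    "/p/": {"voiceless", "bilabial", "plosive"},
    "/d/": {"voiced", "dental", "plosive"},
    "/t/": {"voiceless", "dental", "plosive"},
}
# Hypothetical C2: four consonants drawing on many different features.
c2 = {
    "/b/": {"voiced", "bilabial", "plosive"},
    "/s/": {"voiceless", "alveolar", "fricative"},
    "/k/": {"voiceless", "velar", "plosive"},
    "/m/": {"voiced", "bilabial", "nasal"},
}
fe1, fe2 = feature_entropy(c1), feature_entropy(c2)
print(fe1, fe2)
```

A feature shared by every member (here "plosive" in the first community) contributes nothing to the sum, which is why a community organized around a few shared features scores lower.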
If the Inventories had Evolved by Chance!
• Construction of PhoNetrand
– For each consonant c let the frequency of occurrence in UPSID be denoted by fc.
– Let there be 317 bins each corresponding to a language in UPSID.
– fc bins are then chosen uniformly at random and the consonant c is packed into these bins without repetition.
– Thus the consonant inventories of the 317 languages corresponding to the bins are generated.
– PhoNetrand can be constructed from these new consonant inventories similarly as PhoNet.
• Cluster PhoNetrand by the method proposed earlier
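The random baseline can be sketched as below: each consonant keeps its observed frequency but is assigned to languages uniformly at random. The function name, the seed, and the toy frequencies are hypothetical; with the real data there would be 317 bins and one frequency per UPSID consonant.

```python
import random

def random_inventories(frequency, n_languages, seed=0):
    """Scatter each consonant over f_c distinct, uniformly chosen languages.

    frequency maps consonant -> f_c, with f_c <= n_languages.
    """
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_languages)]   # one bin per language
    for consonant, f_c in frequency.items():
        # sample() draws without repetition, so no bin gets the consonant twice.
        for lang in rng.sample(range(n_languages), f_c):
            inventories[lang].add(consonant)
    return inventories

# Hypothetical frequencies over 5 languages.
freq = {"/k/": 3, "/kw/": 1, "/p/": 4}
rand_inv = random_inventories(freq, n_languages=5)
observed = {c: sum(1 for inv in rand_inv if c in inv) for c in freq}
print(observed)
```

Because every consonant keeps its frequency, any difference between the communities of the real and the randomized network comes from which consonants occur together, not from how often each occurs.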
[Figure: Comparison between PhoNet and PhoNetrand. The curves show the average feature entropy of the communities of a particular size versus the community size, for PhoNet and for PhoNetrand.]
Our Findings
• The distribution of the occurrence of consonants over languages follows a power law;
• A preferential attachment-based model can reproduce this distribution of occurrence to a very close approximation (mean error ~0.01);
• The patterns of co-occurrence of the consonants, reflected through communities in PhoNet, are observed in 80% or more of the world's languages;
• Such patterns of co-occurrence would not have emerged if the consonant inventories had evolved just by chance.
The Epilogue
• How to explain preferential attachment?
– Perhaps it is due to the linguistic heterogeneity involved in the process of language change (at the microscopic level)
– Consonants belonging to languages that are prevalent among the speakers in one generation have a higher (and higher) chance of getting transmitted to the speakers of the subsequent generations
– The above heterogeneity manifests as preferential attachment at the mesoscopic level
• What is the cause of the origin of feature economy?
– Perhaps it is the outcome of the interplay of functional forces, such as perceptual contrast and ease of learnability, that is reflected as feature economy
Indo-European family of languages
Danke! (Thank you!)