The Large-Scale Structure of Semantic Networks

A. Tuba Baykara, Cognitive Science, 2002700187

Page 1: The Large-Scale Structure of Semantic Networks

The Large-Scale Structure of Semantic Networks

A. Tuba Baykara, Cognitive Science

2002700187

Page 2: The Large-Scale Structure of Semantic Networks


Overview

1) Introduction
2) Analysis of 3 semantic networks and their statistical properties
   - Associative Network
   - WordNet
   - Roget’s Thesaurus
3) The Growing Network Model proposed by the authors
   - Undirected Growing Network Model
   - Directed Growing Network Model
4) Psychological Implications of the findings
5) General Discussion and Conclusions

Page 3: The Large-Scale Structure of Semantic Networks


1) Introduction

Semantic Network: a network in which concepts are represented as hierarchies of interconnected nodes, which are linked to characteristic attributes.

Important to understand their structure, because they reflect the organization of meaning and language.

Statistical similarities are important because of their implications for language evolution and/or acquisition.

Would a model grown by similar principles have the same statistical properties? → the Growing Network Model

Page 4: The Large-Scale Structure of Semantic Networks

1) Introduction: Predictions related to the model

1- It would have the same characteristics:

* Degree distribution would follow a power law → some concepts would have many more connections than others
* Addition of new concepts would not change this structure → scale-free (vs. small-world!!)

2- Previously added (early-acquired) concepts would have higher connectivity than later-added (acquired) concepts.

Page 5: The Large-Scale Structure of Semantic Networks

1) Introduction: Terminology

Graph, network
– Node, edge (undirected link), arc (directed link), degree
– Avg. shortest path (L), diameter (D), clustering coefficient (C), degree distribution P(k)

Small-world network, random graph

Page 6: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: a. Associative Network

“The University of South Florida Word Association, Rhyme and Word Fragment Norms”

More than 6,000 participants; ~750,000 responses to 5,019 cues (stimulus words).

The great majority of these words are nouns (76%), but adjectives (13%), verbs (7%), and other parts of speech are also represented. In addition, 16% are identified as homographs.

Page 7: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: a. Associative Network

Examples:

Cue: BOOK _______   →   Response: READ
Cue: SUPPER _______ →   Response: LUNCH

Page 8: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: a. Associative Network

          DINNER  SUPPER   EAT   LUNCH   FOOD   MEAL
DINNER      -      0.54   0.11   0.10   0.09   0.09
SUPPER     0.55     -     0.02   0.03   0.17   0.01
EAT                        -            0.41   0.02
LUNCH      0.27    0.02   0.08    -     0.20   0.06
FOOD               0.41   0.01           -     0.02
MEAL       0.21    0.06   0.06   0.06   0.49    -

Note: for simplicity, the networks were constructed with all arcs and edges unlabeled and equally-weighted.

Forward & backward strength imply directions.

(when SUPPER was normed, it produced LUNCH as a target with a forward strength of .03)
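(A minimal sketch of this construction, not from the original study: the cue-response triples and the use of networkx are illustrative assumptions.)

```python
# Build directed and undirected associative networks from
# (cue, response, forward strength) norms. Data are made up.
import networkx as nx

norms = [
    ("DINNER", "SUPPER", 0.54), ("SUPPER", "DINNER", 0.55),
    ("SUPPER", "LUNCH", 0.03), ("MEAL", "FOOD", 0.49),
]

directed = nx.DiGraph()
for cue, response, strength in norms:
    # Arcs are unlabeled and equally weighted, as in the slides;
    # the strength is recorded but not used as an edge weight.
    directed.add_edge(cue, response, strength=strength)

# Undirected version: an edge whenever two words are associatively
# related, regardless of direction.
undirected = directed.to_undirected()

print(directed.number_of_edges(), undirected.number_of_edges())
```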

Page 9: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: a. Associative Network

I) Undirected network: word nodes were joined by an edge if associatively related, regardless of associative direction.

[Figure: the shortest path from VOLCANO to ACHE is highlighted.]

Page 10: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: a. Associative Network

II) Directed network: words x and y were joined by an arc from x to y if cue x evoked y as an associative response.

[Figure: all shortest directed paths from VOLCANO to ACHE are shown.]

Page 11: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: b. Roget’s Thesaurus

1911 edition with 29,000 words in 1,000 semantic categories.

A connection is made between a word and a semantic category only if that word belongs to that category → a bipartite graph.

Page 12: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: b. Roget’s Thesaurus

[Figure: the words calculator, numbering, accounting, computer, imitation, map, design, perspective, chalk, and monochrome shown twice: as a bipartite graph linking them to the categories NUMERATION, REPRESENTATION, and PAINTING, and as the corresponding unipartite graph in which two words are linked directly whenever they share a category.]
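(A sketch of the bipartite-to-unipartite conversion shown above; the word-category assignments are guesses read off the figure, and networkx's bipartite projection is assumed.)

```python
# Project a word-category bipartite graph onto its word nodes:
# two words become linked iff they share at least one category.
import networkx as nx
from networkx.algorithms import bipartite

memberships = {  # hypothetical reading of the figure
    "NUMERATION": ["calculator", "numbering", "accounting", "computer"],
    "REPRESENTATION": ["imitation", "map", "design", "perspective"],
    "PAINTING": ["chalk", "monochrome", "design"],
}

B = nx.Graph()
for category, words in memberships.items():
    B.add_node(category, bipartite=0)
    for word in words:
        B.add_node(word, bipartite=1)
        B.add_edge(category, word)

word_nodes = {n for n, d in B.nodes(data=True) if d["bipartite"] == 1}
W = bipartite.projected_graph(B, word_nodes)   # unipartite word graph
print(sorted(W.edges()))
```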

Page 13: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: c. WordNet

Developed by George Miller at the Cognitive Science Laboratory, Princeton University: http://wordnet.princeton.edu

Based on relations between synsets (sets of synonyms); at the time it contained more than 120k word forms and 99k meanings.

Example: the noun "computer" has 2 senses in WordNet:
1. computer, computing machine, computing device, data processor, electronic computer, information processing system (a machine for performing calculations automatically)
2. calculator, reckoner, figurer, estimator, computer (an expert at calculation (or at operating calculating machines))

Page 14: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: c. WordNet

Links connect word forms and their meanings according to relationships between word forms such as:

– SYNONYMY
– POLYSEMY
– ANTONYMY
– HYPERNYMY (computer is a kind of machine/device/object)
– HYPONYMY (digital computer / Turing machine … is a kind of computer)
– HOLONYMY (a computer is part of a platform)
– MERONYMY (CPU / chip / keyboard … is part of a computer)

Links can be established in any desired way, so WordNet was treated as an undirected graph.

Page 15: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: Statistical Properties

I) How sparse are the 3 networks?
⟨k⟩: avg. number of connections per node. In all of them, a node is connected to only a small percentage of the other nodes.

II) How connected are the networks?
Undirected A/N: completely connected. Directed A/N: the largest connected component contains 96% of all words. WordNet & Thesaurus: 99%.

All further analyses were done with these components!
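(A sketch of how these two checks can be run with networkx; the random graph here is a stand-in, not the actual norms data.)

```python
# Sparsity: average degree <k>; connectedness: largest component.
import networkx as nx

G = nx.gnp_random_graph(5000, 0.004, seed=1, directed=True)  # stand-in

k_avg = sum(d for _, d in G.out_degree()) / G.number_of_nodes()
print(f"<k> = {k_avg:.1f} of {G.number_of_nodes() - 1} possible neighbors")

# Largest strongly connected component of the directed graph;
# all further analyses would be restricted to this component.
giant = max(nx.strongly_connected_components(G), key=len)
print(f"largest SCC: {len(giant) / G.number_of_nodes():.0%} of nodes")
```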

Page 16: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: Statistical Properties

[Table: summary statistics of the four networks.]

Page 17: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: Statistical Properties

III) Short path length (L) and diameter (D)
In WordNet & Thesaurus, L & D were based on a sample of 10,000 words; in the A/N, all words were considered. L & D are close to those of random graphs of equivalent size, as expected.

IV) Local clustering (C)
To measure its C, the directed A/N was regarded as undirected. To calculate C for the Thesaurus, the bipartite graph was converted into a unipartite graph. C of all 4 networks is much higher than in random graphs.
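(A sketch of these measurements, with the same workarounds: sampling words for L and D, and an undirected graph for C. The Watts-Strogatz graph is a stand-in for the real networks.)

```python
import random
import networkx as nx

G = nx.connected_watts_strogatz_graph(10000, 22, 0.3, seed=1)  # stand-in

# L and D estimated from a sample of source words, as for WordNet
# and the Thesaurus (here 100 sources instead of 10,000 words).
sources = random.sample(list(G.nodes()), 100)
dists = [d for s in sources
         for d in nx.single_source_shortest_path_length(G, s).values()
         if d > 0]
L = sum(dists) / len(dists)     # average shortest-path length
D = max(dists)                  # (sampled) diameter

C = nx.average_clustering(G)    # local clustering coefficient
print(f"L = {L:.2f}, D = {D}, C = {C:.3f}")
```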

Page 18: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: Statistical Properties

V) Power-law degree distribution P(k) ~ k^(-γ)

• All distributions are plotted in log-log coordinates, with the line showing the best-fitting power-law distribution.
• γ for the in-degree distribution of the directed A/N is lower than for the rest.

These semantic networks are scale-free!
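(A sketch of the log-log fit behind these plots; the Barabási-Albert graph is a stand-in, and the plain least-squares fit is a crude estimator compared to a careful analysis.)

```python
# Estimate the power-law exponent gamma from the degree distribution
# by fitting a line to log P(k) vs. log k.
import numpy as np
import networkx as nx

G = nx.barabasi_albert_graph(5000, 5, seed=1)   # stand-in scale-free graph
degrees = np.array([d for _, d in G.degree()])

ks, counts = np.unique(degrees, return_counts=True)
pk = counts / counts.sum()                      # empirical P(k)

slope, intercept = np.polyfit(np.log(ks), np.log(pk), 1)
print(f"estimated gamma = {-slope:.2f}")        # P(k) ~ k^(-gamma)
```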

Page 19: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: Statistical Properties / Summary

Sparsity & high connectivity: on average, words are related to only a few other words.

Local clustering: connections between words are coherent and transitive: if x–y and y–z, then often x–z.

Short path length and diameter: language is expressive and flexible (through polysemy, homonymy, …).

Power-law degree distribution: language hosts hubs as well as many words connected to few others.

Page 20: The Large-Scale Structure of Semantic Networks


3) The Growing Network Model

Inspired by Barabási & Albert (1999). Incorporates both growth and preferential attachment.

Aim: to see whether the same mechanisms are at work in real-life semantic networks as in artificial ones.

Might be applied to lexical development in children, to the growth of semantic structures across languages, or even to language evolution.

Page 21: The Large-Scale Structure of Semantic Networks


3) The Growing Network Model

Assumptions:

– Children learn concepts through semantic differentiation: a new concept differentiates an already existing one, acquiring a similar but different meaning, with a different pattern of connectivity.
– More complex concepts get differentiated more.
– More frequent concepts get involved in differentiation more often.

Page 22: The Large-Scale Structure of Semantic Networks

3) The Growing Network Model: Structure

Nodes are words, and connections are semantic associations/relations.

Nodes differ in their utility → frequency of use.

Over time, new nodes are added and attached to existing nodes probabilistically, according to:
– Locality principle: new links are added only into a local neighborhood → a set of nodes with a common neighbor.
– Size principle: new connections go to neighborhoods that already have a large number of connections.
– Utility principle: new connections within a neighborhood land on nodes with high utility (the rich-get-richer phenomenon).

Page 23: The Large-Scale Structure of Semantic Networks

3) The Growing Network Model: a. Undirected GN Model

Aim: to grow a network with n nodes; the number of nodes at time t is n(t). Start with a fully connected network of M nodes (M << n). At each step, add a node i with M links (M chosen for a desired avg. density of connections) into a local neighborhood H_i → the set of neighbors of i, including i itself.

Choose a neighborhood according to the size principle:

P(H_i) = k_i(t) / Σ_j k_j(t)

k_i(t): degree of node i at time t; the sum ranges over all current n(t) nodes in the network.

Page 24: The Large-Scale Structure of Semantic Networks

3) The Growing Network Model: a. Undirected GN Model

Connect to a node j in the neighborhood of node i according to the utility principle:

P(j | H_i) = u_j / Σ_l u_l   (the sum ranges over all nodes l in H_i)

If all utilities are equal, make the connection randomly:

P(j | H_i) = 1 / (k_i(t) + 1)

Stop when n nodes are reached.

u_j = log(f_j + 1); f_j taken from the Kučera & Francis (1967) frequency counts.
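(A minimal sketch of the undirected GN model as I read these two slides; this is not the authors' code, and the word utilities are random stand-ins for the log(f+1) frequencies.)

```python
# Grow a network of n nodes: start from an M-clique, then repeatedly
# pick a neighborhood H_i by the size principle and attach a new node
# to M members of H_i chosen by the utility principle.
import random

def grow_network(n, M, utility):
    neighbors = {i: set(range(M)) - {i} for i in range(M)}  # M-clique
    while len(neighbors) < n:
        new = len(neighbors)
        # Size principle: P(H_i) = k_i / sum_j k_j
        nodes = list(neighbors)
        i = random.choices(nodes, weights=[len(neighbors[v]) for v in nodes])[0]
        # Locality principle: connect only inside H_i = {i} U neighbors(i)
        H = list(neighbors[i] | {i})
        targets = set()
        while len(targets) < min(M, len(H)):
            # Utility principle: P(j | H_i) proportional to u_j
            targets.add(random.choices(H, weights=[utility[v] for v in H])[0])
        neighbors[new] = set()
        for j in targets:
            neighbors[new].add(j)
            neighbors[j].add(new)
    return neighbors

utility = [random.random() + 0.1 for _ in range(150)]  # stand-in for log(f+1)
net = grow_network(n=150, M=2, utility=utility)
print(sum(len(v) for v in net.values()) // 2, "edges")
```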

Page 25: The Large-Scale Structure of Semantic Networks

3) The Growing Network Model: a. Undirected GN Model

[Figure: the growth process and a small resulting network with n = 150, M = 2.]

Page 26: The Large-Scale Structure of Semantic Networks

3) The Growing Network Model: b. Directed GN Model

Very similar to the undirected GN model: insert nodes with M arcs instead of edges.

The same equations apply the locality, size, and utility principles, since:

k_i = k_i^in + k_i^out

Difference → the direction principle: the majority (!) of arcs point from new nodes to existing nodes. The probability that an arc points away from the new node is α, where α > 0.5 is assumed, so most arcs will point toward existing nodes.
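(A small sketch of the direction principle only, under the same assumptions as above.)

```python
import random

def orient(new_node, existing_node, alpha=0.95):
    # With probability alpha (> 0.5) the arc points away from the
    # new node, i.e. toward the existing node.
    if random.random() < alpha:
        return (new_node, existing_node)
    return (existing_node, new_node)
```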

Page 27: The Large-Scale Structure of Semantic Networks

3) The Growing Network Model: Model Results

Due to computational constraints, the GN model was compared only with the A/N.

n = 5,018; M = 11 and M = 12 in the undirected and directed GN models, respectively.

The only free parameter in the directed GN model, α, was set to 0.95.

The networks produced by the model are similar to the A/N in terms of their L, D, and C, with the same low γ for the in-degree distribution as in the directed A/N.

Page 28: The Large-Scale Structure of Semantic Networks

3) The Growing Network Model: Model Results

It was also checked whether the same results would be produced when the directed GN model was converted into an undirected one (why!?).

Convert all arcs into edges, with M = 11 and α = 0.95 → results similar to the undirected GN model.

The degree distribution follows a power law.

Page 29: The Large-Scale Structure of Semantic Networks

3) The Growing Network Model: Argument

L, C, and γ from the artificial networks were expected to match those of the real-life networks because of:

– the incorporation of growth
– the incorporation of preferential attachment (locality, size & utility principles)

Do models without growth fail to produce such power laws? → Analyze the co-occurrence of words within a large corpus.

Latent Semantic Analysis (LSA): the meaning of words can be represented by vectors in a high-dimensional space.

Landauer & Dumais (1997) have already shown that local neighborhoods in semantic space capture semantic relations between words.
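(A sketch of the LSA-style construction this argument refers to: a truncated SVD of a word-document count matrix, with words linked when their vectors are sufficiently similar. All data here are synthetic, and the threshold is arbitrary.)

```python
# LSA-style vectors from a word x document matrix, then a network
# from cosine similarities between word vectors.
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(0.3, size=(2000, 500)).astype(float)  # word-doc counts

U, s, Vt = np.linalg.svd(X, full_matrices=False)
vectors = U[:, :50] * s[:50]          # keep 50 latent dimensions

unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
sims = unit @ unit.T                  # cosine similarity matrix
np.fill_diagonal(sims, 0.0)

degrees = (sims > 0.5).sum(axis=1)    # link words above a threshold
print("mean degree:", degrees.mean())
```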

Page 30: The Large-Scale Structure of Semantic Networks

3) The Growing Network Model: LSA Results

Higher L, D, and C than in real-life semantic networks.

Very different degree distributions: they do not follow a power law, and the slope of the best-fitting line is difficult to interpret.

Page 31: The Large-Scale Structure of Semantic Networks

3) The Growing Network Model: LSA Results

Analysis of the TASA corpus (>10 million words) using the LSA vector representation:

[Figure: degree distributions for three word sets: all words from the A/N that appear in TASA; the most frequent words in TASA; all words from LSA (>92k) represented as vectors.]

Page 32: The Large-Scale Structure of Semantic Networks

3) The Growing Network Model: LSA Results

The absence of a power-law degree distribution implies that LSA does not produce hubs.

In contrast, a growing model provides a principled explanation for the origin of the power law: words with high connectivity acquire even more connections over time.

Page 33: The Large-Scale Structure of Semantic Networks


4) Psychological Implications

The number of connections a node has is related to the time at which the node was introduced into the network.

Predictions:
– Concepts that are learned early in life will have more connections than concepts learned later.
– Concepts with high utility (frequency) will receive more links than concepts with lower utility.

Page 34: The Large-Scale Structure of Semantic Networks

4) Psychological Implications: Analysis of AoA-related data

To test the predictions, two data sets were analyzed:

I) Age-of-acquisition ratings (Gilhooly & Logie, 1980)
AoA effect: early-acquired words are retrieved from memory more rapidly than late-acquired words. In a study of 1,944 words, adults were required to estimate the age at which they thought they first learned each word, on a rating scale from 100 to 700 (700 = a very late-learned concept).

II) Picture-naming norms (Morrison, Chappell & Ellis, 1997)
Estimates of the age at which 75% of children could successfully name the object depicted by a picture.

Page 35: The Large-Scale Structure of Semantic Networks

4) Psychological Implications: Analysis of AoA-related data

Predictions are confirmed!

[Figure: mean number of connections by AoA, with standard error bars around the means.]

Page 36: The Large-Scale Structure of Semantic Networks

4) Psychological Implications: Discussion

Important consequences for psychological research on AoA and word frequency:

– Weakens the claims that:
  • AoA affects mainly the speech-output system
  • AoA & word frequency exert their effects on behavioral tasks independently

– Confirms the findings that:
  • early-acquired words show short naming latencies and lexical-decision latencies
  • AoA affects semantic tasks
  • AoA is mere cumulative frequency

Page 37: The Large-Scale Structure of Semantic Networks

4) Psychological Implications: Correlational Analysis of Findings

Early-acquired words have more semantic connections (they are more central in an underlying semantic network) → early-acquired words have higher degree centrality.

Centrality can also be measured by computing the eigenvector of the adjacency matrix with the largest eigenvalue (see the sketch below).

Analysis of how degree centrality, word frequency, and AoA from previous rating & naming studies correlate with 2 databases:

– Naming-latency database of 796 words
– Lexical-decision-latency database of 2,905 words
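(A sketch of that centrality measure: power iteration converging to the eigenvector of the adjacency matrix with the largest eigenvalue. The toy adjacency matrix is illustrative.)

```python
import numpy as np

def eigenvector_centrality(A, iters=200):
    # Repeatedly multiply by A and renormalize; converges to the
    # dominant eigenvector for a connected, non-bipartite graph.
    x = np.ones(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(eigenvector_centrality(A).round(3))
```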

Page 38: The Large-Scale Structure of Semantic Networks

4) Psychological Implications: Correlational Analysis of Findings

• Centrality correlates negatively with latencies.
• AoA correlates positively with latencies.
• Word frequency correlates negatively with latencies.
• When the effects of word frequency and AoA are partialled out, the centrality-latency correlation remains significant → there must be other variables at work.

Page 39: The Large-Scale Structure of Semantic Networks

5) General Discussion and Conclusions

Weakness of correlational analysis: the direction of causation is unknown:

– Because a word is acquired early, it will have more connections
vs.
– Because a word has more connections, it will be acquired early

A connectionist model can produce similar results: early-acquired words are learned better.

Page 40: The Large-Scale Structure of Semantic Networks

5) General Discussion and Conclusions

Power-law degree distributions in semantic networks can be understood through semantic growth processes → hubs.

Non-growing semantic representations such as LSA do not produce such a distribution per se.

Early-acquired concepts have richer connections → confirmed by AoA norms.

Page 41: The Large-Scale Structure of Semantic Networks


References

Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509-512.

Gilhooly, K. J., & Logie, R. H. (1980). Age-of-acquisition, imagery, concreteness, familiarity, and ambiguity measures for 1,944 words. Behavior Research Methods & Instrumentation, 12, 395-427.

Kučera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.

Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.

Morrison, C. M., Chappell, T. D., & Ellis, A. W. (1997). Age of acquisition norms for a large set of object names and their relation to adult estimates and other variables. Quarterly Journal of Experimental Psychology, 50A, 528-559.

Page 42: The Large-Scale Structure of Semantic Networks

Thanks for your attention!

Questions / comments are appreciated.

Page 43: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: c. WordNet

Number of words, synsets, and senses:

POS         Unique Word-Strings   Synsets   Total Sense Pairs
Noun             114,648           79,689       141,690
Verb              11,306           13,508        24,632
Adjective         21,436           18,563        31,015
Adverb             4,669            3,664         5,808
Totals           152,059          115,424       203,145

Page 44: The Large-Scale Structure of Semantic Networks

2) Analysis of 3 Semantic Networks: Statistical Properties

With N nodes and average degree ⟨k⟩ = pN:
– If ⟨k⟩ = pN < 1, the graph is composed of isolated trees.
– If ⟨k⟩ > 1, a giant cluster appears.
– If ⟨k⟩ ≥ ln(N), the graph is totally connected.
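(A sketch illustrating the three regimes with Erdős-Rényi graphs G(N, p), where ⟨k⟩ = pN.)

```python
import math
import networkx as nx

N = 1000
for k in (0.5, 2.0, 1.5 * math.log(N)):   # <k> < 1, > 1, >= ln(N)
    G = nx.gnp_random_graph(N, k / N, seed=1)
    giant = max(nx.connected_components(G), key=len)
    print(f"<k> = {k:.1f}: largest component has {len(giant)} of {N} nodes")
```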

Page 45: The Large-Scale Structure of Semantic Networks


Roget’s Thesaurus

WORDS EXPRESSING ABSTRACT RELATIONS

WORDS RELATING TO SPACE

WORDS RELATING TO MATTER

WORDS RELATING TO THE INTELLECTUAL FACULTIES

WORDS RELATING TO THE VOLUNTARY POWERS

WORDS RELATING TO THE SENTIMENT AND MORAL POWERS