Visualising Linguistic Evolution in Academic Discourse Verena Lyding, Ekaterina Lapshinova, Stefania Degaetano, Henrik Dittmann, Chris Culy Joint Workshop of LINGVIS & UNCLH EACL-2012 Avignon, France V.Lyding, E.Lapshinova (EURAC, Saarland U) Visualising Linguistic Evolution 23 April 2012 1 / 32


Visualising Linguistic Evolution in Academic Discourse

Verena Lyding, Ekaterina Lapshinova, Stefania Degaetano, Henrik Dittmann, Chris Culy

Joint Workshop of LINGVIS & UNCLH, EACL-2012

Avignon, France

V.Lyding, E.Lapshinova (EURAC, Saarland U) Visualising Linguistic Evolution 23 April 2012 1 / 32


Overview

1 Introduction

2 Data to Analyse
- Lexico-grammatical Features
- Resources & Feature Extraction

3 Structured Parallel Coordinates
- SPC Visualisation
- Customisation and Interactive Features
- Visual Analysis of Registers with SPC

4 Interpreting Visualisation Results
- Case Study I - changes in variable TENOR
- Case Study II - changes in variable FIELD

5 Conclusion and Future Work


REGICO: Registers in Contact

Ekaterina Lapshinova

Stefania Degaetano

Elke Teich

FR 4.6 Applied Linguistics, Interpreting and Translation Studies

Saarland University, Saarbrücken

LinfoVis

Verena Lyding

Henrik Dittmann

Chris Culy

Institute for Specialised Communication and Multilingualism

EURAC, Bozen-Bolzano


Introduction

Aims
- create procedures to visualise diachronic language changes in academic discourse with the help of SPC, cf. (Culy et al., 2011)
⇒ to facilitate analysis and interpretation of complex data

Motivation
- study diachronic changes with focus on contact registers
- changes are reflected by linguistic features
- we determine and describe tendencies of features, which might become rarer, more frequent or cluster in new ways
⇒ the amount and complexity of the interrelated data


Data to Analyse Lexico-grammatical Features

Register Analysis

Registers are patterns of language according to use in context, cf. (Halliday&Hasan, 1989)

Linguistic variation according to contexts of use, with variables
- field
- tenor
- mode

cf. Systemic Functional Linguistics (SFL) and register theory, e.g., (Quirk, 1985), (Halliday&Hasan, 1989) and (Biber, 1995)

Particular settings of these variables are associated with certain lexico-grammatical features
⇒ co-occurrences indicate distinctive registers (e.g., the language of linguistics in academic discourse).


Recent Language Change

- changes in contexts of use (variables) and language use (features)
- features become rarer or more frequent, and cluster in novel ways
⇒ existing registers become obsolete, new ones evolve

cf. (Mair, 2006): changes in preferences of lexico-grammatical selection in English in the 1960s vs. the 1990s.

Our focus: new registers that evolve in contact of disciplines (e.g. the language of bioinformatics, a contact register to biology and computer science)


Case Study I - changes in variable tenor

TENOR: modality
modal verbs grouped according to (Biber, 1999): obligation, permission and volition

categories of meaning (feature)     realisation
obligation/necessity                must, should, etc.
permission/possibility/ability      can, could, may, etc.
volition/prediction                 will, would, shall, etc.


Case Study II - changes in variable field

FIELD: verb valency patterns
Competing grammatical variants, e.g. valency patterns, show the trends in the development of grammatical features, cf. (Mair, 2006)

valency patterns (feature)     example
VERB+inf                       help do sth.
VERB+obj+inf                   help sb. do sth.
VERB+to-inf                    help to do sth.


Data to Analyse Resources & Feature Extraction

SciTex

DaSciTex: from the early 2000s
- approx. 17 million words

SaSciTex: from the 1970s/early 1980s
- approx. 17 million words

[Figure: SciTex corpus design with Computer Science (A), four contact registers (B) and four specialised disciplines (C)]
- Computational Linguistics (B1), Linguistics (C1)
- Bioinformatics (B2), Biology (C2)
- Digital Construction (B3), Mechanical Engineering (C3)
- Microelectronics (B4), Electrical Engineering (C4)

cf. (Degaetano et al., 2012) and (Teich&Fankhauser, 2010)


Corpus Annotations

automatic          token, lemma, part-of-speech, chunk; text register, text year, division, etc. (metadata)
semi-automatic     cohesive devices, evaluative patterns
manual             transitivity


Extractions From Corpora

with the Corpus Query Processor (CQP), cf. (Evert 2005)

Positional Attributes: word, pos, lemma

Structural Attributes: s, text, text_title, text_author, text_year, text_ad, cohesion, cohesion_device, modal, modal_meaning, evaluation, evaluation_pattern


Examples of Extraction

Case I Extraction: Modal Meanings

Query building blocks               comments                  sentences extracted from SciTex
                                    context                   S
[_._modal_meaning="obligation"]     category of obligation    must
[pos="V.*"]                         verb                      remove
                                    context                   at least bj jobs

                                    context                   Each edge
[_._modal_meaning="permission"]     category of permission    can
[pos="V.*"]                         verb                      transmit
                                    context                   a single packet in each time step

                                    context                   We
[_._modal_meaning="volition"]       category of volition      shall
[pos="V.*"]                         verb                      use
                                    context                   s adversary trees
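The query building blocks can be mimicked outside CQP for illustration. What follows is a minimal Python sketch, not the CQP implementation: it assumes tokens stored as dictionaries with `word`, `pos` and an optional `modal_meaning` key (a hypothetical in-memory layout; in the corpus, `modal_meaning` is a structural attribute), and the POS tags in the sample sentence are likewise assumed.

```python
def find_modal_verb_pairs(tokens, meaning):
    """Return (modal, verb) pairs: a token annotated with `meaning`,
    directly followed by a token whose POS tag starts with "V"."""
    hits = []
    for current, following in zip(tokens, tokens[1:]):
        if current.get("modal_meaning") == meaning and following["pos"].startswith("V"):
            hits.append((current["word"], following["word"]))
    return hits


# Hypothetical annotation of the example "We shall use ..."
sentence = [
    {"word": "We", "pos": "PP"},
    {"word": "shall", "pos": "MD", "modal_meaning": "volition"},
    {"word": "use", "pos": "VV"},
    {"word": "adversary", "pos": "NN"},
    {"word": "trees", "pos": "NNS"},
]

print(find_modal_verb_pairs(sentence, "volition"))  # [('shall', 'use')]
```

The function only checks adjacency, which mirrors the two consecutive token expressions in the query; anything between modal and verb (e.g. an adverb) would need an extra optional step.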


Examples of Extraction

Case II Extraction: Valency Patterns

Query building blocks               comments
[pos="V.*"&lemma="help"]            verb help
[pos="TO"]?                         optional to
(                                   object start
[pos="DT|PP|PDT"]?                  one or none determiner
[pos="RB.*|JJ.*|VVN|N.*"]{0,3}      up to 3 modifiers
[pos="POS"]?                        one or none possessive
[pos="N.*|PP"]?                     noun or pronoun
)                                   object end
[pos="V(V|B|H)"]                    infinitive

Sentences extracted from SciTex ⇒ valency patterns
- It also helps organise routine review of recordings ⇒ VERB+inf
- The power available with the system helps the programmer refrain from resisting changes ⇒ VERB+obj+inf
- Lemma 1 helps to set the inductive basis for k ⇒ VERB+to-inf
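The way one query separates three valency patterns can be illustrated with a regular expression over the POS-tag sequence that follows HELP. This is a hedged Python sketch, not the CQP query itself: the function name, the `(word, pos, lemma)` tuple layout and the hand-assigned tags of the sample sentences are assumptions, and the decision rule (a matched "to" gives VERB+to-inf, a non-empty object gives VERB+obj+inf, otherwise VERB+inf) is one plausible reading of the slide.

```python
import re

# Each tag in the string is followed by one space, so "PP " cannot match "PP$ ".
AFTER_HELP = re.compile(
    r"(?P<to>TO )?"                               # optional "to"
    r"(?P<obj>"
    r"(?:(?:DT|PP|PDT) )?"                        # one or none determiner
    r"(?:(?:RB\S*|JJ\S*|VVN|N\S*) ){0,3}"         # up to 3 modifiers
    r"(?:POS )?"                                  # one or none possessive
    r"(?:(?:N\S*|PP) )?"                          # noun or pronoun
    r")"
    r"(?P<inf>V[VBH]) "                           # infinitive
)


def classify_help(tokens):
    """Return the valency pattern of the first matching HELP construction, or None."""
    for i, (_word, pos, lemma) in enumerate(tokens):
        if lemma == "help" and pos.startswith("V"):
            rest = "".join(p + " " for _w, p, _l in tokens[i + 1:])
            match = AFTER_HELP.match(rest)
            if match is None:
                continue
            if match.group("to"):
                return "VERB+to-inf"
            if match.group("obj"):
                return "VERB+obj+inf"
            return "VERB+inf"
    return None


# Sample sentences with assumed tags.
s1 = [("It", "PP", "it"), ("also", "RB", "also"), ("helps", "VVZ", "help"),
      ("organise", "VV", "organise"), ("routine", "JJ", "routine"),
      ("review", "NN", "review"), ("of", "IN", "of"), ("recordings", "NNS", "recording")]
s2 = [("The", "DT", "the"), ("power", "NN", "power"), ("helps", "VVZ", "help"),
      ("the", "DT", "the"), ("programmer", "NN", "programmer"),
      ("refrain", "VV", "refrain"), ("from", "IN", "from"),
      ("resisting", "VVG", "resist"), ("changes", "NNS", "change")]
s3 = [("Lemma", "NN", "lemma"), ("1", "CD", "1"), ("helps", "VVZ", "help"),
      ("to", "TO", "to"), ("set", "VV", "set"), ("the", "DT", "the"),
      ("inductive", "JJ", "inductive"), ("basis", "NN", "basis")]

for s in (s1, s2, s3):
    print(classify_help(s))
```

Matching over a tag string keeps the optional and repeated elements in one expression, at the cost of losing the word forms; a real extraction would keep token positions alongside the tags.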


Extraction Output

Preparation for Analysis
- extracted material is sorted according to registers
- data is transformed into JSON format for input to SPC
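As an illustration of this preparation step, the sketch below counts extraction hits per register, period and feature and serialises them as JSON. The record layout (`register`, `period`, `feature`, `frequency`) and the function name are assumptions made for the example; the actual input schema expected by SPC is not given here.

```python
import json
from collections import Counter


def hits_to_json(hits):
    """Count (register, period, feature) hits and serialise them as JSON records."""
    counts = Counter(hits)
    records = [
        {"register": register, "period": period, "feature": feature, "frequency": n}
        for (register, period, feature), n in sorted(counts.items())
    ]
    return json.dumps(records, indent=2)


# Invented hits, for illustration only.
hits = [
    ("B2", "2000s", "VERB+inf"),
    ("B2", "2000s", "VERB+inf"),
    ("A", "1970/80s", "VERB+to-inf"),
]
print(hits_to_json(hits))
```

Sorting the keys puts the records in register order, which corresponds to "sorted according to registers" in the simplest possible way.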

Analysis Aims
- register analysis of A-B-C triples:
  ⇒ whether B disciplines are more similar to A or C or distinct from both
- diachronic analysis:
  ⇒ two time periods in SciTex (70/80s vs. 2000s)
- a more fine-grained diachronic analysis: publication year
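The A-B-C question can be made concrete with a simple numeric comparison. The sketch below is only one possible operationalisation and not the method used in the study, which relies on visual inspection: it assumes each register is summarised as a dictionary of feature frequencies and compares summed absolute differences. The numbers are invented.

```python
def closer_to(a, b, c):
    """Say whether register B is closer to A or to C.

    a, b, c map feature names to (normalised) frequencies; missing
    features count as 0. Distance is the sum of absolute differences.
    """
    features = set(a) | set(b) | set(c)
    dist_a = sum(abs(b.get(f, 0) - a.get(f, 0)) for f in features)
    dist_c = sum(abs(b.get(f, 0) - c.get(f, 0)) for f in features)
    if dist_a < dist_c:
        return "A"
    if dist_c < dist_a:
        return "C"
    return "tie"


# Invented frequencies for one A-B-C triple.
a = {"obligation": 10, "permission": 30}
b = {"obligation": 14, "permission": 38}
c = {"obligation": 15, "permission": 40}
print(closer_to(a, b, c))  # C
```

Deciding that B is "distinct from both" would need an additional threshold on the two distances, which this sketch does not attempt.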


Structured Parallel Coordinates SPC Visualisation

Structured Parallel Coordinates (SPC)

SPC (Culy et al. 2011) are a specialisation of the Parallel Coordinates visualisation (d’Ocagne 1885; Inselberg 1985, 2009)

The Parallel Coordinates visualisation provides:
- two-dimensional representation of multidimensional data
- data dimensions on vertical axes, lined up horizontally
- related data points are connected by colored lines between axes
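The geometry behind these three points can be sketched in a few lines. The Python example below is a generic illustration of parallel coordinates, not code from SPC: each record becomes a polyline whose x coordinate is the index of the axis and whose y coordinate is the record's value rescaled to [0, 1] on that axis. The function name and the sample data are assumptions.

```python
def polyline_points(records, axes):
    """Map each record to a list of (x, y) points, one per axis.

    x is the position of the axis (0, 1, 2, ...); y is the value
    min-max scaled per axis. A constant axis is drawn at mid-height.
    """
    lo = {a: min(r[a] for r in records) for a in axes}
    hi = {a: max(r[a] for r in records) for a in axes}
    lines = []
    for r in records:
        points = []
        for x, a in enumerate(axes):
            span = hi[a] - lo[a]
            y = (r[a] - lo[a]) / span if span else 0.5
            points.append((x, y))
        lines.append(points)
    return lines


# Two invented records with two dimensions.
cars = [{"mpg": 10, "hp": 200}, {"mpg": 30, "hp": 100}]
print(polyline_points(cars, ["mpg", "hp"]))
# [[(0, 0.0), (1, 1.0)], [(0, 1.0), (1, 0.0)]]
```

The two polylines cross between the axes, which is the visual cue for an inverse relation between the two dimensions.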


Parallel Coordinates
Example visualising car features

Taken from protovis page: http://mbostock.github.com/protovis/ex/cars.html


SPC for language data

Adaptation of Parallel Coordinates to accommodate language data, e.g. as derived from corpora

- customised for representing ordered characteristics within and across dimensions
  - e.g. in the n-grams with frequencies application of SPC, ordered axes represent the linear ordering of words in text
  - e.g. visual separation of ordered and unordered axes
- refined modes of interaction
  - e.g. non-contiguous selection of values


N-grams with Frequencies application
Pronouns used with happy and sad

⇒ It is sad > It’s sad > One was sad > It was sad > We were sad

Structured Parallel Coordinates Customisation and Interactive Features

SPC for analysing language change

In SPC data dimensions are placed on different axes
- subcorpus characteristics,
- lexico-grammatical features, and
- their frequencies.

Numerical axes are ordered according to time/register of the subcorpus
- i.e. corpus from the 1970/80s → corpus 2000s
- i.e. computer science → mixed-discipline (e.g. bioinformatics) → specialised discipline (e.g. biology)


Subcorpus Comparisons - adjustments

For analysing language change:
- changes in linguistic features over time
- changes in linguistic features across registers

SPC subcorpus comparison application
- based on n-grams with frequencies application
- ordered numerical axes follow (unordered) categorical axes
- discrete line coloring for the distinction of categorical variables
- switching between comparable and individual numerical scales
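The difference between comparable and individual numerical scales can be shown with a small calculation. This Python sketch is an assumption about what the switch amounts to, not SPC code: with comparable scales every axis is divided by the same overall maximum, with individual scales each axis is divided by its own maximum. Frequencies are assumed to be non-negative and the numbers are invented.

```python
def scale_axes(values_by_axis, comparable):
    """Rescale the values of each numerical axis to the range [0, 1].

    comparable=True  -> one shared maximum for all axes
    comparable=False -> each axis uses its own maximum
    """
    if comparable:
        shared_max = max(max(v) for v in values_by_axis.values())
        maxima = {axis: shared_max for axis in values_by_axis}
    else:
        maxima = {axis: max(v) for axis, v in values_by_axis.items()}
    return {
        axis: [x / maxima[axis] if maxima[axis] else 0.0 for x in v]
        for axis, v in values_by_axis.items()
    }


# Invented frequencies of two features in two time periods.
freqs = {"1970/80s": [10, 20], "2000s": [40, 80]}
print(scale_axes(freqs, comparable=True))
# {'1970/80s': [0.125, 0.25], '2000s': [0.5, 1.0]}
print(scale_axes(freqs, comparable=False))
# {'1970/80s': [0.5, 1.0], '2000s': [0.5, 1.0]}
```

Comparable scales expose the overall increase between the periods, whereas individual scales show that the proportions within each period are identical.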


SPC subcorpus comparison application
Visualising linguistic features by register and time

Visualisation of HELP plus complements by register


Visual analysis of registers

The SPC visualisation allows for:
- display of multidimensional data
- dynamic interaction with the data

- comparable vs. individual numerical scales
- discrete vs. scaled coloring of lines
→ OVERVIEW

- selection of data points for dynamic filtering
- line coloring according to axis in focus
→ FOCUS

- highlighting of axes on mouseover
- written summary of the record
→ DETAILS

⇒ support for the detection of patterns, tendencies and outliers
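Selecting data points for dynamic filtering boils down to keeping the records that match the selected values on every axis with a selection. The following Python sketch shows that logic under assumed names and an assumed record layout; it is not taken from the SPC implementation. Using sets of values means the selection on an axis does not have to be contiguous.

```python
def filter_records(records, selection):
    """Keep records whose value lies in the selected set on every selected axis.

    selection maps an axis name to a set of selected values; axes
    without an entry are unrestricted.
    """
    return [
        r for r in records
        if all(r[axis] in values for axis, values in selection.items())
    ]


# Invented records in the layout assumed for this example.
records = [
    {"register": "A", "meaning": "obligation", "period": "2000s"},
    {"register": "B2", "meaning": "permission", "period": "2000s"},
    {"register": "C2", "meaning": "permission", "period": "1970/80s"},
    {"register": "C4", "meaning": "volition", "period": "2000s"},
]

# Non-contiguous selection on one axis combined with a second axis.
selected = filter_records(
    records, {"register": {"A", "C2", "C4"}, "period": {"2000s"}}
)
print([r["register"] for r in selected])  # ['A', 'C4']
```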


Interpreting Visualisation Results Case Study I - changes in variable TENOR

Analysing modal meanings

Investigation of changes in usage: obligation, permission and volition

⇒ DEMO:


Modal Meanings by Time

- selection of permission, focus on registers:
  notably smaller increase for some data sets → Electrical Engineering domain (C4)
- selection of single disciplines, focus on registers:
  - (A-B1-C1, Linguistics): B is closer to C than to A for all modal meanings in the 1970/80s
  - (A-B2-C2, Biology): no notable differences in tendency for volition; stronger decrease in C than in A and B for obligation; for permission, the increase in B lies between the increases for C and A

Link: www.eurac.edu/linfovis > LInfoVis programs and resources > Structured Parallel Coordinates > (sub)corpus comparisons


Modal Meanings by Register

- focus on time: modal meanings behave similarly over time
  - detailed analysis with selection of single modal meanings:
    obligation: strongest decrease for B3 to C3; strongest increase for B1-C1 and B2-C2
    permission: strongest decrease for B1 to C1; strongest increase for B3-C3
    volition: strongest decrease for B2 to C2
  - detailed analysis with selection of Biology:
    e.g. focus on obligation: contrary tendencies for B and C over time
- focus on registers, normalised values:
  C values remained stable, B values decreased over time


Interpreting Visualisation Results Case Study II - changes in variable FIELD

Analysing verb complements

Investigation of changes in usage: HELP plus complements

⇒ DEMO:


HELP plus Complements by Time

- focus on verbs:
  frequency ordering of verb constructions for all registers: HELP+To+Inf HELP+Inf (1970/80s) ≥ HELP+Obj+Inf (catching up in 2000s)
- selection of HELP+To+Inf, focus on disciplines:
  increase over time for B3 (Mechanical), decrease in A and B4 (Electrical), moderate changes in other disciplines
- selection of HELP+Inf and HELP+Obj+Inf, focus on verbs:
  some distinct tendencies:
  - A, B3 and B4/C4 strongly increasing
  - B1/C1 and B2/C2 are changing moderately


HELP plus Complements by Register

- focus on verbs:
  HELP+Inf behaves most uniformly over all registers
- selection of HELP+Inf, focus on time:
  relatively stable over subdisciplines, differences between B2 and C2
- selection of HELP+Obj+Inf, focus on disciplines:
  relative occurrences in B3 and C3 inverted from 70/80s to 2000s
- registers laid out in detail, focus on verbs:
  - B3/C3 show inverted tendencies over time for HELP+Obj+Inf and, less markedly, for HELP+To+Inf
  - B4/C4 show relative stability over time periods for all verb constructions


Conclusion and Future Work

Conclusion

We could show that
- visualisation makes it possible to gain an overview and detect tendencies
  → complex set of data in one display
- interactive features make it possible to focus dynamically on different aspects of the data
  → filtering and highlighting of specific subsets for detailed analyses

⇒ SPC facilitate our diachronic register analysis


Future Work

Data Analysis
- use different data layouts to feed several SPC visualisations
- focus on further features for the three contextual variables, e.g., conjunctive relations expressing cohesion for mode
- analyse several linguistic features at the same time (feature sets for register variation)
- provide a more fine-grained diachronic analysis (by publication years)


Future Work continued

Technical Enhancements
- function for automatically restructuring the underlying data to create different layouts, e.g. the merging of axes with categorical values (e.g., axes registers and disciplines)
- introduction of a ’summary’ category on each data dimension (the sum of all individual values)
- function for selecting data items based on crossings or declination of their connecting lines
- changing the visualisation of overlapping lines (e.g. using semi-transparent or stacked lines)
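To make the proposed selection by crossings or declination more tangible, the sketch below shows how both could be computed for the line segment between two adjacent numerical axes. It is purely illustrative of a planned feature: the function names and data are invented, and it assumes that both axes use the same (comparable) scale so that a difference of values corresponds to the slope of the drawn line.

```python
def select_by_declination(records, left, right, direction):
    """Select records whose line rises or falls between two adjacent axes.

    direction is "rising" or "falling"; flat lines are never selected.
    """
    sign = 1 if direction == "rising" else -1
    return [r for r in records if sign * (r[right] - r[left]) > 0]


def lines_cross(a, b, left, right):
    """True if the lines of records a and b cross between the two axes,
    i.e. their vertical order on the left axis is reversed on the right."""
    return (a[left] - b[left]) * (a[right] - b[right]) < 0


# Invented frequencies for two registers in two periods.
b3 = {"register": "B3", "1970/80s": 10, "2000s": 30}
c3 = {"register": "C3", "1970/80s": 20, "2000s": 15}

falling = select_by_declination([b3, c3], "1970/80s", "2000s", "falling")
print([r["register"] for r in falling])          # ['C3']
print(lines_cross(b3, c3, "1970/80s", "2000s"))  # True
```

A crossing between two registers corresponds to the kind of inverted tendency that otherwise has to be spotted by eye.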


Thank you!
Questions? Comments? Suggestions?


www.eurac.edu/linfovis
