Post on 10-Jun-2020

From gene sequencing to genre sequencing: A corpus-based analysis of British patents of invention, 1711-2011

Nicholas Groom and Jack Grieve

Centre for Corpus Research, Department of English Language and Applied Linguistics

Aims

1. To present a novel corpus-based methodology for diachronic genre analysis

2. To use this method to identify changes in the patent specification genre over three centuries

3. To address a fundamental theoretical question in genre studies

The paradox of genre

o Genres are, by definition, ‘generic’, which is to say, relatively consistent from one instance to the next.

o And yet we know that genres change over time, sometimes quite radically.

o How does genre change happen?

How does genre change happen?

o Is it Darwinian, i.e. a constant and gradual process of natural selection?

o Or is it Kuhnian, i.e. characterized by periods of stability punctuated by sudden and dramatic ‘paradigm shifts’?

o Diachronic studies of the scientific research article genre (e.g. Gross et al. 2002): gradual ‘evolution’ from C17th to the present

o Psychiatric case history genre (Berkenkotter 2009): periods of stability punctuated by two ‘revolutions’:

1. Freud (c.1905)

2. American Psychiatric Association DSM III (1980)

o Studies of more genres needed!

‘Genre’

o What do we mean by ‘genre’?

o In everyday language (and in literary theory), ‘genre’ ≈ ‘text type’.

o For linguists and rhetoricians, ‘genre’ ≠ ‘text type’.

o This is because form and function do not always match up.

Same form, different functions

Different forms, same function

‘Genre’

o Linguists and rhetoricians define genre in terms of social function rather than textual form

o Rhetorical theory:

– Miller (1984: 159): genres are “typified rhetorical actions based in recurrent situations”

o Systemic-Functional Linguistics:

– Martin (2005: 13): “genre represents the system of staged goal-oriented social processes through which social subjects in a given culture live their lives.”

o English for Specific Purposes (ESP):

– Swales (1990: 58): “A genre comprises a class of communicative events, the members of which share some set of communicative purposes.”

o Corpus linguistics:

– Biber (1988: 170): “Genre categories are determined on the basis of external criteria relating to the speaker's purpose and topic; they are assigned on the basis of use rather than on the basis of form.”

Reconciling genre and corpus

o Linguists and rhetoricians define genre in terms of social function rather than textual form.

o However … “… while genre is not limited to its form, form is indeed an important aspect of genre” (Tardy & Swales 2014: 166).

o Bazerman (1988: 62): “the formal features that are shared by the corpus of texts in a genre and by which we usually recognize a text’s inclusion in a genre, are the linguistic/symbolic solution to a problem in social interaction”.

o So, which “formal features” should corpus linguists focus on?

Which ‘formal features’?

o Tardy & Swales (2014: 166-167): “Users – and in some cases, non-users – generally recognize a genre based on formal features like lexis, grammar, organizational patterns, topics, and even document format and associated visuals.”

Which ‘formal features’?

Tardy & Swales (2014: 166-167):
o lexis
o grammar
o organizational patterns
o topics
o document format and associated visuals

Biber & Conrad (2009: 16):
o specialized expressions
o rhetorical organization
o formatting
o usually once-occurring in the text, in a particular place in the text

Which ‘formal features’?

Tardy & Swales:
o lexis
o grammar
o organizational patterns
o document format and associated visuals

Biber & Conrad:
o specialized expressions
o rhetorical organization
o formatting
o usually once-occurring in the text, in a particular place in the text

Previous work

o Most previous corpus-based genre studies have focused on ‘rhetorical organization’

– a.k.a. ‘corpus-based move analysis’

– e.g. Biber et al. (2007); Upton & Cohen (2009)

Our approach

o We try to stay ‘on the surface’ as much as possible, and focus strongly on the sequencing of textual elements (hence ‘sequence analysis’ rather than ‘move analysis’)

o Aim: describe sequencing of formal features

– in individual exemplar texts, and

– across texts diachronically

o … in order to see how prototypical generic forms become established, and how they change over time

o Empirical focus: patent specification genre, 1711-2011

Patents

o Intellectual property protection for inventions with industrial applicability.

o Rationale for patenting:

– Inventor potentially benefits financially from period of protection

– This in turn incentivizes scientific and technological innovation

– The public benefits from this, and from requirement that patent must describe invention in detail; knowledge becomes public property on expiry of patent.

History

o ‘Patents’ ← ‘Letters Patent’, i.e. ‘open letter’

o = royal proclamation granting a right (written records from 1201).

o Many issued for (often spurious) monopoly rights.

o Statute of Monopolies 1623: abolished all manufacturing and commercial patents except for those granted for “the sole workinge or makinge of any manner of new manufactures within this Realme, to the true and first inventor or inventors of such manufactures”.

o Until end of C17th, patent grant was on condition that the inventor would, after a period of seven years, take on apprentices “and teach them the knowledge and mystery of the said new invention”.

o During the early C18th, transmission of knowledge via apprenticeships was replaced by requirement for the inventor to lodge a written specification, describing the invention in full.

o So, patents played an important role in shift from oral to literate culture

o The patenting system was a key driver of the Industrial Revolution (1760 - c.1840) (Nuvolari & Tartari 2011; Bottomley 2014).

Source: Nuvolari and Tartari (2011: 102)

o By mid C19th, modern patent systems were being established in the UK, USA and elsewhere.

o C20th: emergence of international regimes, e.g. EPO (1949) and PCT (1970)

o Patents Act 1977: assimilated UK patents into European system.

Today

Source: http://www.wipo.int/ipstats/en/charts/ipfactsandfigures2016.html

Why patents?

o Important and historically significant genre.

o Studied extensively in some fields (e.g. NLP, legal and economic history, science and technology studies, rhetorical studies …)

o But hardly studied at all by linguists.

o We hope to change this!

o Ideal focus for investigating our theoretical question.

o Unusually, the patent specification is a genre that can be traced all the way back to its very first exemplar: Nasmith’s patent, 1711

o We know that the patent specification genre has changed dramatically over the last 300 years

[Figure: panels labelled 1711 and 2011]

o But how has this change happened: through gradual and constant ‘evolutionary’ modifications, or through sudden and dramatic ‘revolutionary’ shifts?

Corpus

o BLEPAS

o British Library & Espacenet Patent Archive

Sample

o One text per year from 1711 to 2011

o ‘long and thin’ as opposed to ‘short and fat’ (Rissanen 2000; Kohnen 2007). (Ultimate aim = long and fat!)

Dataset

o The dataset for the current analysis covers 276 of the 278 years from 1734 to 2011. The underlying dataset also includes texts from 1711 to 1733, but only 4 years in that span have data, so we have excluded them from this preliminary analysis.

o The two missing years are 1739 and 1758: no patents were issued in either year.

o The dataset for the current study consists of a list of 276 short strings of alphabetical characters.

o Each string represents the generic structure of a single randomly selected patent for each year from 1734 to 2011.

How dataset was built

Step 1: We reduced each text in the corpus to a code string representing that text as a sequence of ‘formal features’

[Figure: an annotated patent, with each element labelled by its code]

f = Salutation
a = Declaration of grant of patent
b = Statement of condition of grant
c = Description of invention (continued)
d = Witness statement and signature
h = Other witness signature(s)
e = Confirmation that specification has been enrolled within specified time limit
i = Drawings

Resulting code string = fabcdhei
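Step 1 can be illustrated with a minimal Python sketch. The slides do not say how the coding was carried out in practice, so the lookup table `MOVE_CODES` and the function `encode` are hypothetical names; only the element labels and their letters come from the annotated example above.

```python
# Hypothetical lookup table: element labels and letters taken from the slide.
MOVE_CODES = {
    "Salutation": "f",
    "Declaration of grant of patent": "a",
    "Statement of condition of grant": "b",
    "Description of invention": "c",
    "Witness statement and signature": "d",
    "Other witness signature(s)": "h",
    "Confirmation that specification has been enrolled within specified time limit": "e",
    "Drawings": "i",
}


def encode(elements):
    """Return one code letter per element, in order of appearance."""
    return "".join(MOVE_CODES[name] for name in elements)


# The elements of the annotated example, in the order they occur in the text.
example_patent = [
    "Salutation",
    "Declaration of grant of patent",
    "Statement of condition of grant",
    "Description of invention",
    "Witness statement and signature",
    "Other witness signature(s)",
    "Confirmation that specification has been enrolled within specified time limit",
    "Drawings",
]

code_string = encode(example_patent)
print(code_string)  # fabcdhei
```

An element that runs over several pages, like the description of the invention here, is listed once, so it contributes a single letter.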

Step 2: We recorded each code string in a spreadsheet.

Step 3: We read the spreadsheet into R as a dataframe for further processing and analysis.

Overview of sequence types

74 sequence types (final set will be smaller!)

Many are slight variants of most frequent types
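As a rough illustration of how sequence types might be tallied, here is a short Python sketch. The code strings below are invented toy data, not the authors' dataset, and the authors' own tally was presumably made in R.

```python
from collections import Counter

# Invented toy data: one code string per sampled year.
code_strings = ["fabcdhei", "fabcdhei", "fabcdei", "abcd", "fabcdhei", "abcd"]

# A "sequence type" is a distinct code string; count how often each occurs.
type_counts = Counter(code_strings)
n_types = len(type_counts)
most_common_type, frequency = type_counts.most_common(1)[0]

print(n_types, most_common_type, frequency)
```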

Diachronic distribution of sequence types

o How are these different sequence types distributed across the period of our study?

o To answer this question, we use string edit distance (commonly used in DNA sequencing).

o String edit distance measures the number of operations (i.e. insertion, deletion, substitution) needed to transform one string into another.

o E.g. gene → genre has a string edit distance of 1 (one insertion: r).

String edit distance analysis

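o We use the stringdist() function in R, applying the default optimal string alignment (OSA) metric, also known as restricted Damerau-Levenshtein distance. In addition to insertion, deletion and substitution, OSA counts the transposition of two adjacent characters as a single operation.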
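To make the metric concrete, here is a hand-written Python sketch of OSA distance with unit costs. It is a stand-in for illustration only: the authors used R's stringdist(), and the function name `osa_distance` is an assumption of this sketch.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance.

    Unit cost for insertion, deletion, substitution and for the
    transposition of two adjacent characters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "dc" <-> "cd".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("gene", "genre"))        # 1: insert "r"
print(osa_distance("fabcdhei", "fabcdei"))  # 1: delete "h"
print(osa_distance("abcd", "abdc"))         # 1: transpose "c" and "d"
```

Applied to code strings, a distance of 1 means that two patents differ by one added, dropped, replaced or swapped element.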

o First, we plot string edit distance between all adjacent patent sequences.
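The adjacent comparison might look like the following Python sketch. The year-to-string mapping is invented toy data, and `osa_distance` is a compact hand-written stand-in for the R function the authors used; plotting is left out.

```python
def osa_distance(a, b):
    """OSA edit distance with unit costs (stand-in for R's stringdist())."""
    d = [[max(i, j) if 0 in (i, j) else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Invented toy data: year -> code string.
sequences = {1734: "fabcd", 1735: "fabcd", 1736: "fabdc", 1737: "abdcz"}

# Pair each sampled year with the next sampled year and measure the change.
years = sorted(sequences)
adjacent = [
    (y1, y2, osa_distance(sequences[y1], sequences[y2]))
    for y1, y2 in zip(years, years[1:])
]

for y1, y2, dist in adjacent:
    print(f"{y1} -> {y2}: {dist}")
```

Because the pairs are built from the sorted list of sampled years, a year with no patent is simply skipped and its neighbours are compared directly.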

Adjacent string edit distance

Multivariate Analysis of String Edit Distance

o Next, we compute string edit distance between all years, using this information to cluster patents by year.

o First, we make a distance matrix of edit distances between all pairs of strings using the stringdistmatrix() function in R.

o Then we run a simple metric multidimensional scaling to reduce this matrix down to two dimensions.
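The scaling step can be sketched in plain Python as classical (metric) MDS: square the distances, double-centre them, and take the leading eigenvectors. This is an illustrative sketch, not the authors' code. It assumes a symmetric distance matrix whose largest-magnitude eigenvalues are positive, which holds for the Euclidean toy matrix below but is not guaranteed for edit distances; in practice a tested routine (for example R's cmdscale()) would be the safer choice. The function name `classical_mds` and the toy matrix are assumptions.

```python
import math


def classical_mds(dist, dims=2, iters=500):
    """Classical MDS via double centring and power iteration with deflation."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]  # equals column means (symmetry)
    grand_mean = sum(row_mean) / n
    # Double-centred matrix B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [math.sin(i + 1 + k) for i in range(n)]  # deterministic start
        for _ in range(iters):
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        eigval = sum(x * y for x, y in zip(v, matvec(b, v)))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate: remove the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


# Toy distance matrix: the corners of a 4 x 3 rectangle (invented data).
rectangle = [
    [0, 4, 3, 5],
    [4, 0, 5, 3],
    [3, 5, 0, 4],
    [5, 3, 4, 0],
]
points = classical_mds(rectangle)
for p in points:
    print([round(x, 3) for x in p])
```

The recovered coordinates are only defined up to rotation and reflection, so it is the distances between points, not the axes themselves, that carry meaning.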

o To make clusters more clearly visible, we also analyzed the distance matrix by applying a hierarchical cluster analysis (using Ward’s method).

o This yields 5 main clusters.
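For readers unfamiliar with Ward's method, the following Python sketch shows agglomerative clustering with the Lance-Williams update for Ward linkage, stopping when k clusters remain. It is a teaching sketch under stated assumptions: the slides do not say which Ward variant was used, so this version works on squared distances; the function name `ward_clusters` and the toy data are invented; and it expects 1 <= k <= number of items.

```python
def ward_clusters(dist, k):
    """Merge items bottom-up with Ward linkage until k clusters remain."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    # Squared distances between current clusters, keyed by (smaller id, larger id).
    d2 = {(i, j): dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    next_id = n

    while len(clusters) > k:
        i, j = min(d2, key=d2.get)  # closest pair under Ward's criterion
        ni, nj, dij = len(clusters[i]), len(clusters[j]), d2[(i, j)]
        updated = {}
        for m in clusters:
            if m in (i, j):
                continue
            nm = len(clusters[m])
            dim = d2[(i, m) if i < m else (m, i)]
            djm = d2[(j, m) if j < m else (m, j)]
            # Lance-Williams update for Ward linkage.
            updated[m] = ((ni + nm) * dim + (nj + nm) * djm - nm * dij) / (ni + nj + nm)
        d2 = {pair: v for pair, v in d2.items() if i not in pair and j not in pair}
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        for m, value in updated.items():
            d2[(m, next_id)] = value  # next_id is always the larger id
        next_id += 1

    # Label clusters 0, 1, ... in order of their earliest member.
    labels = [0] * n
    for label, members in enumerate(sorted(clusters.values(), key=min)):
        for index in members:
            labels[index] = label
    return labels


# Invented toy data: six items on a line, forming two obvious groups.
positions = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
labels = ward_clusters(toy_dist, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

If the items are ordered by year, as in the study, runs of identical labels in the output correspond to contiguous time periods.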

Cluster dendrogram

o Interestingly, these 5 main clusters turn out to be identified with distinct time periods when plotted on a timeline:

Patent Law Amendment Act 1852

Patents, Designs and Trademarks Act, 1883; Paris Convention 1883.

European Convention on the International Classification of Patents 1954

Patents Act 1977

Individual move analysis

o We also decided to trace the diachronic distribution of each of the individual elements in our coding scheme (‘move codes’), regardless of position within each string.

o Interestingly, many of these seem to appear and disappear extremely abruptly during the period of the analysis:

Move f: Salutation

Move a: Declaration of grant of patent

Move b: Statement of condition of grant

Move z: Abstract
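Tracing a single move over time, regardless of its position in the string, amounts to a membership test per year. A minimal Python sketch follows; the data and the helper names `move_presence` and `first_last` are invented for illustration.

```python
# Invented toy data: year -> code string.
sequences = {
    1850: "fabcd",
    1851: "fabcd",
    1852: "abcd",
    1853: "abcdz",
    1854: "bcdz",
}


def move_presence(sequences, move):
    """Map each year to True/False: does the move occur anywhere in the string?"""
    return {year: move in codes for year, codes in sorted(sequences.items())}


def first_last(sequences, move):
    """Return (first year, last year) in which the move occurs, or None."""
    years = [year for year, present in move_presence(sequences, move).items() if present]
    return (years[0], years[-1]) if years else None


print(move_presence(sequences, "f"))
print(first_last(sequences, "f"))  # (1850, 1851)
print(first_last(sequences, "z"))  # (1853, 1854)
```

Plotting the True/False values against year gives a presence timeline for each move.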

o Is this abruptness simply an artefact of our ‘one-text-per-year’ sampling method?

o NO: a few moves do appear and disappear more gradually:

Move J: Drawings

Move h: Statement of petition

Interim conclusions

o Genre change: evolution or revolution?

o For patents: both processes can be observed; seems to depend to some extent on the kind of analysis applied to the data.

o Variation from one year to the next is a constant process of (mainly) gradual change

o Shifts between broad generic sequence types are sudden and dramatic – external forces?

o Individual moves can appear and disappear suddenly or gradually.

Ongoing and future work

o Currently refining/reducing move categories!

o N-gram analysis of move sequences in each cluster time period – our hypothesis is that most frequent (i.e. dominant) variant will appear late in each period (natural selection)

o Analysis of individual move positions over time – do they change or stay in the same place?

o Lexicogrammatical analysis of patents/moves using MDA.
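The planned n-gram analysis of move sequences could start from something like the Python sketch below. It is one possible reading of the idea rather than the authors' procedure; the helper names `move_ngrams` and `ngram_counts` and the toy strings are assumptions.

```python
from collections import Counter


def move_ngrams(code_string, n):
    """Return all contiguous runs of n move codes in one code string."""
    return [code_string[i:i + n] for i in range(len(code_string) - n + 1)]


def ngram_counts(code_strings, n):
    """Count move n-grams over a set of code strings (e.g. one cluster period)."""
    counts = Counter()
    for codes in code_strings:
        counts.update(move_ngrams(codes, n))
    return counts


# Invented toy data standing in for the code strings of one time period.
period = ["fabcd", "abcd", "fabdc"]
bigrams = ngram_counts(period, 2)

print(move_ngrams("fabcd", 2))  # ['fa', 'ab', 'bc', 'cd']
print(bigrams.most_common(3))
```

Running the count separately for each cluster period would show which move sequences dominate in each.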

Thank you!

n.w.groom@bham.ac.uk

References

Bazerman, C. (1988). Shaping written knowledge: The genre and activity of the experimental article in science. Madison: University of Wisconsin Press.
Berkenkotter, C. (2009). Patient tales: Case histories and the uses of narrative in psychiatry. Columbia, SC: University of South Carolina Press.
Biber, D. (1988). Variation across Speech and Writing. Cambridge: Cambridge University Press.
Biber, D. & Conrad, S. (2009). Register, Genre and Style. Cambridge: Cambridge University Press.
Biber, D., Connor, U. & Upton, T.A. (2007). Discourse on the move: Using corpus analysis to describe discourse structure. Amsterdam: John Benjamins.
Bottomley, S. (2014). The British patent system during the industrial revolution 1700–1852: From privilege to property. Cambridge: Cambridge University Press.
Devitt, A. (2004). Writing genres. Carbondale, IL: Southern Illinois University Press.
Gross, A. G., Harmon, J. E., & Reidy, M. (2002). Communicating science: The scientific article from the 17th century to the present. Oxford: Oxford University Press.
Kohnen, T. (2007). ‘From Helsinki through the centuries: the design and development of English diachronic corpora.’ In P. Pahta, I. Taavitsainen, T. Nevalainen & J. Tyrkko (Eds.), Towards Multimedia in Corpus Studies (Studies in Variation, Contacts and Change in English 2). Helsinki: VARIENG. http://www.helsinki.fi/varieng/series/volumes/02/kohnen/
Martin, J.R. (2005). ‘Analysing genre: functional parameters.’ In J.R. Martin & F. Christie (Eds.), Genre and Institutions: Social Processes in the Workplace and School. London: Cassell, 3-39.
Miller, C. (1984). Genre as social action. Quarterly Journal of Speech, 70, 151-76.
Nuvolari, A. & Tartari, V. (2011). ‘Bennet Woodcroft and the value of English patents, 1617–1841.’ Explorations in Economic History 48: 97-115.
Rissanen, M. (2000). ‘The world of English historical corpora: From Cædmon to computer age.’ Journal of English Linguistics 28/1: 7-20.
Swales, J. M. (1990). Genre Analysis: English in academic and research settings. Cambridge: Cambridge University Press.
Swales, J. M. (2004). Research Genres: Exploration and applications. Cambridge: Cambridge University Press.
Tardy, C.M. & Swales, J.M. (2014). ‘Genre analysis.’ In K.P. Schneider and A. Barron (Eds.), Pragmatics of discourse (pp. 165-187). Berlin, Germany: Walter de Gruyter.
Upton, T.A. & Cohen, M.A. (2009). An approach to corpus-based discourse analysis: The move analysis as example. Discourse Studies, 11, 585-605.