Multiple Sequence Analysis: a contextualized narrative approach to longitudinal data University of Stirling, September 2007 Gary Pollock Department of Sociology Manchester Metropolitan University [email protected]




Longitudinal processes

start and end times (EHA)

competing risk, multi-episode (EHA)

contiguous states as a single DV (SA)

i.e. sequence analysis (SA) offers an alternative (complementary) approach to event history analysis (EHA)


Sequence analysis using OMA

1. Sequences of statuses are processed by…

2. Optimal Matching Analysis (OMA) which results in …

3. A distance matrix representing the closeness (proximity) of each sequence with all others which can then be processed by…

4. Cluster analysis which leads to the construction of…

5. A typology of sequence categories
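Steps 3 to 5 can be sketched in a few lines of Python. The slides do not say which clustering algorithm was used, so average linkage is an assumption here, and the four-case distance matrix is invented purely for illustration. The function name average_linkage is likewise introduced for this sketch.

```python
from itertools import combinations


def average_linkage(dist, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Returns a sorted list of clusters, each a sorted list of case indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def between(a, b):
        # Average distance over all cross-cluster pairs of cases.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda pair: between(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


# Hypothetical distances between four cases (not taken from the BHPS data).
dist = [
    [0.0, 1.0, 8.0, 9.0],
    [1.0, 0.0, 7.0, 8.0],
    [8.0, 7.0, 0.0, 2.0],
    [9.0, 8.0, 2.0, 0.0],
]
typology = average_linkage(dist, n_clusters=2)
print(typology)
```

Each resulting cluster is one category of the typology. In practice the number of clusters is a judgement call made by inspecting the sequences in each group.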


Single Sequences

• social class (S/N/M)

• e.g. case 1: SSSSSSSSSS

• case 2: NNNNNSSSSS

• case 3: NNNNNMMMMM etc.

• Case Analysis: resulting typology is an end-in-itself

• Variable Analysis: typology as a predictor or a dependent variable

• Class, employment status, qualifications, housing, marital status... can all be analysed in this way, giving a range of typologies, but these don’t account for interactions as they are each independently arrived at

• why not combine sequence data prior to analysis in order to capture interactions?


Analysis: process

• Create sequence data file

• Determine what to do with internal gaps (fill, delete or skip)

• Determine the ‘costs’ to be used in the OMA (indel and substitution). These are the parameters which define the distances between the sequences. They work by giving low distance scores to similar sequences and high scores to dissimilar sequences

• Perform the OMA (though there are other SA techniques)

• Weight the distance scores to account for different sequence lengths

• Perform cluster analysis

• Analyse clusters (i. sequence progression ii. covariates)
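The length-weighting step can be illustrated as follows. The slides do not state the weighting rule, so dividing by the length of the longer sequence is an assumption, and weight_by_length is a hypothetical helper name.

```python
def weight_by_length(raw_distance, len_a, len_b):
    """Scale a raw OMA distance by the length of the longer sequence.

    Assumed rule: the slides only say that distances are weighted
    to account for different sequence lengths.
    """
    longest = max(len_a, len_b)
    if longest == 0:
        return 0.0
    return raw_distance / longest


# A raw distance of 10 between two ten-period sequences.
print(weight_by_length(10, 10, 10))
# The same raw distance between a five-period and a nine-period sequence.
print(weight_by_length(10, 5, 9))
```

Without some such scaling, long sequences would tend to look more distant from everything else simply because they offer more positions at which to differ.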


Indel and substitution costs

case 1: SSSSSSSSSS
case 2: NNNNNSSSSS
case 3: NNNNNMMMMM

• If INDEL = 1 and SUBS = 2 (often a default setting)

• 1,2 = 10

• 1,3 = 20

• 2,3 = 10

• If INDEL = 1 and SUBS = 2 (NM, MN, SM, MS) and 1.5 (NS, SN)

• 1,2 = 7.5

• 1,3 = 17.5

• 2,3 = 10
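The distances on this slide can be reproduced with the standard dynamic-programming form of optimal matching. This is a minimal sketch rather than the software used for the analysis; the names om_distance, flat and graded are introduced here for illustration only.

```python
def om_distance(a, b, indel, sub_cost):
    """Optimal matching distance between sequences a and b.

    sub_cost(x, y) returns the cost of substituting state x with state y.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = cheapest way to turn the first i states of a
    # into the first j states of b.
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * indel
    for j in range(1, cols):
        d[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            x, y = a[i - 1], b[j - 1]
            substitute = 0.0 if x == y else sub_cost(x, y)
            d[i][j] = min(
                d[i - 1][j] + indel,           # delete x
                d[i][j - 1] + indel,           # insert y
                d[i - 1][j - 1] + substitute,  # match or substitute
            )
    return d[-1][-1]


case1, case2, case3 = "SSSSSSSSSS", "NNNNNSSSSS", "NNNNNMMMMM"


def flat(x, y):
    # First setting on the slide: every substitution costs 2.
    return 2.0


def graded(x, y):
    # Second setting: N <-> S costs 1.5, all other substitutions cost 2.
    return 1.5 if {x, y} == {"N", "S"} else 2.0


for cost in (flat, graded):
    print(
        cost.__name__,
        om_distance(case1, case2, 1.0, cost),
        om_distance(case1, case3, 1.0, cost),
        om_distance(case2, case3, 1.0, cost),
    )
```

Lowering the N/S substitution cost pulls cases 1 and 2 closer together (10 to 7.5) while leaving the distance between cases 2 and 3 unchanged, which is how the cost settings shape the eventual clusters.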


Data: BHPS 1991-2007

• born 1970-1975

• tracked from age 21 to 29

• data shifted to a common time axis

• class and qualifications examined here (housing, marriage, employment status and fertility status also processed)

• All internal gaps filled

• All sequence lengths included

• Year on year transitions used to inform substitution costs


Sequence gaps over waves A to N

Length of gaps across all waves A to N (10,264 sequences). A single case may have more than one internal gap on a single variable, so a straight percentage of gaps over the number of sequences is problematic; each percentage is instead a share of all gaps on that variable.

Gap length   Class          Highest qualification
1 yr         2,018 (54%)    1,224 (70%)
2 yr           761 (20%)      274 (16%)
3 yr           321 (9%)       121 (7%)
4 yr           230 (6%)        67 (4%)
5 yr           144 (4%)        19
6 yr            87 (2%)        14
7 yr            63 (2%)        10
8 yr            35              6
9 yr            34              5
10 yr           17              3
11 yr           10              1
12 yr           10              4
N            3,730          1,748
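A tally of this kind could be produced along the following lines. The sketch assumes missing years are coded -1, as in the example sequences on later slides, and treats a gap as internal only when observed states lie on both sides. The three sequences and the helper name internal_gap_lengths are hypothetical.

```python
from collections import Counter

MISSING = -1  # assumed code for a year with no observation


def internal_gap_lengths(seq):
    """Lengths of runs of MISSING that have observed states on both sides."""
    observed = [i for i, state in enumerate(seq) if state != MISSING]
    gaps = []
    # Consecutive observed positions more than one step apart enclose a gap.
    for left, right in zip(observed, observed[1:]):
        if right - left > 1:
            gaps.append(right - left - 1)
    return gaps


# Hypothetical sequences: leading and trailing missing years are not internal.
sequences = [
    [3, -1, 3, 3, -1, -1, 2],
    [-1, 3, 3, 3, 3, -1, -1],
    [5, 5, -1, -1, -1, 6, 6],
]
tally = Counter(g for seq in sequences for g in internal_gap_lengths(seq))
total = sum(tally.values())
for length in sorted(tally):
    share = 100 * tally[length] / total
    print(f"{length} yr gap: {tally[length]} ({share:.0f}%)")
```

Because the first sequence contributes two gaps, the percentages are taken over all gaps rather than over the number of sequences, as the note on the slide explains.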


Data: BHPS 1991-2007

[Figure: BHPS waves A (91) to N (04) plotted against age (16 to 29), with one row per year of birth from 1970 to 1975.]


Single Sequences: class

C    21  22  23  24  25  26  27  28  29
1     3   3   3   3   3  -1  -1  -1  -1
2     3   3   3   3   3   3   2   3   3
3     4   3   3   3   3   3   3   2   3
4     3   4   4   4   3   3   2   1  -1
5     6   6   4   4   6  -1  -1  -1  -1
6     5   5   5   5   5   6   6   6   6
7     0   0   3   3   3   3   3   3   3
8     0   0   0   2   4   2   2   2   2
9     3   3   3   3   3   3   3   1   3
10    2   2   4   4   2   2   5   6   1

0 = no job yet

1 = Service class Higher

2 = Service class Lower

3 = Non-manual

4 = Self

5 = Skilled

6 = unskilled


Proportions of time spent in a particular class

N=810    sch    scl     nm   self   skil   unsk
0.0     75.9   54.1   50.9   87.9   66.9   60.9
0.2      8.5   10.4    8.6    4.1    8.1    7.3
0.4      8.0   14.9   11.7    3.8    6.8   11.1
0.6      4.3    9.8   10.0    2.0    7.0    7.7
0.8      2.2    5.6    7.7    0.9    3.5    5.4
1.0      1.0    5.3   11.1    1.4    7.7    7.7
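For a single case, the share of time spent in each class could be computed as below. The slides do not state the denominator, so this sketch assumes observed years only (missing years coded -1 are dropped) and leaves out the "no job yet" state; time_shares is a hypothetical helper name. The example uses case 4 from the class sequences slide.

```python
CLASS_LABELS = {1: "sch", 2: "scl", 3: "nm", 4: "self", 5: "skil", 6: "unsk"}


def time_shares(seq, states, missing=-1):
    """Share of observed years spent in each state for one case."""
    observed = [s for s in seq if s != missing]
    if not observed:
        return {state: 0.0 for state in states}
    return {state: observed.count(state) / len(observed) for state in states}


# Case 4 from the class sequences slide (ages 21 to 29).
case_4 = [3, 4, 4, 4, 3, 3, 2, 1, -1]
shares = time_shares(case_4, CLASS_LABELS)
for state, share in shares.items():
    print(CLASS_LABELS[state], round(share, 3))
```

Repeating this for every case and banding the shares would give a distribution of the kind tabulated above, with each column summing to roughly 100.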


Year on year class transitions

         0   sch   scl    nm  self  skil  unsk  Total
0      242    12    27    29    10    14    36    370
sch      0   247    75    32     6    14    10    384
scl      0   114   725   126    15    41    30   1051
nm       0    73   202   960    14    21    45   1315
self     0     5    13    10   175    14    25    242
skil     0    21    57    27    23   597    97    822
unsk     0    18    41    82    25   108   686    960
Total  242   490  1140  1266   268   809   929   5144


Year on year class transitions: off diagonal proportions (N = 1512)

          SCH  SCL   NM  Self  Skilled  Unsk
None        1    2    2     1        1     2
SCH         -    5    2     0        1     1
SCL         8    -    8     1        3     2
NM          5   13    -     1        1     3
Self        0    1    1     -        1     2
Skilled     1    4    2     2        -     6
Unsk        1    3    5     2        7     -

Total 100
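These percentages follow directly from the year on year transition counts on the previous slide: each off-diagonal cell is divided by the total number of off-diagonal moves. A minimal sketch, with the counts copied from that slide and the function name off_diagonal_percentages chosen for illustration:

```python
LABELS = ["0", "sch", "scl", "nm", "self", "skil", "unsk"]
COUNTS = [
    [242, 12, 27, 29, 10, 14, 36],
    [0, 247, 75, 32, 6, 14, 10],
    [0, 114, 725, 126, 15, 41, 30],
    [0, 73, 202, 960, 14, 21, 45],
    [0, 5, 13, 10, 175, 14, 25],
    [0, 21, 57, 27, 23, 597, 97],
    [0, 18, 41, 82, 25, 108, 686],
]


def off_diagonal_percentages(counts):
    """Return the number of off-diagonal moves and each cell as a % of them."""
    n = len(counts)
    moves = sum(counts[i][j] for i in range(n) for j in range(n) if i != j)
    pct = [
        [None if i == j else 100 * counts[i][j] / moves for j in range(n)]
        for i in range(n)
    ]
    return moves, pct


moves, pct = off_diagonal_percentages(COUNTS)
print("off-diagonal transitions:", moves)
for label, row in zip(LABELS, pct):
    print(label, ["-" if p is None else round(p) for p in row])
```

The slides say only that these transitions were used to inform the substitution costs; the exact mapping from percentages to costs is not given, so it is not reproduced here.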


Class substitution costs

        None   sch   scl    nm  self  skil  unsk
None     0.0   1.8   1.8   1.8   1.8   1.8   1.8
sch      1.8   0.0   1.2   1.3   1.8   1.7   1.7
scl      1.8   1.2   0.0   1.1   1.7   1.3   1.3
nm       1.8   1.3   1.1   0.0   1.7   1.6   1.3
Self     1.8   1.8   1.7   1.7   0.0   1.6   1.6
Skil     1.8   1.7   1.3   1.6   1.6   0.0   1.2
unsk     1.8   1.7   1.3   1.3   1.6   1.2   0.0


Cluster analysis of class sequences

   An eight cluster solution produces the following:

Clus % cases description

1 17 non manual, little if any mobility

2 12 service class, lower, little mobility

3 13 unskilled, little mobility

4 12 moving from unskilled to skilled work

5 15 mixed

6 6 skilled, little mobility

7 19 upwards mobility, NM, SCL, SCH

8 5 self employed, little mobility


Single Sequences: highest qualification

C    21  22  23  24  25  26  27  28  29
1     2   2   2   2   2  -1  -1  -1  -1
2     2   2   2   2   2   2   2   1   1
3     2   2   2   2   2   2   2   2   2
4     2   1   1   1   1   1   1  -1  -1
5     5   5   5   5   5   5   5   5   5
6     3   3   3   3   3   3   3   3   3
7     2   2   2   2   2   2   2   2   2
8     3   3   2   2   2   2   2   2   2
9     3   3   3   3   3   3   3   3   2
10    2   2   2   2   2   2   2   1   1

1 = HE

2 = Post GCSE/O grade

3 = GCSE / O grade

4 = Other

5 = None/at school


Proportions of time spent in highest qualification statuses

N=831     HE   Post GCSE   GCSE   Other   None
0       79.8        35.5   72.8    89.5   92.9
0.2      0.5         7.9    1.9     0.4    0.1
0.4      2.2         8.9    1.9     0.7    0.7
0.6      1.6         4.7    2.4     1      0.4
0.8      6.4         4.7    1.9     0.2    0.6
1.0      9.6        38.3   19       8.2    5.3


Year on year changes in HEQ

         HE     A     O  Other  None  Total
HE      710     0     0      0     0    710
A       136  2426     0      0     0   2562
O         4    73  1133      0     0   1210
Other     0    15     5    485     0    505
None      0    14     0      1   264    279
Total   850  2528  1138    486   264   5266


Year on year changes in HEQ: off diagonal proportions (N = 248)

        HE    A    O  Other  None
HE       -    0    0      0     0
A       55    -    0      0     0
O        2   29    -      0     0
Other    0    6    2      -     0
None     0    6    0      0     -

Total 100


HEQ substitution costs

 

        None    HE     A     O   oth  none
None     0.0   2.0   2.0   2.0   2.0   2.0
HE       1.8   0.0   2.0   2.0   2.0   2.0
A        1.8   1.1   0.0   2.0   2.0   2.0
O        1.8   1.8   1.2   0.0   2.0   2.0
Other    1.8   1.7   1.6   1.7   0.0   1.8
None     1.8   1.8   1.6   1.7   1.7   0.0


Cluster analysis of HEQ sequences

A seven cluster solution produces the following:

Clus  % cases  description
1     17       from GCSE to post-GCSE
2      7       ‘late’ post GCSE to HE
3     30       post GCSE, stable
4     13       ‘early’ post GCSE to HE
5      6       no qualifications
6     14       GCSE, stable
7     11       other, stable


Multiple Sequence Analysis (MSA)

• combine different sequences prior to OMA processing

• e.g. class, qualifications (and housing, marital and fertility statuses) are combined in a single measure

• the sequences represent a narrative of change (or stability) on the measured dimensions

• the resulting typology can be analysed using case and variable methods as before but is in itself a representation of complex time embedded associations between the source variables


Multiple Sequences: class and highest qualification

C    21  22  23  24  25  26  27  28  29
1    23  23  23  23  23  -1  -1  -1  -1
2    23  23  23  23  23  23  22  13  13
3    24  23  23  23  23  23  23  22  23
4    23  14  14  14  13  13  12  -1  -1
5    56  56  54  54  56  -1  -1  -1  -1
6    35  35  35  35  35  36  36  36  36
7    20  20  23  23  23  23  23  23  23
8    30  30  20  22  24  22  22  22  22
9    33  33  33  33  33  33  33  31  23
10   22  22  24  24  22  22  25  16  11

1st Digit:
1 = HE
2 = Post GCSE/O grade
3 = GCSE / O grade
4 = Other
5 = None/at school

2nd Digit:
0 = no job yet
1 = Service class Higher
2 = Service class Lower
3 = Non-manual
4 = Self
5 = Skilled
6 = unskilled
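The combined codes can be built year by year from the two single sequences: the qualification code becomes the first digit and the class code the second. A minimal sketch, assuming -1 marks a missing year and that a year is treated as missing when either dimension is missing (which matches the example table); combine is a hypothetical helper name.

```python
MISSING = -1


def combine(qualification, social_class):
    """Merge two aligned yearly sequences into one two-digit state sequence."""
    if len(qualification) != len(social_class):
        raise ValueError("sequences must cover the same years")
    combined = []
    for q, c in zip(qualification, social_class):
        # A year is usable only if both dimensions were observed.
        combined.append(MISSING if MISSING in (q, c) else 10 * q + c)
    return combined


# Case 4 from the single sequence slides (ages 21 to 29).
qual_4 = [2, 1, 1, 1, 1, 1, 1, -1, -1]
class_4 = [3, 4, 4, 4, 3, 3, 2, 1, -1]
print(combine(qual_4, class_4))
# -> [23, 14, 14, 14, 13, 13, 12, -1, -1]

# Five qualification codes times seven class codes gives 35 combined states.
print(5 * 7)
```

Note that the class observed at age 28 for case 4 is lost in the combined sequence because the qualification for that year is missing.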


Year on year changes

• This is a large (35 by 35) matrix

• Calculation of substitution costs as for single sequence structure

• Frequent transitions:

• 12 → 11 (2.9%)

• 13 → 12 (2.3%)

• 21 → 22 (2.6%)

• 22 → 21 (2.6%)

• 22 → 23 (4.5%)

• 23 → 22 (5.9%)

• 26 → 25 (2.4%)


Sequence analysis of class-HEQ data

Clus   %   description
1     11   post GCSE, NM, stable
2      8   post GCSE → HE, NM → SCL
3      5   no quals, self emp → unsk
4     10   GCSE, mixed emp (self, sk, unsk)
5      7   post GCSE, NM → SCL
6      7   GCSE, NM both stable
7      4   post GCSE skilled, both stable
8      6   from unsk and sk → SCH, HE
9     15   mixed
10     4   other quals and SCL, SCH
11     3   post GCSE, SCH/SCL switching
12     6   other and sk/unsk, stable
13     8   post GCSE, unsk stable
14     2   post GCSE, self, stable


Advantages of MSA

• Is not limited to a single sequence measure

• Is not limited to a single event type

• Articulates the full scope of related sequences together


Issues

• Increasing complexity of the measure as new variables are drawn in

• computing time / software switching

• Lack of formal rules in executing the OMA and clustering processes

• Largely exploratory: scope to develop in relation to EHA