‘Improving the ethnic classification of patient registers’
Centre for Advanced Spatial Analysis, UCL
25th May, 2005
Objectives of seminar
1. Promote awareness of existing tools for targeting health communications at ‘ethnic’ groups
a. Individual patients on registers
b. Surgeries and other contact points
c. Local promotional activities
2. Discussion of ideas for collaborative use of individual patient registers to improve the quality of ‘ethnic’ coding
3. Forum for exchange of information
a. Methods
b. Ethics and other data protection issues
Our conception of ‘Ethnicity’
Multi-variate classification based on a combination of :
Cultural Origin – eg religion, beliefs
Ethnicity – eg country of origin, diet
Language
Background to this seminar
Context : CASA (1)
• 1970s – Neighbourhood classifications used for prioritising public sector initiatives
• 1990s – Application of postcode classifications adopted by commercial companies
• 2002 – CASA becomes involved in the application of Mosaic in health, policing and education
• 2003 – CASA work with Dr Foster on Slough Diabetes project
Context : CASA (2)
• 2004 – CASA sets up Knowledge Transfer Partnership (KTP) with Camden PCT to develop health applications of geodemographics
• 2004 – CASA wins ESRC grant for ‘quantitative analysis of names’
• 2005 – Camden PCT develops capability in the application of ‘names’ as well as Mosaic to targeting of public health campaigns
Contact details
• CASA website
• E-mail addresses
– Pablo Mateos – [email protected]
– Richard Webber – [email protected]
– Paul Longley – [email protected]
www.casa.ucl.ac.uk/geonom
‘Quantitative Analysis of Names’
• ESRC funded project
– Use of surname as an identifier of cultural origin
• Regional origins of English names
• Regional distribution of Celtic names
• Current locations of names ‘imported from abroad’
– Jewish
– Continental European and Hispanic
– Asian
– African
– Middle Eastern
Identification of potential applications
• Academic / Social Scientific
– Study of meaning of names
– Studies of historic migration patterns
– Social mobility of Celtic migrants to England
• Policy applications
– Measurement of ‘social capital’
– Differentiation of crude ‘South Asian’ definition
– Targeting of public sector communications programmes
– Auditing of equal opportunities in employment
Key data files
• 40 million records Experian 1996 GB electoral roll
– First name
– Surname
– Postal area code
– Mosaic code
• 26 million records 1881 census
• Summary statistics on name frequencies by region from Anglophone diaspora
– US, Canada, Australia, New Zealand, Northern and Southern Ireland
Geography of the name ‘Webber’
% electors with occupational names
% electors of Welsh surnames
CEL assignment : Phase one
• Identify 25,000 surnames with > 100 occurrences in 1996
• Assign to hierarchy
– English; general name type; detailed name type
– Celtic; country of origin; general type
– Imported from abroad; region of origin; country of origin
Webber
• Level one : English ‘metonym’
• Level two : Metonym ending in ‘-er’
• Level three : Manufacturing occupation
Zhang
• Level one : Imported from abroad
• Level two : East Asian
• Level three : Chinese
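The three-level assignment shown for ‘Webber’ and ‘Zhang’ can be sketched as a small lookup directory. This is only an illustration: `CelCode`, `DIRECTORY`, `lookup` and `describe` are hypothetical names, not the project's software, and the only entries are the two examples from the slides.

```python
from typing import NamedTuple, Optional


class CelCode(NamedTuple):
    """One position in the three-level hierarchy (hypothetical type)."""
    level_one: str
    level_two: str
    level_three: str


# Only the two examples given on the slides; a real directory would
# hold one entry per classified surname.
DIRECTORY = {
    "WEBBER": CelCode("English 'metonym'",
                      "Metonym ending in '-er'",
                      "Manufacturing occupation"),
    "ZHANG": CelCode("Imported from abroad", "East Asian", "Chinese"),
}


def lookup(surname: str) -> Optional[CelCode]:
    """Return the code for a surname, ignoring case and outer spaces."""
    return DIRECTORY.get(surname.strip().upper())


def describe(surname: str) -> str:
    """Render a code in the semicolon-separated style used on the slides."""
    code = lookup(surname)
    if code is None:
        return f"{surname}: not in directory"
    return f"{surname}: " + ";".join(code)


print(describe("Zhang"))
print(describe("Smith"))
```

Storing the three levels as one tuple keeps the levels together, so a name can be reported at whichever level of detail an application needs.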
Names > 100 occurrences Count
English 19,246
Celtic 3,396
Imported from abroad 2,987
Total 25,630
Muslim and South Asian names (1462)
IMPORTED FROM ABROAD;MUSLIM;AFGHAN 19
IMPORTED FROM ABROAD;MUSLIM;BANGLADESHI 83
IMPORTED FROM ABROAD;MUSLIM;ERITREAN 3
IMPORTED FROM ABROAD;MUSLIM;LEBANESE 2
IMPORTED FROM ABROAD;MUSLIM;MIDDLE EASTERN 125
IMPORTED FROM ABROAD;MUSLIM;NORTH AFRICAN 33
IMPORTED FROM ABROAD;MUSLIM;PAKISTANI & IRANIAN 203
IMPORTED FROM ABROAD;MUSLIM;PERSONAL NAME 85
IMPORTED FROM ABROAD;MUSLIM;SOMALI 40
IMPORTED FROM ABROAD;MUSLIM;SUDANESE 1
IMPORTED FROM ABROAD;MUSLIM;TURKISH 88
IMPORTED FROM ABROAD;OTHER SOUTH ASIAN;HINDI 254
IMPORTED FROM ABROAD;OTHER SOUTH ASIAN;NEPALESE 2
IMPORTED FROM ABROAD;OTHER SOUTH ASIAN;NORTH INDIAN 184
IMPORTED FROM ABROAD;OTHER SOUTH ASIAN;SIKH 290
IMPORTED FROM ABROAD;OTHER SOUTH ASIAN;SOUTH INDIAN & SRI LANKAN 50
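The semicolon-separated labels above lend themselves to simple tallying by hierarchy level. The sketch below parses the sixteen rows exactly as listed and sums the counts by level two; the function names `parse_row` and `totals_by_level_two` are hypothetical.

```python
from collections import Counter

ROWS = """\
IMPORTED FROM ABROAD;MUSLIM;AFGHAN 19
IMPORTED FROM ABROAD;MUSLIM;BANGLADESHI 83
IMPORTED FROM ABROAD;MUSLIM;ERITREAN 3
IMPORTED FROM ABROAD;MUSLIM;LEBANESE 2
IMPORTED FROM ABROAD;MUSLIM;MIDDLE EASTERN 125
IMPORTED FROM ABROAD;MUSLIM;NORTH AFRICAN 33
IMPORTED FROM ABROAD;MUSLIM;PAKISTANI & IRANIAN 203
IMPORTED FROM ABROAD;MUSLIM;PERSONAL NAME 85
IMPORTED FROM ABROAD;MUSLIM;SOMALI 40
IMPORTED FROM ABROAD;MUSLIM;SUDANESE 1
IMPORTED FROM ABROAD;MUSLIM;TURKISH 88
IMPORTED FROM ABROAD;OTHER SOUTH ASIAN;HINDI 254
IMPORTED FROM ABROAD;OTHER SOUTH ASIAN;NEPALESE 2
IMPORTED FROM ABROAD;OTHER SOUTH ASIAN;NORTH INDIAN 184
IMPORTED FROM ABROAD;OTHER SOUTH ASIAN;SIKH 290
IMPORTED FROM ABROAD;OTHER SOUTH ASIAN;SOUTH INDIAN & SRI LANKAN 50
"""


def parse_row(line):
    """Split 'LEVEL1;LEVEL2;LEVEL3 count' into its four parts."""
    label, count = line.rsplit(" ", 1)  # the count is the last token
    level_one, level_two, level_three = label.split(";")
    return level_one, level_two, level_three, int(count)


def totals_by_level_two(text):
    """Sum the name counts for each level-two category."""
    totals = Counter()
    for line in text.splitlines():
        if line.strip():
            _, level_two, _, count = parse_row(line.strip())
            totals[level_two] += count
    return totals


totals = totals_by_level_two(ROWS)
for category, count in totals.items():
    print(category, count)
print("ALL", sum(totals.values()))
```

The grand total agrees with the 1462 given in the slide title.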
Phase one assignment method (25,000 names with > 100 occurrences)
1. General knowledge
2. Identification of top postal area and level of concentration in it
3. Identification of top Mosaic type and level of concentration in it
4. Identification of concentration in 1881
5. Frequencies in other Anglophone countries
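Steps 2 and 3 both find the category in which a surname is most common and measure how concentrated the surname is there. The slides do not define ‘level of concentration’, so the sketch below assumes it means the share of a surname's occurrences that fall in its top category. The function name and the postal area values are illustrative toy input, not project data.

```python
from collections import Counter


def top_category_and_share(categories):
    """Return (most common category, its share of all occurrences).

    `categories` holds one entry per elector bearing the surname,
    e.g. a postal area code or a Mosaic type code.
    """
    counts = Counter(categories)
    if not counts:
        raise ValueError("no occurrences supplied")
    top, n = counts.most_common(1)[0]
    return top, n / sum(counts.values())


# Toy input: postal areas of five electors sharing one surname.
sample_areas = ["BD", "BD", "BD", "LU", "B"]
area, share = top_category_and_share(sample_areas)
print(area, f"{share:.0%}")
```

The same function serves for Mosaic types in step 3 by passing Mosaic codes instead of postal areas.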
C20 : Suburban Comfort / Asian Enterprise
Wakemans Hill, Colindale, NW9 0UU
The Warren, Heston, TW5 0JW
Headcorn Road, Thornton Heath, CR7 6JS
Himley Crescent, Wolverhampton, WV4 5DA
D26 : Ties of Community / South Asian Industry
Aberdeen Place, Bradford, BD7 2HG
Ivy Road, Luton, LU1 1DL
Edmundson Road, Blackburn, BB2 1HL
Osborn Road, Sparkbrook, Birmingham, B11 1TT
Status and Asian names
Surname    Asian Enterprise    South Asian Industry
Mayat      145                 9533
Lorgat     971                 8840
Gorasia    7622                275
D27 : Ties of Community / Settled Minorities
Algernon Road, Lewisham, SE13 7AP
Essex Road, Leyton, E10 6BT
Guildersfield Road, Streatham, SW16 5LS
Melbourne Road, Walthamstow, E17 6LR
F36 : Welfare Borderline / Metro Multiculture
Broadwater Farm, Tottenham, N17 6HT
Hillcrest, Highgate, N6 4EX
Kenninghall Road, Lower Clapton, E5 8DG
Samuel Street, Woolwich, SE18 5LJ
Output
• Directory assigning a Cultural / Language / Ethnicity code to each name with more than 100 occurrences on the GB electoral roll
Phase two assignment (all surnames > 5 occurrences)
• Rank first names by frequency
• Allocate names to CEL categories where possible
• Identify for each surname the proportion of associated first names in known CEL categories
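The last step above, finding for each surname the proportion of its bearers' first names in each known CEL category, can be sketched as follows. The function name, the `UNKNOWN` label and the input records are hypothetical toy values, and the first-name lookup is assumed to come from the preceding allocation step.

```python
from collections import Counter, defaultdict


def first_name_profile(records, first_name_cel):
    """For each surname, return {CEL category: share of its electors}.

    records        -- iterable of (first_name, surname) pairs
    first_name_cel -- dict mapping known first names to a CEL category
    First names with no known category are counted as 'UNKNOWN'.
    """
    counts = defaultdict(Counter)
    for first_name, surname in records:
        cel = first_name_cel.get(first_name, "UNKNOWN")
        counts[surname][cel] += 1
    profile = {}
    for surname, cel_counts in counts.items():
        total = sum(cel_counts.values())
        profile[surname] = {cel: n / total for cel, n in cel_counts.items()}
    return profile


# Toy input: four electors sharing one placeholder surname.
known = {"Sotiris": "GREEK", "Antigoni": "GREEK", "John": "ENGLISH"}
electors = [
    ("Sotiris", "SURNAME_A"),
    ("Antigoni", "SURNAME_A"),
    ("John", "SURNAME_A"),
    ("Maria", "SURNAME_A"),  # not in the lookup, so counted as UNKNOWN
]
profile = first_name_profile(electors, known)
print(profile["SURNAME_A"])
```

A surname whose bearers' first names fall mostly in one category is then a candidate for that category itself.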
Selected first names    Agapios   Antigoni   Sotiri   Sotiris
Total occurrences       10        57         11       224
Surnames not British    6         34         7        143
Surnames Greek          3         19         5        75
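The counts for the four selected first names can be turned into shares with simple arithmetic. The sketch below uses only the figures given above. It assumes that the Greek surnames are a subset of the non-British ones, which the counts are consistent with but the slide does not state; the names `TABLE`, `shares` and `RESULTS` are hypothetical.

```python
# first name: (total occurrences, surnames not British, surnames Greek)
TABLE = {
    "Agapios": (10, 6, 3),
    "Antigoni": (57, 34, 19),
    "Sotiri": (11, 7, 5),
    "Sotiris": (224, 143, 75),
}


def shares(total, not_british, greek):
    """Express the three counts as proportions."""
    return {
        "not_british_share": not_british / total,
        "greek_share": greek / total,
        # Assumes Greek surnames are counted within the non-British ones.
        "greek_among_not_british": greek / not_british,
    }


RESULTS = {name: shares(*counts) for name, counts in TABLE.items()}

for name, result in RESULTS.items():
    cells = "  ".join(f"{key}={value:.1%}" for key, value in result.items())
    print(f"{name:10s}{cells}")
```

For each of the four names, a majority of bearers have a non-British surname, and between roughly half and three quarters of those surnames are Greek.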
Output
• Database giving for 60,000 surnames ‘imported from abroad’
– % electors by CEL of first name
– Most common cell (three level hierarchy)
• Database giving for 60,000 first names ‘imported from abroad’ (3.2m occurrences)
– % electors by CEL of surname
– Most common cell (three level hierarchy)
Evaluation of solution
• Seems to work well for all ethnic groups other than Caribbeans
• CEL overlaps between surname and first name
– South Asians and Muslims – 80%
– Africans, Turks, Cypriots, Chinese – 50%
– Hispanics – 20%
– Other Europeans – 8–15%
– Jewish – 4%
– Irish, Scots, Welsh – 3%
• High overlap between certain CELs
– within Muslim group
– Spain, Portugal, Italy
– Netherlands, Germany and Czech Republic
• Confusion among serial migrant groups
– Hispanic migrants to India
– Chinese migrants to West Indies
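One way to compute an overlap figure of the kind quoted above is to take, for each surname CEL, the share of electors whose first name falls in the same CEL. The slides do not give the exact definition used, so this is an assumed one. The function name is hypothetical and the input is toy data that does not reproduce the percentages above.

```python
from collections import defaultdict


def overlap_by_surname_cel(people):
    """Share of electors whose first-name CEL equals their surname CEL.

    people -- iterable of (surname_cel, first_name_cel) pairs,
              one pair per elector.
    Returns {surname_cel: overlap share between 0 and 1}.
    """
    matched = defaultdict(int)
    total = defaultdict(int)
    for surname_cel, first_name_cel in people:
        total[surname_cel] += 1
        if surname_cel == first_name_cel:
            matched[surname_cel] += 1
    return {cel: matched[cel] / total[cel] for cel in total}


# Toy input: eight electors, four per surname CEL.
toy_people = [
    ("GREEK", "GREEK"), ("GREEK", "GREEK"),
    ("GREEK", "GREEK"), ("GREEK", "ENGLISH"),
    ("IRISH", "ENGLISH"), ("IRISH", "ENGLISH"),
    ("IRISH", "ENGLISH"), ("IRISH", "IRISH"),
]
overlap = overlap_by_surname_cel(toy_people)
print(overlap)
```

A low overlap, as reported for Irish, Scots and Welsh names, means that first names add little to the surname when classifying those groups.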