TRANSCRIPT
John Rae, Partner - Data and Product Development
Simon Power, Principal Consultant
HNDA Training for Practitioners, 6th May 2014
Paycheck Income Data
2
Agenda
Introduction
• What is Paycheck
Method
• Sources
• Using the geographic hierarchy
• Multi-method models
Innovations
• Latest data ideas
• Keeping up to date
Limitations
• Licence use
• Opportunities
3
What is Paycheck?
4
Paycheck provides estimates of household income
UK wide estimates
• Mean
• Median
• Mode
• Distribution by income band
Full geographic detail
• Modelled down to full postcodes
• Potential to aggregate to any geographic area
Option to sub-divide incomes by lifestage
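As a rough illustration of aggregating postcode estimates to a larger area, the sketch below computes a household-weighted mean income per area. The record layout, area names and figures are hypothetical and are not Paycheck's actual file format.

```python
from collections import defaultdict

# Hypothetical postcode-level records: (area, households, mean income in pounds)
postcode_estimates = [
    ("Area A", 20, 30000.0),
    ("Area A", 30, 40000.0),
    ("Area B", 10, 25000.0),
]


def aggregate_mean_income(rows):
    """Return the household-weighted mean income for each area."""
    households = defaultdict(int)
    income_total = defaultdict(float)
    for area, n_households, mean_income in rows:
        households[area] += n_households
        income_total[area] += n_households * mean_income
    return {area: income_total[area] / households[area] for area in households}


area_means = aggregate_mean_income(postcode_estimates)
```

Only the mean can be combined this way. A median or mode for the larger area would have to come from the aggregated distribution by income band, for example by summing household counts per band.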
5
Data and modelling concepts
6
Modelling objective
To fit an appropriate statistical distribution to data on household incomes
To predict this distribution for all relevant geographic areas by means of the mean and standard deviation
7
Geographical hierarchy of the modelling
8
Outline of data inputs
1. Survey data
• Structured to be representative
• Sample will rarely include data points in a given local area
2. Large lifestyle database
• Unrepresentative collection method
• Large sample size will include data for most local areas
Surveys are best nationally; lifestyle data is better locally.
9
Survey Data
Step 1 – Bring the data up to date
Survey data
• Living Costs and Food survey
• Sample of 6,500 across the UK
• Available survey period typically two years ago
• We inflate to the present using Average Earnings time series
Lifestyle data
• Data Locator Group
• Covers 1.2 million individuals in Scotland (UK 15 million)
• After cleaning we get data for 856,000 households in Scotland (UK 4.3 million)
• We apply weighting to match the survey data at national level
We bring the incomes up to the current year using Average Earnings change figures published by ONS
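The two adjustments in Step 1 can be sketched as follows. This is a minimal illustration, not the production method: the index values and income groups are made up, and post-stratification is assumed here as one common way to weight an unrepresentative sample to survey totals; the slides do not say which weighting scheme is used.

```python
def inflate_income(income, index_then, index_now):
    """Scale an income by the growth in an earnings index."""
    return income * index_now / index_then


def poststratification_weights(sample_counts, target_shares):
    """Weight for each group = target (survey) share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: target_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }


# Hypothetical index values, not published ONS figures.
inflated = [
    inflate_income(x, index_then=100.0, index_now=104.0)
    for x in (18000.0, 26000.0, 41000.0)
]

# Hypothetical groups: lifestyle sample counts against national survey shares.
weights = poststratification_weights(
    {"low": 200, "mid": 500, "high": 300},
    {"low": 0.3, "mid": 0.5, "high": 0.2},
)
```

Groups that are under-represented in the lifestyle data receive a weight above 1, and over-represented groups a weight below 1, so the weighted sample matches the national profile.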
10
Step 2 - Establish the current UK earning profile
• Take the household incomes measured by the survey
• Inflate these to current year figures
• Represent the distribution as bands of £5,000
• Model incomes above £100,000 as an exponentially decaying distribution
• Transform the (percentile points of the) distribution to fit a standard normal distribution
All subsequent modelling is conducted on normally distributed variables and a reverse transformation converts the model results back to real income values
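A minimal sketch of the percentile-to-normal transformation and its reverse is given below. It works on a handful of hypothetical individual incomes rather than £5,000 bands, omits the exponential tail above £100,000, and assumes the incomes are all distinct; the function names are illustrative only.

```python
import bisect
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def fit_transform(incomes):
    """Pair each sorted income with the normal score of its percentile point."""
    ordered = sorted(incomes)
    n = len(ordered)
    # Mid-point percentiles avoid 0 and 1, where the normal score is infinite.
    z_scores = [STD_NORMAL.inv_cdf((i + 0.5) / n) for i in range(n)]
    return ordered, z_scores


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation, clamped at both ends (xs strictly increasing)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    hi = bisect.bisect_right(xs, x)
    lo = hi - 1
    frac = (x - xs[lo]) / (xs[hi] - xs[lo])
    return ys[lo] + frac * (ys[hi] - ys[lo])


# Hypothetical incomes, already inflated to the current year.
incomes = [12000.0, 18000.0, 25000.0, 33000.0, 60000.0]
ordered, z_scores = fit_transform(incomes)

z = interpolate(30000.0, ordered, z_scores)      # income -> normal score
income_back = interpolate(z, z_scores, ordered)  # normal score -> income
```

Because the same lookup table is used in both directions, a model result expressed as a normal score can be converted back to a real income value.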
11
Demographic modelling
Step 3 - Bayesian modelling approach
• Take a sample of lifestyle data (representative of national socio-demographics)
• Build linear regression models to estimate (transformed) income from the demographics
• Apply to local areas based on local socio-demographics
• 'Direct' calculation: calculate incomes directly from the lifestyle data
• Create (local) correction factors for the initial model estimates in light of the actual scores
• Repeat for the next (smaller) geographic level
• Undo the transformation to return to incomes in £
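One way to picture the correction step is as a normal-normal Bayesian update: the model estimate acts as the prior and the directly calculated local mean as the data, with the result passed down as the prior for the next, smaller level. This is an assumed formulation for illustration; the slides do not give the actual formula, and the variances, level names and figures below are hypothetical.

```python
def shrink_to_local(model_estimate, local_mean, n_local, prior_var, obs_var):
    """Posterior mean blending a model estimate with a directly observed local mean."""
    if n_local == 0:
        return model_estimate  # no local data: keep the model estimate
    data_var = obs_var / n_local  # variance of the local sample mean
    weight = prior_var / (prior_var + data_var)
    return model_estimate + weight * (local_mean - model_estimate)


# Hypothetical hierarchy: (level name, direct local mean, number of local records),
# all on the transformed (standard normal) scale.
levels = [("larger area", 0.5, 40), ("smaller area", 0.9, 4)]

estimate = 0.2  # initial regression estimate on the transformed scale
for _name, local_mean, n_local in levels:
    estimate = shrink_to_local(estimate, local_mean, n_local,
                               prior_var=0.1, obs_var=1.0)
```

With many local records the correction moves the estimate most of the way towards the direct figure; with few records the estimate stays close to the value inherited from the level above.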
12
Points of Discussion
Why LCF as opposed to other surveys?
Why a UK model?
How is it kept up to date?
13
Data innovations
14
Places change… and we can impute data about them
15
What is Acorn and why is it relevant to the income model?
16
Which is Ethel? Which is Kayleigh?
17
The purpose of geodemographics
To analyse data and facilitate educated guesses.
• Which channels fit which people?
• Where might it be more likely to find people with unhealthy lifestyles?
• Which people are using which of my services and in what manner?
18
It looks like this
Category, Group, Type
Category 1: Affluent Achievers
Groups
A. Lavish Lifestyles
B. Executive Wealth
C. Mature Money
Types
1 Exclusive enclaves
2 Metropolitan money
3 Large house luxury
4 Asset rich families
5 Wealthy countryside commuters
6 Financially comfortable families
7 Affluent professionals
8 Prosperous suburban families
9 Well-off edge of towners
10 Better-off villagers
11 Settled suburbia, older people
12 Retired and empty nesters
13 Upmarket downsizers
Category 2: Rising Prosperity
Groups
D. City Sophisticates
E. Career Climbers
Types
14 Townhouse cosmopolitans
15 Younger professionals in smaller flats
16 Metropolitan professionals
17 Socialising young renters
18 Career driven young families
19 First time buyers in small, modern homes
20 Mixed metropolitan areas
Category 3: Comfortable Communities
Groups
F. Countryside Communities
G. Successful Suburbs
H. Steady Neighbourhoods
I. Comfortable Seniors
J. Starting Out
Types
21 Farms and cottages
22 Larger families in rural areas
23 Owner occupiers in small towns and villages
24 Comfortably-off families in modern housing
25 Larger family homes, multi-ethnic areas
26 Semi-professional families, owner occupied neighbourhoods
27 Suburban semis, conventional attitudes
28 Owner occupied terraces, average income
29 Established suburbs, older families
30 Older people, neat and tidy neighbourhoods
31 Elderly singles in purpose-built accommodation
32 Educated families in terraces, young children
33 Smaller houses and starter homes
Category 4: Financially Stretched
Groups
K. Student Life
L. Modest Means
M. Striving Families
N. Poorer Pensioners
Types
34 Student flats and halls of residence
35 Term-time terraces
36 Educated young people in flats and tenements
37 Low cost flats in suburban areas
38 Semi-skilled workers in traditional neighbourhoods
39 Fading owner occupied terraces
40 High occupancy terraces, many Asian families
41 Labouring semi-rural estates
42 Struggling young families in post-war terraces
43 Families in right-to-buy estates
44 Post-war estates, limited means
45 Pensioners in social housing, semis and terraces
46 Elderly people in social rented flats
47 Low income older people in smaller semis
48 Pensioners and singles in social rented flats
Category 5: Urban Adversity
Groups
O. Young Hardship
P. Struggling Estates
Q. Difficult Circumstances
Types
49 Young families in low cost private flats
50 Struggling younger people in mixed tenure
51 Young people in small, low cost terraces
52 Poorer families, many children, terraced housing
53 Low income terraces
54 Multi-ethnic, purpose-built estates
55 Deprived and ethnically diverse in flats
56 Low income large families in social rented semis
57 Social rented flats, families and single parents
58 Singles and young families, some receiving benefits
59 Deprived areas and high-rise flats
19
The new Acorn has revolutionised geodemographics
Peter Sleight, Chair, The Association of Census Distributors, 'Tracking a decade of changing Britain', Market Research Society seminar, November 2013
All this is relevant because…
20
Data is derived by combining multiple sources
e.g. Local level housing type and tenure
• Registers of Scotland
• Land Registry
• National Register of Social Housing
• FoI requests to LADs
• Public register of HMOs
• Zoopla property portals
• CACI lifestyle databases
• Housing for the elderly
• CACI High rise dwellings database
21
Adding to the census
Variables no longer on the census
Identify likely locations of high rise buildings
WALK THE STREETS
Create a database of addresses:
• Social high rise (10+ storey)
• Social mid-rise (5-9 storey)
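The storey thresholds above amount to a simple classification rule, sketched below. The function name is hypothetical, and returning `None` for blocks under five storeys is an assumption, since the slide does not say how lower buildings are recorded.

```python
def classify_social_block(storeys):
    """Label a social housing block by height using the slide's thresholds."""
    if storeys >= 10:
        return "Social high rise"
    if storeys >= 5:
        return "Social mid-rise"
    return None  # below 5 storeys: not covered by either label
```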
22
Data is derived by combining multiple sources
e.g. Local level family structure, occupation and affluence
• CACI names and addresses
• Credit application age data
• Elderly-only accommodation
• Emma's diary children database
• DWP claimant data
• CACI lifestyle database
• UCL ethnicity imputation
• Company directors
• Shareholders
• Students
23
Adding to the census
Replace the census - housing for specific categories of people
Improve the census
24
This approach realises a lot of address level data…
• 22m households where we have detailed age data
• 21.5m households where we have housing / tenure data
• 10m households where we have more detailed socio-demographics
• 3m people in HMOs
• 600,000 age-limited addresses
25
And produces a remarkable outcome…
Prior to the release of the census
• We built the segmentation without census inputs
• We linked to research surveys to form an insight test-bed
• We optimised across over 2,500 topics
Following the release of the census
• We added in the census and checked for change
• We found including the census made no difference to the structure of the segmentation
The approach appears to achieve the equivalent (for these geodemographic models) of having a census every year
26
As an illustration…
South Ayrshire’s first affordable housing in 30 years, the Somerset Road Development includes:
West of Scotland Housing Association’s development of 32 flats as part of a bigger development of 76 units.
Dawn Homes development of 44 homes for outright sale.
Segmentation types:
49 Young families in low cost private flats
50 Struggling younger people in mixed tenure
27
Licensing – limitations and opportunities
28
Limitations and opportunities
End User Licence
Contractual restrictions on the use of the data
Council use
Third parties
30
Limitations and opportunities
Opportunities
Other CACI Datasets
Consumers. Locations. Communities.
[Diagram: CACI datasets at individual and postcode level. Labels shown: Affluence; Digital; Current Demographics; Workforce ACORN; Retail, Leisure & Financial Catchments; Public Transport Access Levels (PTAL); Retail Spend Estimates; Online Spend Estimates; 2011 Census; Out of Work Benefits; Retail, Leisure & Financial Outlets; Job Seekers Allowance; British Crime Survey; FRS: GfK NOP's Financial Research Survey; Understanding Society; Irish ACORN; TGI; Worker Spend Estimates; Rail Passengers; Tourist Spend Estimates; Hospitals/GPs/Schools/Libraries]
NAME                    DATA-SET KEY THEMES                TYPE
Acorn                   Lifestyle                          C
Household Acorn         Lifestage                          C
my.Acorn                Bespoke Segmentation               C
Wellbeing Acorn         Health & Wellbeing                 C
Financial ACORN         Finance, geodemographics           C
EuroACORN               Europe geodemographics             C
SocialScene ACORN       Eating-out & Drinks Market         C
PayCheck                Household Income                   M
StreetValue             House Value                        M
eTypes                  Online Behaviour                   C
People UK               Affluence, Lifestage               C
Fresco                  Affluence, Lifestage, Finance      C
Ocean                   Lifestyle, Affluence, Digital      M
Retail Dimensions       Retail Market Analysis             M
TGI                     Media, Brands, Attitude            S
Understanding Society   Longitudinal Research              GS
GfK FRS                 Financial Research                 S
British Crime Survey    Attitudes on Crime                 GS
Public Transport        Access and Passenger Counts        G
Service Locations       Public Service Outlets             G
Benefits Claimants      Details on Claimants               G
2011 Census             Detailed, small area data          G
Current Demographics    Detailed, small area data          M
Spend Estimates         Consumer Expenditure               M
Outlets                 503k Outlets                       M
Catchments              Gravity Models                     M
Workforce ACORN         Lifestyle of Working Population    C
Irish ACORN             Lifestyle (Eire)                   C
KEY (type): C = Classification, M = Modelled, S = Survey, G = Government
KEY (level, symbols not reproduced): Postcode or >, Individual, Location, Area/Catchment, Anonymous, Household
Theme columns (tick marks not reproduced): Demographics, Affluence & Wealth, Lifestage, Finance, Attitudes, Brands, Channel, Digital Behaviour, Public Services, Retail
33
Summary
Paycheck
• Seeking the best between national surveys and very local data
Up to date
• Data techniques experts consider to be revolutionary
Usability
• Use within the HNDA
• Options for other uses
34
Questions