Madness in the dust: is having dementia linked to where you live?
TRANSCRIPT
By Frank Kelly, Ondrej Urban and Esther Remmelink
Top 5 causes of death in London in 2012
Top five female death categories in London:
1. Ischaemic heart diseases
2. Dementia and Alzheimer’s disease
3. Cerebrovascular diseases
4. Malignant neoplasm of trachea, bronchus and lung
5. Chronic lower respiratory diseases
Top five male death categories in London:
1. Ischaemic heart diseases
2. Malignant neoplasm of trachea, bronchus and lung
3. Chronic lower respiratory diseases
4. Cerebrovascular diseases
5. Dementia and Alzheimer’s disease
http://cleanair.london/hot-topics/first-ever-rankings-of-top-10-death-rates-for-every-london-borough/
http://www.sciencemag.org/news/2017/01/brain-pollution-evidence-builds-dirty-air-causes-alzheimer-s-dementia
Alzheimer's Disease (AD)
A progressive neurodegenerative disorder:
● 15-20 years’ accumulation of “plaques”
● short-term memory problems → death (traumatic)
Most common form of dementia (60%+ of cases)
Cause poorly understood: genetic + environmental factors
Harmful factors: old age, female gender, poor cardiovascular function, diabetes
Protective factors: intellectual activities, physical activity, social interaction
Changes in demographics mean the global burden of AD will increase
http://www.alz.org/
http://www.un.org/esa/population/publications/
Background
WHO are we?
Frank, Ondrej and Esther are data scientists at @HAL24K (smart cities data science consultancy)
HOW are we affected?
● Family members with Alzheimer's Disease
● Have studied neuroscience
Also:
● Have lived in polluted cities
How pollutants can affect the central nervous system
Not all reach the brain:
Fine particles, e.g. Particulate Matter (PM) 2.5 (= particles <2.5 μm)
Heusinkveld, NeuroToxicology, 2016
PM2.5 particles
Two main sources:
● Anthropogenic = human-made, e.g. fuel combustion, industry, vehicles, agriculture
● Non-anthropogenic = non-human-made, e.g. forest fires
Measured in µg/m³
https://uk-air.defra.gov.uk/assets/documents/reports/cat09/1310021025_AQD_DD4_2011mapsrepv0.pdf
https://uk-air.defra.gov.uk/assets/documents/reports/cat11/1212141150_AQEG_Fine_Particulate_Matter_in_the_UK.pdf
Image credit: http://www.solarcrest.co.uk/images/PM2-point-5.jpg
Urban PM2.5
Black carbon = a major part of PM2.5 due to road traffic
https://uk-air.defra.gov.uk/assets/documents/reports/cat11/1212141150_AQEG_Fine_Particulate_Matter_in_the_UK.pdf
Project & Presentation Goals
- Use open data to verify for ourselves the link between Alzheimer’s prevalence and air pollution exposure
- Demonstrate nice map visualisation methods in Python
- Discuss challenges and interesting findings
Our Study: Air Quality (AQ) data
USA (California)
● Time span: 1990 - 2016 (AQ)
● Measurement of air pollution: PM2.5 continuously measured at air quality monitoring stations operated by the state environmental agency
● Website: https://www.epa.gov/outdoor-air-quality-data
UK (England)
● Time span: 2010 - 2015
● Measurement of air pollution: population-weighted anthropogenic and non-anthropogenic PM 2.5 estimates per local authority (started 2009)
● Website: https://uk-air.defra.gov.uk/
Netherlands
● Time span: 2011 - 2016
● Measurement of air pollution: PM 2.5 measured at 50 air quality monitoring stations (part of national network)
● Website: http://www.rivm.nl/
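The UK figures are described as population-weighted estimates. As a rough illustration of what that weighting does, the sketch below averages PM 2.5 over grid cells inside one local authority, giving each cell a weight equal to the number of people living in it. The function name and all numbers are hypothetical and chosen only for illustration; this is not the procedure used to produce the published estimates.

```python
def population_weighted_mean(concentrations, populations):
    """Average concentration where each cell counts in proportion to its population."""
    total_population = sum(populations)
    weighted_sum = sum(c * p for c, p in zip(concentrations, populations))
    return weighted_sum / total_population


# Hypothetical grid cells within one local authority (values are made up).
grid_pm25 = [8.0, 12.0, 20.0]          # PM 2.5 in µg/m³ per grid cell
grid_population = [1000, 3000, 6000]   # residents per grid cell

weighted = population_weighted_mean(grid_pm25, grid_population)
unweighted = sum(grid_pm25) / len(grid_pm25)
print(f"population-weighted: {weighted:.1f}, simple mean: {unweighted:.1f}")
```

Because most of the hypothetical residents live in the most polluted cell, the weighted figure comes out higher than the simple mean.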
Our Study: Alzheimer’s (AD) prevalence data
USA (California)
● Time span: 2010 - 2014
● Disease prevalence sample: registered Medicare patient prevalence of Alzheimer’s & other dementias, split into over and under 65
● Region division: County
UK (England)
● Time span: 2013
● Disease prevalence sample: counts of diagnosed dementia cases in 12 age bins, ranging from 30-34 through to 90+ years of age
● Region division: Parliamentary constituency
Netherlands
● Time span: 2010 - 2014
● Disease prevalence sample: counts of diagnosed dementia cases
● Region division: ‘Gemeente’ = city council
Geospatial Visualisation in Python: options
● Geopandas
http://geopandas.org/
● Bokeh
http://bokeh.pydata.org/en/latest/
See our demo in github: https://github.com/ondrejiayc/PyDataLondon2017
Netherlands: PM 2.5
<- Cumulative PM2.5 count over 4 years
-> Long-term average wind speed
Wind + PM2.5 Animation: http://aqicn.org/faq/2015-11-05/a-visual-study-of-wind-impact-on-pm25-concentration/
Adjusting for age distribution
Adjust for age in the population studied to be able to compare different populations, e.g. between counties.
How?
● Multiplying rates by a standard population distribution (e.g. the W.H.O. standard).
[Figure: W.H.O. population statistics for developed nations, 2010]
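The age adjustment can be sketched as direct standardisation: compute the prevalence in each age band, then take a weighted average using the share of each band in a standard population. The age bands, weights and counts below are hypothetical and much coarser than a real standard population; they are there only to show the arithmetic.

```python
def direct_age_standardised_rate(cases, population, standard_weights):
    """Weighted average of age-specific rates, weighted by a standard population."""
    total_weight = sum(standard_weights.values())
    rate = 0.0
    for age_band, weight in standard_weights.items():
        age_specific_rate = cases[age_band] / population[age_band]
        rate += age_specific_rate * weight / total_weight
    return rate


# Hypothetical standard population shares (not the real W.H.O. figures).
standard = {"30-64": 0.80, "65-79": 0.15, "80+": 0.05}

# Two made-up regions with identical age-specific rates but different age mixes.
rural_cases = {"30-64": 10, "65-79": 200, "80+": 600}
rural_pop = {"30-64": 10_000, "65-79": 4_000, "80+": 3_000}
urban_cases = {"30-64": 20, "65-79": 100, "80+": 200}
urban_pop = {"30-64": 20_000, "65-79": 2_000, "80+": 1_000}

crude_rural = sum(rural_cases.values()) / sum(rural_pop.values())
crude_urban = sum(urban_cases.values()) / sum(urban_pop.values())
adjusted_rural = direct_age_standardised_rate(rural_cases, rural_pop, standard)
adjusted_urban = direct_age_standardised_rate(urban_cases, urban_pop, standard)

print(f"crude:    rural {crude_rural:.4f}, urban {crude_urban:.4f}")
print(f"adjusted: rural {adjusted_rural:.4f}, urban {adjusted_urban:.4f}")
```

In this toy case the crude prevalence is far higher in the older rural region, while the adjusted prevalence is the same in both, which is the effect the adjustment is meant to have when comparing regions.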
Netherlands: PM 2.5 vs. (adjusted) dementia prevalence
<- Cumulative PM2.5 count over 4 years
-> Crudely age adjusted dementia prevalence
Limitations of Californian & Dutch data
● Dementia prevalence data lacked age breakdown in Netherlands and USA
● Sample bias in USA dementia data (Medicare scheme only)
● PM 2.5 monitors not evenly spread out across California
UK geospatial data: boundaries
● PM 2.5 data by local authority
● Dementia count for each age group recorded per parliamentary constituency
● Shape files available from Ordnance Survey
https://www.ordnancesurvey.co.uk/business-and-government/products/boundary-line.html
Geopandas capabilities - spatial joins for London
Cumulative 5 year human-made PM 2.5 by Local Authority
Dementia prevalence by Westminster constituency
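PM 2.5 is reported per local authority and dementia per parliamentary constituency, so the two boundary sets have to be related spatially. One possible approach, sketched below, intersects the two layers with `geopandas.overlay` and takes an area-weighted average of PM 2.5 for each constituency. This is an illustrative technique on toy square geometries with made-up names and values, not necessarily the join used in the study; real boundaries would be read from the Ordnance Survey shape files instead.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy boundaries in British National Grid coordinates (all values hypothetical).
local_authorities = gpd.GeoDataFrame(
    {"la_name": ["LA west", "LA east"], "pm25": [10.0, 14.0]},
    geometry=[box(0, 0, 2, 2), box(2, 0, 4, 2)],
    crs="EPSG:27700",
)
constituencies = gpd.GeoDataFrame(
    {"pc_name": ["PC a", "PC b"]},
    geometry=[box(0, 0, 3, 2), box(3, 0, 4, 2)],
    crs="EPSG:27700",
)

# Cut every constituency into the pieces that fall inside each local authority.
pieces = gpd.overlay(constituencies, local_authorities, how="intersection")

# Area-weighted average of PM 2.5 over the pieces of each constituency.
pieces["area"] = pieces.geometry.area
pieces["weighted_pm25"] = pieces["pm25"] * pieces["area"]
sums = pieces.groupby("pc_name")[["weighted_pm25", "area"]].sum()
pm25_by_constituency = sums["weighted_pm25"] / sums["area"]

print(pm25_by_constituency)
```

Area weighting assumes pollution and population are spread evenly within each local authority, which is a simplification; a population-weighted variant would need finer-grained population data.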
England’s age distribution
● Urban / rural age distribution differs immensely
UK County = mostly countryside
UK Borough = mostly urban areas
● Detailed prevalence per age group available for UK data
UK: correction for age
Directly age adjusted version ->
Median 30-95+ dementia count per region <-
UK County = mostly countryside
UK Borough = mostly urban areas
Geopandas: dementia and age correction for London
^ Mean age by London ward (2013)
^ Dementia prevalence by Westminster constituency, 2013 (not age corrected)
AD age corrected ->
UK PM2.5 & age adjusted dementia
Cumulative non human-made PM2.5
Cumulative human-made PM2.5
UK mean wind profile 1981 - 2010
Age adjusted dementia prevalence (England only)
Age adjusted English dementia vs. PM 2.5
● Geospatially joined
○ (Local Authority ⇔ Parliamentary Constituency)
● No clear correlation between total age-adjusted dementia prevalence (total for all ages) and cumulative 5 year PM 2.5
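A simple way to put a number on "no clear correlation" is a Pearson correlation coefficient between the two per-region series once they share the same boundaries. The sketch below implements it in plain Python; the five regions and their values are invented for illustration and are not the study data, and the slides do not state which statistic the authors used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Hypothetical per-region values (made up for illustration).
cumulative_pm25 = [8.0, 10.0, 12.0, 14.0, 16.0]
adjusted_prevalence = [1.2, 0.9, 1.3, 1.0, 1.1]

r = pearson_r(cumulative_pm25, adjusted_prevalence)
print(f"Pearson r = {r:.2f}")
```

A value of r close to zero, as in this toy data, indicates no clear linear relationship; it says nothing about non-linear effects or about confounding.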
Measuring lifetime exposure to PM2.5?
● Ownership % of diesel cars increased drastically since 1990
https://www.instituteforgovernment.org.uk/blog/dealing-diesel
Lifetime exposure
● Diesel cars only part of the picture
● Which age group was most affected by traffic related PM 2.5 levels?
https://www.instituteforgovernment.org.uk/blog/dealing-diesel
Age-adjusted dementia prevalence in 30-39 year olds
UK County = mostly countryside
UK Borough = mostly urban areas
● Can we see evidence of movement of young people out from London to its surrounding commuter belt?
Confounders and covariates
Some other possible variables that could explain the relationship between poor AQ and early AD?
<- % not born in the UK (2011)
Median household income (2012/13)
Feature importance: Random Forest Regressor model applied to London areas’ 30-39yrs AD prevalence:
Education, income levels and life expectancy were significant →
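Ranking features by importance with a Random Forest Regressor can be sketched with scikit-learn as below. The data is synthetic and the feature names are hypothetical stand-ins for the kind of area-level variables discussed; the target is deliberately constructed so that one arbitrarily chosen column dominates, so the resulting ranking is a property of the toy data and not a finding of the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic area-level features (hypothetical names, random values).
rng = np.random.default_rng(0)
n_areas = 200
feature_names = ["pm25", "median_income", "pct_not_uk_born"]
X = rng.normal(size=(n_areas, len(feature_names)))

# Synthetic target: built to depend mostly on the second column, purely for illustration.
y = 3.0 * X[:, 1] + 0.3 * X[:, 0] + rng.normal(scale=0.1, size=n_areas)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# Impurity-based importances sum to 1 across features.
importances = dict(zip(feature_names, model.feature_importances_))
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Feature importance shows which inputs the model relied on for prediction; it does not by itself establish that a variable causes the outcome.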
Our conclusions
● Some indication that there is a link between recent exposure to cumulative PM2.5 level per region and early Alzheimer's prevalence in that region
● This correlation may be confounded by other (unknown) factors
⇒ Open Data and Data Science ≠ Epidemiology
● Challenge to compare open data from different domains and geographies
⇒ Recommend Geopandas
Where should Data Scientists live ideally?
● A windy place… (or rainy)
● Not too much traffic…
● Wherever you live:
○ Head for greener areas
○ Get decent sleep, eat healthily, get regular aerobic exercise … and intellectual stimulation.
THE END
Thank you!
Frank: @norhustla, Esther: @esrem5, Ondrej: @ondrejiayc
HAL24K: @hal24k
We would welcome your involvement!
- Contribute AD and / or AQ data
- Examine the data and draw your own conclusions
https://github.com/ondrejiayc/PyDataLondon2017