The Impact of the Data Revolution on
Official Statistics:
Opportunities, Challenges and Risks
Prof. Rob Kitchin
NIRSA, Maynooth University
Background
• All-Island Research Observatory (AIRO; www.airo.ie)
• Dublin Dashboard (www.dublindashboard.ie)
• Digital Repository of Ireland (DRI; www.dri.ie)
• The Programmable City
The Data Revolution book
• A synoptic overview of big data, open data and data infrastructures
• An introduction to thinking conceptually about data, data infrastructures, data analytics and data markets
• A critical discussion of the technical issues and the social, political and ethical consequences of the data revolution
• An analysis of the implications of the data revolution to academic, business and government practices
The data revolution
• Data infrastructures
• Open and linked data
• Big data
• Data analytics
• Data markets
• Conceptualisation of data
• Disruptive innovations that offer opportunities, challenges and risks for government, business and academy
Data infrastructures
• Actively planned, curated and managed
• Enables storing, scaling, combining, sharing and consuming data
across networked archives and repositories
• Produces ‘data amplification’
• NSIs have long, if loosely, operated as such (trusted) infrastructures,
but are now organising into more coordinated platforms with:
• dedicated and integrated hardware and networked technologies;
interoperable software and middleware services and tools; shared
standards, protocols, metadata; shared services (relating to data
management and processing), analysis tools & policies (concerning
access, use, IPR, etc)
• Such infrastructures are being federated into larger pan-national
infrastructures (Eurostat, ESPON, UN, etc).
• Many other institutions catching up
Open and linked data
• Opening PSI (and other) data for re-use: driven by
transparency, participation, collaboration, economic
arguments
• Linking data/metadata using non-proprietary formats,
URIs and RDF so that data can be referenced and conjoined
• NSIs already very active in this space; other government
data providers much further behind
• More to be done, especially retro opening and linking
historical records; producing APIs; upgrading extent of
openness (licensing re. re-use, reworking, redistribution,
reselling); using non-proprietary formats; opening data
about the organizations themselves
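The linking idea above can be sketched in a few lines of Python. This is only a minimal illustration: the URIs, property names and figures are hypothetical, and plain tuples stand in for RDF triples rather than a real RDF library.

```python
# Two hypothetical datasets published as (subject, predicate, object) triples.
# All URIs and values below are invented for illustration.
AREA = "http://example.org/id/area/"
PROP = "http://example.org/def/"

census_triples = [
    (AREA + "A1", PROP + "population", 1200),
    (AREA + "A2", PROP + "population", 800),
]
housing_triples = [
    (AREA + "A1", PROP + "dwellings", 500),
    (AREA + "A2", PROP + "dwellings", 400),
]


def conjoin(*datasets):
    """Merge triples from several datasets, grouping by shared subject URI."""
    merged = {}
    for triples in datasets:
        for subject, predicate, obj in triples:
            merged.setdefault(subject, {})[predicate] = obj
    return merged


linked = conjoin(census_triples, housing_triples)

# Because both publishers use the same URI for an area, a derived
# indicator can be computed across datasets without manual matching.
persons_per_dwelling = {
    uri: props[PROP + "population"] / props[PROP + "dwellings"]
    for uri, props in linked.items()
}
```

The point of the sketch is that a shared, stable identifier (here the area URI) is what allows separately published datasets to be conjoined; a production setup would more likely use an RDF store and agreed vocabularies.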
Big data
Characteristic                 Small data                           Big data
Volume                         Limited to large                     Very large
Exhaustivity                   Samples                              Entire populations
Resolution and indexicality    Coarse & weak to tight & strong      Tight & strong
Relationality                  Weak to strong                       Strong
Velocity                       Slow, freeze-framed                  Fast
Variety                        Limited to wide                      Wide
Flexible and scalable          Low to middling                      High
Big data and official statistics (source: ESSC 2014)
Data analytics
• Challenge of making sense of big data is coping with its abundance and exhaustivity, timeliness and dynamism, messiness and uncertainty, semi-structured or unstructured nature
• Solution has been machine learning made possible by advances in computation and computational techniques
• Four broad classes of analytics:
• data mining and pattern recognition
• statistical analysis
• prediction, simulation, and optimization
• data visualization and visual analytics
Conceptualising data
• Technically and methodologically: data generation, handling, processing, storing, analyzing, sharing, etc.
• Philosophically: ontology, epistemology, ideology
• what can we know about the world, how can we know it, what should we do with such knowledge
• Critical data studies
• rather than understanding data as objective, neutral, pre-analytic & commonsensical, data are understood as being framed socially, politically, ethically and philosophically in terms of their form, selection, analysis and deployment
• data do not exist independently of the ideas, instruments, practices, contexts, knowledges and systems used to generate, process and analyze them
• data express a normative notion about what should be measured, for what reasons, and what they should tell us; they have normative effects; they do not simply reflect the world but actively produce it
• data are framed by and situated within data assemblages; NSIs constitute such assemblages
Data assemblage
Attributes                         Elements
Systems of thought                 Modes of thinking, philosophies, theories, models, ideologies, rationalities, etc.
Forms of knowledge                 Research texts, manuals, magazines, websites, experience, word of mouth, chat forums, etc.
Finance                            Business models, investment, venture capital, grants, philanthropy, profit, etc.
Political economy                  Policy, tax regimes, public and political opinion, ethical considerations, etc.
Governmentalities / Legalities     Data standards, file formats, system requirements, protocols, regulations, laws, licensing, intellectual property regimes, etc.
Materialities & infrastructures    Paper/pens, computers, digital devices, sensors, scanners, databases, networks, servers, etc.
Practices                          Techniques, ways of doing, learned behaviours, scientific conventions, etc.
Organisations & institutions       Archives, corporations, consultants, manufacturers, retailers, government agencies, universities, conferences, clubs and societies, committees and boards, communities of practice, etc.
Subjectivities & communities       Of data producers, curators, managers, analysts, scientists, politicians, users, citizens, etc.
Places                             Labs, offices, field sites, data centres, server farms, business parks, etc., and their agglomerations
Marketplace                        For data, its derivatives (e.g., text, tables, graphs, maps), analysts, analytic software, interpretations, etc.
Implications and uses of data
• Scaled, open, linked, big data and associated analytics produce
knowledge that enhances governing of people, managing
organisations, leveraging value and producing capital, creating
better places, improving health and well-being, tackling social
and ecological issues, fostering civic participation, etc.
• They improve insight and wisdom, productivity, competitiveness,
efficiency, effectiveness, utility, sustainability, safety & security,
transparency ...
• Challenge established epistemologies in the academy
• “Revolutions in science have often been preceded by revolutions in
measurement” Sinan Aral
• new empiricism, data-driven science, computational social sciences,
digital humanities
• transforming how we frame, ask and answer questions
Opportunities for OS/NSIs
• New sources of dynamic and linked data and more timely outputs
• Complement/replace/improve/add to existing data/approaches
• New forms of data analytics can provide greater insights from existing and new datasets
• Optimize working practices, gain efficiencies, redeploy staff
• Stronger links/partnerships with computational social science, data science (esp. viz), and data industries
• Drive creation of data-driven institutions and evidence-informed governance
• Greater visibility and use of products
Challenges for OS/NSIs
• Sourcing data from third parties and associated partnering,
legal and financial issues, including opening OSs derived
from private data
• Experimenting and trialing to determine:
• suitability for official statistics, esp. when data are being repurposed,
are not representatively sampled, are flexible (thus potentially
altering continuity), and have undefined data quality (re. veracity
(accuracy, fidelity), uncertainty, error, bias, reliability, calibration)
• technological feasibility re. transferring, storing, cleaning,
checking, and linking big data
• methodological feasibility re. augmenting/producing OSs.
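The sampling concern above can be made concrete with a small worked example. The sketch below assumes a hypothetical population split into two age groups with known shares (for instance from a census) and a hypothetical repurposed data source that over-represents one group. It compares the naive mean with a post-stratified mean. All figures are invented, and post-stratification is only one of several possible adjustments.

```python
# Known population shares per stratum (assumed to come from a census).
population_share = {"under_40": 0.5, "40_plus": 0.5}

# Hypothetical repurposed source: records are (stratum, value).
# Younger people are heavily over-represented (80 of 100 records).
records = [("under_40", 10.0)] * 80 + [("40_plus", 20.0)] * 20


def naive_mean(rows):
    """Unweighted mean, treating the source as if it were a random sample."""
    return sum(value for _, value in rows) / len(rows)


def post_stratified_mean(rows, shares):
    """Weight each stratum's mean by its known population share."""
    totals, counts = {}, {}
    for stratum, value in rows:
        totals[stratum] = totals.get(stratum, 0.0) + value
        counts[stratum] = counts.get(stratum, 0) + 1
    return sum(shares[s] * totals[s] / counts[s] for s in counts)


naive = naive_mean(records)                                   # 12.0
adjusted = post_stratified_mean(records, population_share)    # 15.0
```

With these invented numbers the naive estimate (12.0) understates the population mean (15.0) because of who happens to appear in the source. The adjustment only corrects imbalance between strata; it does not address bias within a stratum or the other quality issues listed above.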
Challenges for OS/NSIs
• Building and maintaining new IT infrastructure, retro
work on older data (opening, linking); ensuring
security/data protection, deploying new data analytics
• Securing additional resources (financial and staffing)
for dealing with new data streams and opening/linking
data
• Developing new technical and methodological skills
and sourcing/retaining trained/skilled staff
• Establishing standards, standardization,
interoperability across jurisdictions
Risks for OS/NSIs
• Undermining of reputation and trust
• quantity and utility of data opened (moving beyond low-hanging fruit)
• quality of data (big data often messy & dirty) and losing control of generation/sampling/processing
• established statistical products become undermined or discontinued before alternatives fully established/verified
• partnering with third parties (tarnished by their reputation)
• public perception and resistance to use of big data
• Privacy and security
• Access and continuity (will private sources of data be available over the long term; will flexibility alter/break time-series); resistance from third parties to sharing data (gratis)
• Fragmented landscape across jurisdictions
• Pressure to reduce staff/budget rather than redeploy
• Competition and privatisation (data brokers)
Solutions
• Need:
• conceptual, practical and strategic thought re. challenges and risks
of building data infrastructures, opening data, using big data
• planning of change management from short to long-term
• coordinated response re. experimentation, processes, trialing,
standards, IPR, legislation, software, building infrastructure, etc. to
establish best practice and ensure continuity across jurisdictions
• coordinated political lobbying re. resourcing
• Alliances and sharing information with similar organisations (e.g.,
RDA, WDS)
• Some of this already happening. More needed in a fast
moving space.
Conclusion
• A data revolution is underway
• a fundamental shift in data openness and sharing,
• volume, exhaustiveness, timeliness, granularity, relationality,
variety, analytics, technical infrastructures, etc.
• conceptual thought relating to data
• Creating a set of disruptive innovations that is producing
opportunities, challenges and risks for NSIs and others
• It is important for NSIs to get ahead of the curve with
respect to challenges and risks, becoming proactive not
reactive and setting the agenda for new innovations
• This requires conceptual, practical and strategic thought
and a coordinated approach across institutions
@robkitchin
Kitchin, R., Lauriault, T. and McArdle, G. (2015) Knowing and governing cities through urban indicators, city benchmarking and real-time dashboards. Regional Studies, Regional Science 2: 1-28
Kitchin, R. and Lauriault, T. (2014) Small data in the era of big data. GeoJournal online first
Kitchin, R. (2014) Big data, new epistemologies and paradigm shifts. Big Data and Society 1 (April-June): 1-12.
Kitchin, R. and Lauriault, T. (2014) Towards critical data studies: Charting and unpacking data assemblages and their work. The Programmable City Working Paper 2, SSRN
Kitchin, R. (2013) Big data and human geography: Opportunities, challenges and risks. Dialogues in Human Geography 3(3): 262–267
http://www.nuim.ie/progcity
@progcity