Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies for easier management and dissemination Justin Hayes UK Data Service What the census tells us workshop Manchester 23 July 2014


Page 1: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies for easier management and dissemination

Justin Hayes

UK Data Service

What the census tells us workshop

Manchester

23 July 2014

Page 2: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Making it easier for everyone to find, understand and use the bits of the census they’re interested in

Justin Hayes

UK Data Service

What the census tells us workshop

Manchester

23 July 2014

Page 3: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Overview

• Traditional and integrated approaches
• Work with 2011 outputs

• Integrated descriptive model
• Integrated model of geographies

• Ongoing work with data producers

Page 4: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Our job

• Find
• Understand
• Use
• Automated systems with online interfaces
• Online and interactive support
• Main services now freely available to everyone

Page 5: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Traditional tabular aggregate outputs

• Outputs conceived and specified as tables
• Details of individual tables defined through consultation with different user groups
• Per-table categorisations and descriptions
• Complex table universes and footnotes
• Visual layout an important consideration
• Extended metadata unattached

• Complex process!
• Number of tables limited by resource available

• Numerous inconsistencies between tables
• Effectively separate datasets

Page 6: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Traditional tabular dissemination

Page 7: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Traditional tabular dissemination

Page 8: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Integrated aggregate outputs

• Deconstruct tables
• Assemble and rationalise all variables and categories in tables
• Variable-ise table universes and footnotes
• Create a standardised library of variables to describe all data

• Define integrated models of characteristics (What?) and geographies (Where?)
• Enables global operations/queries
• Framework for attachment of extended metadata
• Facilitates description and transfer using standards

• Provide access via Web service API
• Data becomes self-describing
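The deconstruction step can be illustrated with a minimal Python sketch. It is not the InFuse implementation: the table names, variables and categories are hypothetical, and the real rationalisation work also has to reconcile inconsistent labels, universes and footnotes. The sketch only shows the idea of collecting each variable once into a shared library and recording which tables use it.

```python
from collections import defaultdict

# Hypothetical table definitions (not real census tables):
# each table lists its variables and the categories it uses.
TABLES = {
    "TABLE_A": {"Age": ["0-15", "16-64", "65+"], "Sex": ["Males", "Females"]},
    "TABLE_B": {"Sex": ["Males", "Females"], "Tenure": ["Owned", "Rented"]},
}


def build_variable_library(tables):
    """Collect every variable and its categories once, noting which tables use it."""
    library = {}
    used_in = defaultdict(list)
    for table_id, variables in tables.items():
        for variable, categories in variables.items():
            known = library.setdefault(variable, [])
            for category in categories:
                if category not in known:
                    known.append(category)
            used_in[variable].append(table_id)
    return library, dict(used_in)


library, used_in = build_variable_library(TABLES)
print(library)
print(used_in)
```

Here "Sex" appears in both hypothetical tables but ends up as a single library entry, which is what allows one description to serve all the data that uses it.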

Page 9: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Integrated dissemination

Page 10: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Variable combination selection

Page 11: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Variable combination selection

Page 12: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Category combination selection

Page 13: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Area selection

Page 14: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Data download

Page 15: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

InFuse

Page 16: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Under the bonnet

• Integrated multidimensional descriptive model
• Integrated model of geographies
• The really important bits!

Page 17: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

InFuse 2011 release 2: Raw data

• England and Wales Local and Detailed Characteristics to output area level

• UK harmonised data to local authority level
• 422 tables, mainly multivariate
• 31 geography types
• 241,334 areas
• 11,311 files
• 15 GB volume

Page 18: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Integrated descriptive model

• Processing of raw metadata
• Deconstruction, rationalisation and re-integration
• Library of variables and categories
• Re-insertion of data values
• Attachment of associated metadata

• Global description using standards
• Global operations via Web service API

• Data is self-describing
• Enables lightweight, generic applications
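As a rough illustration of what "self-describing" and "global operations" could mean in practice, the Python sketch below stores each value together with the variable/category pairs that describe it, then runs one query across all values regardless of which table they originally came from. The `Value` class, the `query` function, the area names and the counts are all hypothetical; the actual InFuse data model and API are not shown in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    area: str          # hypothetical area identifier
    categories: tuple  # ((variable, category), ...) describing the count
    count: int


# Made-up values for illustration only.
VALUES = [
    Value("Area 1", (("Age", "65+"), ("Sex", "Females")), 120),
    Value("Area 1", (("Sex", "Females"), ("Tenure", "Rented")), 310),
    Value("Area 2", (("Age", "65+"), ("Sex", "Females")), 95),
]


def query(values, **wanted):
    """Return every value whose description matches all variable=category pairs."""
    results = []
    for value in values:
        description = dict(value.categories)
        if all(description.get(var) == cat for var, cat in wanted.items()):
            # Each result carries its own description alongside the number.
            results.append({"area": value.area, "count": value.count, **description})
    return results


older_women = query(VALUES, Age="65+", Sex="Females")
print(older_women)
```

Because every result carries its own labels, a generic client can display it without prior knowledge of any particular table layout.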

Page 19: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Benefits of this work

• Data producers
• Efficient data management
• Flexible output production
• Best value

• Application developers
• Easy access to self-describing web services
• Lightweight generic applications

• End users
• Quick and easy global search
• Context along with data

Page 20: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

InFuse 2011 release 2: Processed data

• 97 variables
• 2,501 categories
• 281 variable combinations
• 140 thousand category combinations
• 4.6 billion values

• A 460 km high stack of sticky notes!
• Anticipating approximately 10 billion values in all
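The sticky-note figure can be checked with a short calculation. The slide does not state the note thickness, so the sketch below assumes 0.1 mm (100 micrometres) per note, which is the value that reproduces the 460 km figure from 4.6 billion values.

```python
VALUES_COUNT = 4_600_000_000      # from the slide: 4.6 billion values
NOTE_THICKNESS_UM = 100           # assumption: 0.1 mm per sticky note
MICROMETRES_PER_KM = 1_000_000_000


def stack_height_km(notes, thickness_um=NOTE_THICKNESS_UM):
    """Height in kilometres of a stack of `notes` sticky notes."""
    return notes * thickness_um / MICROMETRES_PER_KM


print(stack_height_km(VALUES_COUNT))     # 460.0
print(stack_height_km(10_000_000_000))   # 1000.0
```

Under the same assumption, the anticipated 10 billion values would make a stack about 1,000 km high.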

Page 21: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Integrated model of UK census geographies

• Assembly of raw information on geographies
• 31 geography types
• 241,334 areas (anticipating ~2 million including postcodes)
• Direct and indirect hierarchies

• Simplified presentational model
• 11 composite geography layers
• Simplification of merged geographies in England and Wales

• Calculation of ‘missing’ data
• Linkage between descriptive and geography models

• Partial availability of data for geographies and extents
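One plausible reading of "direct and indirect hierarchies" and "calculation of 'missing' data" is sketched below in Python: each area points to its direct parent, indirect ancestors are found by following those links, and counts for higher areas are derived by summing the smallest areas. This is an assumption about the approach, not a description of the InFuse method, and the area names and counts are invented. The sketch also assumes the input counts are for the smallest areas only, so nothing is counted twice.

```python
# Hypothetical direct hierarchy: child area -> parent area.
PARENT = {
    "OA1": "Ward A", "OA2": "Ward A", "OA3": "Ward B",
    "Ward A": "District X", "Ward B": "District X",
}

# Made-up counts, available only for the smallest areas.
BASE_COUNTS = {"OA1": 10, "OA2": 15, "OA3": 7}


def ancestors(area, parent=PARENT):
    """Yield the direct parent first, then each indirect ancestor."""
    while area in parent:
        area = parent[area]
        yield area


def fill_missing(base_counts, parent=PARENT):
    """Derive counts for higher areas by summing the smallest areas beneath them."""
    totals = dict(base_counts)
    for area, count in base_counts.items():
        for ancestor in ancestors(area, parent):
            totals[ancestor] = totals.get(ancestor, 0) + count
    return totals


totals = fill_missing(BASE_COUNTS)
print(totals)
```

In this toy hierarchy "District X" is an indirect ancestor of each output area, and its count is recovered without ever having been supplied.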

Page 22: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Raw admin and statistical geographies

Page 23: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Admin and statistical geography layers

infuse.mimas.ac.uk/help/definitions/2011geographies

Page 24: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

What’s next for InFuse

• Interface improvements
• Geography first option
• Fine-tune interface features
• Select categories from more than one category combination
• ‘Select all’ categories
• Back button
• Geography tree improvements (multiple hierarchies)

• User testing

Page 25: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

What’s next?

• More data
• More comparable data

• Different data
• Boundary and flow data

• More functionality
• Personalisation, analysis and visualisation

• Public InFuse API
• Work with statistical agencies?

• Machine-friendly data from source
• Flexible generation with automated disclosure control?
• Information on usage and contact with users

Page 26: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

What is the UK Data Service?

• a comprehensive resource funded by the ESRC

• a single point of access to a wide range of secondary social science data

• support, training and guidance

Page 27: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

UK Data Service Census Support

• Specialist function of UK Data Service

• Access and support services for outputs from recent UK censuses

• Add value by making census outputs easy to find, understand and use

• Engagement with UK census agencies

• Long history of technological innovation in service development

• census.ukdataservice.ac.uk

Page 28: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

census.ukdataservice.ac.uk

Page 29: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

• Aggregate component of census outputs

Census Support at Manchester

Justin Hayes

Rob Dymond-Green

Richard Wiseman

Jamey Hart

Page 31: Structural analysis of the aggregate outputs from the 2011 Census to develop alternative integrated multidimensional conceptual models of data and geographies

Give InFuse a go!

infuse.mimas.ac.uk

• Comments, questions and ideas welcome
• [email protected]