www.nesstar.com: Unlocking data – creating knowledge


Post on 19-Dec-2015


Page 1

www.nesstar.com

Unlocking data – creating knowledge

Page 2

Data Publishing with Nesstar Publisher

Margaret Ward

Jostein Ryssevik

Cliff Dive

IASSIST/IFDO 2005

Edinburgh, Scotland

Workshop 5

Page 3

Data publishing with Nesstar Publisher

The aim of this workshop is to provide an introduction to Nesstar Publisher.

By the end of this session you will be able to:

– prepare and publish micro-data (survey) files

– prepare and publish a simple cube

Page 4

Programme 1

• Overview of the Nesstar system & Publisher functionality

• Using Nesstar Publisher - Survey (micro) data

• Practical session 1

• Publishing - using Manage Server

• Templates

• Practical session 2

Page 5

Programme 2

• Overview of the Hierarchical Publisher

• Using Nesstar Publisher - Cube (tabular) data

• Practical session 3

• Introduction to the Resource Publisher

• Overview of the new Publisher v3.5

• Questions

• Practical session 4

Page 6

Nesstar - an overview

[Diagram labels: Publisher; End-user clients; Metadata and data input; Metadata and data editing/transformation; Data and metadata retrieval and display; Extract; Load and manage; Retrieve; Internet]

Page 7

Nesstar Publisher

• The Publisher is the ETL (Extract, Transform, Load) tool of the Nesstar product suite. It enables you to:

– extract data and metadata from a variety of sources, systems and formats,

– clean, change, edit and extend the data and metadata, and

– publish data and metadata to a Nesstar Server

• The Publisher can also be used to manage the content on a Nesstar Server

• The Publisher can serve as a general data/metadata entry, transformation and editing tool, independently of its role in the Nesstar system.

Page 8

Nesstar Publisher cont.

• The Publisher supports:

– micro-data (e.g. survey-data)

– hierarchical data (e.g. household studies)

– aggregated data (multidimensional tables or cubes)

– additional information objects (e.g. reports, factsheets, pictures) to be stored on a Nesstar server

Page 9

Hierarchical Publisher

Enables files to be linked together and analysed as one study

For example:

• A study may contain household and individual level data files which are linked by key variables

Page 10

Cube Builder

The Cube Builder adds the following cube specific information:

• ‘Time’ and ‘Geographical’ dimensions

• The default view of the cube

• The additivity of the data

• The cube ‘measure(s)’

Page 11

Resource Publisher

• Used to publish ‘external’ Nesstar resources, e.g. PDF files, ‘Word’ files etc.

• Uses Dublin Core or e-GMS for metadata

• Enables these ‘external’ resources to be viewable on a Nesstar Server alongside survey data and cubes

Page 12

Using Nesstar Publisher

Page 13

Preparing & publishing micro-data

[Diagram labels: Data; Meta-data; Nesstar Publisher; NSDstat file; Hierarchical Publisher; Server]

Page 14

Metadata

• What is metadata? Basically defined as ‘data about data’

• Aim of metadata is to make a resource ‘findable’ and ‘manageable’

• DDI: “Enables the effective, efficient and accurate use” of data resources.

Page 15

Metadata standards supported by Nesstar Publisher

• DDI (http://www.icpsr.umich.edu/DDI)

“Enables the effective, efficient and accurate use” of data resources

• Dublin Core (http://uk.dublincore.org/)

“A standard for cross-domain information resource description”

• e-Government Metadata Standard (e-GMS) (http://www.govtalk.gov.uk/)

“To ensure maximum consistency of metadata across public sector organisations”

Page 16

Adding metadata - Publisher templates

• Use metadata templates

• Can use DDI or Dublin Core/e-GMS

• Can add controlled vocabulary lists and default text

• Can rename template fields, i.e. use familiar terms

Advantages:

• Create to suit individual needs of an organisation or a data series

• Use of standard templates ensures consistent use of metadata fields

• Can add help information about each field to assist the data publisher

Page 17

Importing / Exporting of data

Formats:

• DDI document: *.xml

• SPSS: import *.sav, *.por, *.sps; export *.sav, *.por, *.sps

• SAS: import *.sp1; export *.sas (syntax)

• Stata: import *.dta (STATA 7 & STATA 8); export *.dta

• Statistica: import *.sta; export *.sta

• NSDstat: import *.nsf; export *.nsf

• dBase: import *.dbf; export *.dbf

• DIF: import *.dif; export *.dif

• Fixed Format ASCII: *.dat

• Delimited text: import *.txt, *.csv; export *.txt

• PC-Axis: *.px

Page 18

Import from DDI / Export DDI

Enables the re-use of metadata. Available options are:

• Import from Dataset: import the metadata from an existing NSDstat file

• Import from DDI: import an existing XML file

[Caution! ‘Invisible’ Metadata may be present]

• Export DDI: Export metadata to an XML file

Page 19

Variable level metadata

• Variable and category labels can easily be edited/added

• Able to change the case of variable/category labels

• Variable repository makes re-use of category labels possible

• Local and Global variable repositories - share information with others

• Add a map link to a variable

• Adding question text and variable notes:

– to each variable separately

– to a block of variables

• Identify ‘Weight’ variables

• Identify ‘Time’ variables

• Missing data assignments

Page 20

Data manipulation functions

The Publisher enables you to:

• View the data as a matrix allowing direct data entry or editing

• Cut and paste data

• Add, insert and copy variables of different types, e.g. Numeric, Fixed string, Dynamic string, Date

• Insert/replace data – insert data matrix from dataset, or fixed format text

• Delete variables

• Sort cases

• Delete cases

• Convert between variable types

Page 21

Variable Groups

• Useful for grouping together variables that relate to the same topic or theme

• Hierarchy of groups is supported

• Variables can belong to more than one group

• Groups can be arranged in any order

• Information about a group can be added, e.g. a group definition

Advantages:

• Make it easier for end-users to navigate the dataset

• Reduce the load time of a dataset when published

Page 22

Using the Publisher

Demonstration

Practical session 1

Page 23

Manage Server, Publishing and Templates

Page 24

Manage Server

• Provides the means to link to Nesstar Servers to enable publishing

• Enables the data publisher to manage the resources on a Nesstar Server so that they can then:

– Create new catalogues then name and describe them

– Reorganise the catalogue hierarchy

– Add files to a catalogue

– Move files between catalogues

– Delete files and catalogues

Page 25

Publishing

• Add a Nesstar Server by using the Server URL, or by locating the Server directory on a LAN, and entering an appropriate username and password

• Can publish to a Nesstar Server over a local area network (LAN) or over the Internet

• Able to publish to multiple Servers in a single operation

• Options to publish ‘data and metadata’, ‘metadata only’ or ‘Republish’

• Catalogues can be automatically selected if ‘Keywords’ or ‘Subject classification’ terms within the metadata match existing catalogue names

• Able to publish to a ‘Hidden’ catalogue, which is not visible to end-users

• Able to view the published data directly from the Publisher using the ‘Open in Web client’ option
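
The automatic catalogue selection mentioned on this slide can be pictured as a simple term match. The following is a minimal sketch in Python, not the Publisher's actual logic: the function name `select_catalogues`, the sample terms and the case-insensitive exact comparison are all assumptions made for illustration, since the slides do not describe the matching rules.

```python
def select_catalogues(metadata_terms, catalogue_names):
    """Return the catalogue names that equal a keyword or subject term.

    Hypothetical helper: exact, case-insensitive matching is an assumption.
    """
    wanted = {term.strip().lower() for term in metadata_terms}
    return [name for name in catalogue_names if name.strip().lower() in wanted]


# Illustrative values only; they do not come from a real Nesstar Server.
keywords = ["Health", "population"]
catalogues = ["Population", "Crime", "Health"]

selected = select_catalogues(keywords, catalogues)
print(selected)
```

With these sample values the study would be offered to the "Population" and "Health" catalogues and not to "Crime".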

Page 26

Adding metadata - Publisher templates

• Use metadata templates

• Can use DDI or Dublin Core/e-GMS

• Can add controlled vocabulary lists and default text

• Can rename template fields, i.e. use familiar terms

Advantages:

• Create to suit individual needs of an organisation or a data series

• Use of standard templates ensures consistent use of metadata fields

• Can add help information about each field to assist the data publisher

Page 27

Manage Server, Publishing and Templates

Demonstration

Practical session 2

Page 28

The Hierarchical Publisher

Page 29

Preparing & publishing hierarchical data

[Diagram labels: Data; Meta-data; Nesstar Publisher; NSDstat file; Hierarchical Publisher; Server]

Page 30

Hierarchical Publisher

• Used for datasets that are hierarchically related, for example:

Household file

- Individual file

• Create NSDstat files using the main Publisher

• Add Study metadata to one of the files

• Within the Hierarchical Publisher identify the key variables (used to link the files together)

• Build the hierarchy of files

• Validate the hierarchy

• Publish
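
To make the idea of a key variable concrete, here is a small Python sketch of linking individual-level records to household-level records and checking that the hierarchy is consistent. It only illustrates the concept and is not how the Hierarchical Publisher is implemented; the variable name `hh_id`, the sample records and the helper `link_by_key` are hypothetical.

```python
from collections import defaultdict

# Illustrative records; "hh_id" plays the role of the key variable.
households = [
    {"hh_id": 1, "region": "East Anglia"},
    {"hh_id": 2, "region": "Yorkshire"},
]
individuals = [
    {"hh_id": 1, "person": 1, "age": 34},
    {"hh_id": 1, "person": 2, "age": 36},
    {"hh_id": 2, "person": 1, "age": 58},
    {"hh_id": 3, "person": 1, "age": 21},  # no matching household
]


def link_by_key(parents, children, key):
    """Group child records under their parent and report unmatched children."""
    by_key = defaultdict(list)
    for child in children:
        by_key[child[key]].append(child)
    parent_keys = {parent[key] for parent in parents}
    linked = {parent[key]: by_key.get(parent[key], []) for parent in parents}
    orphans = [child for child in children if child[key] not in parent_keys]
    return linked, orphans


linked, orphans = link_by_key(households, individuals, "hh_id")
print({hh: len(members) for hh, members in linked.items()})
print(orphans)
```

A non-empty list of orphans is the kind of problem a validation step would be expected to flag before publishing.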

Page 31

Introduction to Nesstar Cubes

Page 32

Cube agenda

• What is a cube?

• What is not a cube?

• How to use Nesstar Publisher and the Cube Builder to prepare a simple cube

Page 33

What is a cube?

• A cube (or table) typically consists of aggregated data

• This data is defined by its ‘dimensions’ and ‘measures’

• ‘Dimension’ variables describe the data, e.g. gender, and consist of categories (male, female)

• ‘Measure’ variables represent the data, or values, found in the table cells

Page 34

What is a cube? (2)

• Each cell in a table must be described by all dimensions

• A dimension can be hierarchically constructed

• Geographical dimensions can be linked to a map

Page 35

Example 1 - A simple cube

(Population totals)

Year             2002                2003                2004
Area / Gender    Male     Female     Male     Female     Male     Female
East Anglia      335,000  320,000    460,000  415,000    395,000  370,000
Colchester       130,000  100,000    150,000  135,000    120,000  150,000
Chelmsford       155,000  145,000    225,000  200,000    200,000  140,000
Clacton           50,000   75,000     85,000   80,000     75,000   80,000

Page 36

Example 1 information

• Three dimensions:

– Area (East Anglia, Colchester, Chelmsford, Clacton)

– Gender (Male, Female)

– Year (2002, 2003, 2004)

• The ‘Measure’ is the population figures

Page 37

Hierarchical dimension (2 levels)

AREA

Regions: East Anglia, Yorkshire, South West, Sussex

Towns: Colchester, Clacton, Chelmsford, Leeds, York, Sheffield, Plymouth, Exeter, Brighton, Hove

Page 38

Example 2a - Not a cube

              Year                    Age
Area          2001   2002   2003     0-18   19-60   60+
Ipswich
Colchester
Chelmsford
Clacton

Page 39

What is not a cube?

• How many dimensions does this cube have?

• Do all dimensions describe each data point, i.e. each cell in the table?

• What is its measure?

Page 40

Example 2b - A cube

Year          2001                  2002                  2003
Area / Age    0-18  19-60  60+      0-18  19-60  60+      0-18  19-60  60+
Ipswich
Colchester
Chelmsford
Clacton
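
The difference between Example 2a and Example 2b comes down to the rule that every cell must be described by all dimensions. A minimal Python sketch of that check follows; the cell values are hypothetical (the slides show no figures for these two tables) and the helper names `missing_dimensions` and `is_cube` are illustrative.

```python
DIMENSIONS = ("Area", "Year", "Age")


def missing_dimensions(cell):
    """List the dimensions that do not describe this cell."""
    return [dim for dim in DIMENSIONS if cell.get(dim) is None]


def is_cube(cells):
    """A table is a cube only if every cell is described by all dimensions."""
    return all(not missing_dimensions(cell) for cell in cells)


# Example 2a layout: Year and Age sit side by side, so each cell has only one of them.
not_a_cube = [
    {"Area": "Ipswich", "Year": 2001, "value": 100},   # hypothetical value
    {"Area": "Ipswich", "Age": "0-18", "value": 40},   # hypothetical value
]

# Example 2b layout: Age is nested within Year, so every cell has all three.
a_cube = [
    {"Area": "Ipswich", "Year": 2001, "Age": "0-18", "value": 40},
    {"Area": "Ipswich", "Year": 2001, "Age": "19-60", "value": 45},
]

print(is_cube(not_a_cube), is_cube(a_cube))
```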

Page 41

Preparing and publishing a cube

[Diagram labels: Data; Meta-data; Nesstar Exporter; XML file; Nesstar Publisher; NSDstat file; Nesstar Cube Builder; NSDstat Cube File; Server]

Page 42

Creating a cube - Nesstar Publisher

• Create a data file for input into the Nesstar Publisher (‘.csv’/‘.tab’ file)

• Using the Publisher - import the ‘.csv’/‘.tab’ file

• Add any metadata required, e.g. title, description

• Create the hierarchy for any hierarchical dimensions, e.g. Area

• Add a link to a map, if required

Page 43

Example 3 - Life expectancy (non-additive)

(Age in years)

Area / Year    2002   2003   2004
England          75     77     80
South East       77     79     81
Colchester       76     79     81
Chelmsford       75     78     74
Clacton          82     84     87

Page 44

Input file for the Publisher

• Input files can be comma separated (.csv) or tab delimited (.tab)

• Each row in the file must describe a cell in the table,

e.g. tab delimited:

England 2002 75

South East 2002 77

Colchester 2002 76

Page 45

Example 3 - Input file

Area Year Age

England 2002 75

South East 2002 77

Colchester 2002 76

Chelmsford 2002 75

Clacton 2002 82

England 2003 77

South East 2003 79

Colchester 2003 79

Chelmsford 2003 78

Clacton 2003 84

England 2004 80

South East 2004 81

Colchester 2004 81

Chelmsford 2004 74

Clacton 2004 87
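
An input file of this shape (one row per table cell) can be produced from the Example 3 table with a few lines of Python. This is a sketch under assumptions: the slides do not state whether the Publisher requires the header row, and the helper `to_rows` is a hypothetical name. The figures are those of Example 3.

```python
import csv
import io

years = [2002, 2003, 2004]
life_expectancy = {  # values from Example 3, one list entry per year
    "England":    [75, 77, 80],
    "South East": [77, 79, 81],
    "Colchester": [76, 79, 81],
    "Chelmsford": [75, 78, 74],
    "Clacton":    [82, 84, 87],
}


def to_rows(table, years):
    """Yield one (area, year, value) row per table cell, year by year."""
    for i, year in enumerate(years):
        for area, values in table.items():
            yield area, year, values[i]


# Write tab-delimited text to an in-memory buffer; a real run would open a .tab file.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["Area", "Year", "Age"])
writer.writerows(to_rows(life_expectancy, years))
tab_text = buffer.getvalue()
print(tab_text)
```

Switching `delimiter` to a comma gives the comma separated (.csv) variant.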


Page 48

Creating a cube - Cube Builder

Use the Cube Builder to:

• Select the cube type, e.g. Non-additive, Stock-additive, Flow-additive

• Define the time and geographical dimensions

• Define the measure

• Create the default view

• Publish the cube to a Nesstar Server

Page 49

Non-additive cubes

• No aggregation of the measure is possible across dimensions

• Data typically found in this type of cube are percentage figures, rates, life expectancy

Page 50

Types of additive cube

• For additive cubes, aggregation of the data (measure values) is possible

• Stock: the measure represents a number at a point in time so no aggregation over time is possible. For example: yearly population figures, number of registered businesses

• Flow (fully additive): the data can be aggregated along all dimensions. For example: sales figures, number of reported crimes
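
The three cube types differ only in which dimensions the measure may be aggregated along. The rule stated on these slides can be written down as a small Python function; the function name and the string labels for the cube types are illustrative, not Cube Builder identifiers.

```python
def can_aggregate(cube_type, dimension_is_time):
    """Return True if the measure may be aggregated along the given dimension.

    non-additive: no aggregation at all
    stock:        aggregation along every dimension except time
    flow:         aggregation along all dimensions
    """
    if cube_type == "non-additive":
        return False
    if cube_type == "stock":
        return not dimension_is_time
    if cube_type == "flow":
        return True
    raise ValueError(f"unknown cube type: {cube_type!r}")


# Yearly population figures are a stock: sum over areas, but not over years.
print(can_aggregate("stock", dimension_is_time=False))
print(can_aggregate("stock", dimension_is_time=True))
```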

Page 51

Additive data

• For additive data, a higher-level category is automatically created containing the aggregated data from the lower levels

• No higher-level data should be included in the data file as these are calculated automatically

• This new category is called ALL – unless it is created within the Publisher, or was part of the original table

• For example: in the following cube, East Anglia = Colchester + Chelmsford + Clacton

Page 52

Example 4 - Additive (stock)

(Population totals)

Year             2002                2003                2004
Area / Gender    Male     Female     Male     Female     Male     Female
East Anglia
Colchester       130,000  100,000    150,000  135,000    120,000  150,000
Chelmsford       155,000  145,000    225,000  200,000    200,000  140,000
Clacton           50,000   75,000     85,000   80,000     75,000   80,000
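
The empty East Anglia row is what the automatic aggregation fills in: East Anglia = Colchester + Chelmsford + Clacton. The Python sketch below reproduces that calculation from the Example 4 figures; the `PARENT` mapping and the function `aggregate_to_parent` are hypothetical names used only to illustrate the arithmetic, not Nesstar functionality.

```python
from collections import defaultdict

# (area, year, gender) -> population, taken from Example 4
population = {
    ("Colchester", 2002, "Male"): 130000, ("Colchester", 2002, "Female"): 100000,
    ("Chelmsford", 2002, "Male"): 155000, ("Chelmsford", 2002, "Female"): 145000,
    ("Clacton", 2002, "Male"): 50000,     ("Clacton", 2002, "Female"): 75000,
    ("Colchester", 2003, "Male"): 150000, ("Colchester", 2003, "Female"): 135000,
    ("Chelmsford", 2003, "Male"): 225000, ("Chelmsford", 2003, "Female"): 200000,
    ("Clacton", 2003, "Male"): 85000,     ("Clacton", 2003, "Female"): 80000,
    ("Colchester", 2004, "Male"): 120000, ("Colchester", 2004, "Female"): 150000,
    ("Chelmsford", 2004, "Male"): 200000, ("Chelmsford", 2004, "Female"): 140000,
    ("Clacton", 2004, "Male"): 75000,     ("Clacton", 2004, "Female"): 80000,
}

# Lower-level category -> higher-level category of the Area dimension
PARENT = {
    "Colchester": "East Anglia",
    "Chelmsford": "East Anglia",
    "Clacton": "East Anglia",
}


def aggregate_to_parent(cells, parent):
    """Sum lower-level cells into their higher-level Area category."""
    totals = defaultdict(int)
    for (area, year, gender), value in cells.items():
        totals[(parent[area], year, gender)] += value
    return dict(totals)


totals = aggregate_to_parent(population, PARENT)
print(totals[("East Anglia", 2002, "Male")])  # 335000
```

The resulting totals agree with the East Anglia row shown in Example 1. Because this is a stock cube, the sum is taken over areas within each year and never across years.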

Page 53

Example 4 - Input file

Area         Year  Gender  Population
East Anglia  2002  Male
Colchester   2002  Male    130000
Chelmsford   2002  Male    155000
Clacton      2002  Male    50000
East Anglia  2002  Female
Colchester   2002  Female  100000
Chelmsford   2002  Female  145000
Clacton      2002  Female  75000
East Anglia  2003  Male
Colchester   2003  Male    150000
Chelmsford   2003  Male    225000
Clacton      2003  Male    85000
East Anglia  2003  Female
Colchester   2003  Female  135000
Chelmsford   2003  Female  200000
Clacton      2003  Female  80000
East Anglia  2004  Male

Page 54

Multiple measures

• Some cubes may contain a number of measures

• The following cube contains population totals with relevant percentages. Both measures are non-additive

• Different measures in the same cube can be different types, e.g. one may be non-additive and the other additive.

Page 55

Example 5 - Multiple measures

Year           2002                          2003
Gender         Male          Female          Male          Female
Area           No.       %   No.       %     No.       %   No.       %
East Anglia    335,000   51  320,000   49    460,000   53  415,000   47
Colchester     130,000   57  100,000   43    150,000   53  135,000   47
Chelmsford     155,000   52  145,000   48    225,000   52  200,000   48
Clacton         50,000   40   75,000   60     85,000   51   80,000   49

Page 56

Example 5 - Input file

Area         Year  Gender  Number  Percentage
East Anglia  2002  Male    335000  51
Colchester   2002  Male    130000  57
Chelmsford   2002  Male    155000  52
Clacton      2002  Male    50000   40
East Anglia  2002  Female  320000  49
Colchester   2002  Female  100000  43
Chelmsford   2002  Female  145000  48
Clacton      2002  Female  75000   60
East Anglia  2003  Male    460000  53
Colchester   2003  Male    150000  53

Page 57

Measure types

There are 5 possible measure types used in Nesstar:

• Average: average of underlying values

• Count: number of underlying values

• Minimum: minimum of underlying values

• Maximum: maximum of underlying values

• Sum: total of underlying values
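
As a quick illustration of what each measure type computes, the Python sketch below applies the five operations to one set of underlying values, the 2002 male population of the three towns in Example 4. The mapping from type names to Python functions is an illustration only and says nothing about how Nesstar evaluates measures internally.

```python
from statistics import mean

# Measure type name -> function over the underlying values
MEASURE_TYPES = {
    "Average": mean,
    "Count": len,
    "Minimum": min,
    "Maximum": max,
    "Sum": sum,
}

# Colchester, Chelmsford and Clacton, Male, 2002 (from Example 4)
values = [130000, 155000, 50000]

summary = {name: func(values) for name, func in MEASURE_TYPES.items()}
for name, result in summary.items():
    print(name, result)
```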

Page 58

Examples of more complex tables

• What if I have several identical tables that differ only in the year they refer to?

– Combine them using YEAR as an additional dimension

• What happens if I have several almost identical tables, but information for one category (e.g. ‘Male’) is missing for one year?

– Combine the tables, and accept that there will be an empty column for ‘Male’ for that year

Page 59

Related tables

Population by area by year 2002

Area           Male     Female
East Anglia
Colchester     130,000  100,000
Chelmsford     155,000  145,000
Clacton         50,000   75,000

Population by area by year 2003

Area           Male     Female
East Anglia
Colchester     150,000  135,000
Chelmsford     225,000  200,000
Clacton         85,000   80,000

Page 60

Combining tables

Year  Area / Gender  Male     Female
2002  East Anglia
2002  Colchester     130,000  100,000
2002  Chelmsford     155,000  145,000
2002  Clacton         50,000   75,000
2003  East Anglia
2003  Colchester     150,000  135,000
2003  Chelmsford     225,000  200,000
2003  Clacton         85,000   80,000
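
Combining related tables amounts to tagging every cell with the year of the table it came from. A minimal Python sketch using the 2002 and 2003 tables follows; the nested-dictionary layout and the function name `combine` are assumptions chosen for illustration.

```python
# One table per year: area -> gender -> population (figures from the related tables)
tables_by_year = {
    2002: {
        "Colchester": {"Male": 130000, "Female": 100000},
        "Chelmsford": {"Male": 155000, "Female": 145000},
        "Clacton":    {"Male": 50000,  "Female": 75000},
    },
    2003: {
        "Colchester": {"Male": 150000, "Female": 135000},
        "Chelmsford": {"Male": 225000, "Female": 200000},
        "Clacton":    {"Male": 85000,  "Female": 80000},
    },
}


def combine(tables):
    """Flatten per-year tables into (year, area, gender, value) rows."""
    rows = []
    for year, table in sorted(tables.items()):
        for area, by_gender in table.items():
            for gender, value in by_gender.items():
                rows.append((year, area, gender, value))
    return rows


rows = combine(tables_by_year)
for row in rows:
    print(*row, sep="\t")
```

If one category were missing from one year's table, its rows would simply be absent from the output, which corresponds to the empty column mentioned earlier.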

Page 61

Preparing cubes - Summary

• Once tables are combined they can be prepared in the usual way, e.g. create a comma separated (.csv) or tab delimited (.tab) file

• Import into the Publisher

• Add metadata

• Add any necessary information, e.g. level names, link to a map

• Open the Cube Builder

• Define type of cube, e.g. Non-additive

• Create default view

• Publish to a Nesstar Server

Page 62

Publishing a simple cube

Demonstration

Practical session 3

Page 63

Resource Publisher

• Used to publish ‘external’ Nesstar resources, e.g. PDF files, ‘Word’ files etc.

• Uses Dublin Core or e-GMS for metadata

• Enables these ‘external’ resources to be viewable on a Nesstar Server alongside survey data and cubes

Page 64

Resource Publisher

Demonstration