w w w . n e s s t a r . c o m

Unlocking data – creating knowledge

Data Publishing with Nesstar Publisher

Margaret Ward

Jostein Ryssevik

Cliff Dive

IASSIST/IFDO 2005

Edinburgh, Scotland

Workshop 5

Data publishing with Nesstar Publisher

The aim of this workshop is to provide an introduction to Nesstar Publisher.

By the end of this session you will be able to:
- prepare and publish micro-data (survey) files
- prepare and publish a simple cube

Programme 1

• Overview of the Nesstar system & Publisher functionality

• Using Nesstar Publisher - Survey (micro) data

• Practical session 1

• Publishing - using Manage Server

• Templates

• Practical session 2

Programme 2

• Overview of the Hierarchical Publisher

• Using Nesstar Publisher - Cube (tabular) data

• Practical session 3

• Introduction to the Resource Publisher

• Overview of the new Publisher v3.5

• Questions

• Practical session 4

Nesstar - an overview

[Diagram labels: Publisher; End-user clients; Internet; Extract; Load and manage; Retrieve; Metadata and data input; Metadata and data editing/transformation; Data and metadata retrieval and display]

Nesstar Publisher

• The Publisher is the ETL (Extract, Transform, Load) tool of the Nesstar product suite. It enables you to:
– extract data and metadata from a variety of sources, systems and formats,
– clean, change, edit and extend the data and metadata, and
– publish data and metadata to a Nesstar Server

• The Publisher can also be used to manage the content on a Nesstar server

• The Publisher can serve as a general data/metadata entry, transformation and editing tool, independently of its role in the Nesstar system.

Nesstar Publisher cont.

• The Publisher supports:

– micro-data (e.g. survey-data)

– hierarchical data (e.g. household studies)

– aggregated data (multidimensional tables or cubes)

– additional information objects (e.g. reports, factsheets, pictures) to be stored on a Nesstar server

Hierarchical Publisher

Enables files to be linked together and analysed as one study

For example:
• A study may contain household and individual level data files which are linked by key variables

Cube Builder

The Cube Builder adds the following cube specific information:

• ‘Time’ and ‘Geographical’ dimensions

• The default view of the cube

• The additivity of the data

• The cube ‘measure(s)’

Resource publisher

• Used to publish ‘external’ Nesstar resources, e.g. PDF files, ‘Word’ files etc.

• Uses Dublin Core or e-GMS for metadata

• Enables these ‘external’ resources to be viewable on a Nesstar Server alongside survey data and cubes

Using Nesstar Publisher

Preparing & publishing micro-data

[Diagram labels: Data; Metadata; Nesstar Publisher; Hierarchical Publisher; NSDstat file; Server]

Metadata

• What is metadata? Basically defined as ‘data about data’

• Aim of metadata is to make a resource ‘findable’ and ‘manageable’

• DDI: “Enables the effective, efficient and accurate use” of data resources.

Metadata standards supported by Nesstar Publisher

• DDI (http://www.icpsr.umich.edu/DDI)

“Enables the effective, efficient and accurate use” of data resources

• Dublin Core (http://uk.dublincore.org/)

“A standard for cross-domain information resource description”

• e-Government Metadata Standard (e-GMS) (http://www.govtalk.gov.uk/)

“To ensure maximum consistency of metadata across public sector organisations”

Adding metadata - Publisher templates

• Use metadata templates
• Can use DDI or Dublin Core/e-GMS
• Can add controlled vocabulary lists and default text
• Can rename template fields, i.e. use familiar terms

Advantages:
• Create to suit individual needs of an organisation or a data series
• Use of standard templates ensures consistent use of metadata fields
• Can add help information about each field to assist the data publisher

Importing / Exporting of data

Formats for import and export (import column first, export column second):
• DDI document *.xml
• SPSS: *.sav, *.por, *.sps *.sav, *.por, *.sps
• SAS: *.sp1 *.sas (syntax)
• Stata: *.dta (STATA 7 & STATA 8) *.dta
• Statistica: *.sta *.sta
• NSDstat: *.nsf *.nsf
• dBase: *.dbf *.dbf
• DIF: *.dif *.dif
• Fixed Format ASCII *.dat
• Delimited text: *.txt, *.csv *.txt
• PC-Axis *.px

Import from DDI / Export DDI

Enables the re-use of metadata. Available options are:

• Import from Dataset: import the metadata from an existing NSDstat file

• Import from DDI: import an existing XML file

[Caution! ‘Invisible’ Metadata may be present]

• Export DDI: Export metadata to an XML file

Variable level metadata

• Variable and category labels can easily be edited/added
• Able to change the case of variable/category labels
• Variable repository makes re-use of category labels possible
• Local and Global variable repositories - share information with others
• Add a map link to a variable
• Adding question text and variable notes:
– to each variable separately
– to a block of variables
• Identify ‘Weight’ variables
• Identify ‘Time’ variables
• Missing data assignments

Data manipulation functions

The Publisher enables you to:
• View the data as a matrix allowing direct data entry or editing
• Cut and paste data
• Add, insert and copy variables of different types, e.g. numeric, Fixed string, Dynamic string, Date
• Insert/replace data – insert data matrix from dataset, or fixed format text
• Delete variables
• Sort cases
• Delete cases
• Conversion between variable types

Variable Groups

• Useful for grouping variables that relate to the same topic or theme together

• Hierarchy of groups is supported
• Variables can belong to more than one group
• Groups can be arranged in any order
• Information about that group can be added, e.g. a group definition

Advantages:
• Make it easier for end-users to navigate the dataset
• Reduce the load time of a dataset when published

Using the Publisher

Demonstration

Practical session 1

Manage Server, Publishing and Templates

Manage Server

• Provides the means to link to Nesstar Servers to enable publishing

• Enables the data publisher to manage the resources on a Nesstar server so that they can then:
– Create new catalogues then name and describe them
– Reorganise the catalogue hierarchy
– Add files to a catalogue
– Move files between catalogues
– Delete files and catalogues

Publishing

• Add a Nesstar Server using the Server-URL, or locating the Server directory on a LAN, and entering an appropriate username and password

• Can publish to a Nesstar Server over a local area network (LAN) or over the Internet

• Able to publish to multiple Servers in a single operation
• Options to publish ‘data and metadata’, ‘metadata only’ or ‘Republish’
• Catalogues can be automatically selected if ‘Keywords’ or ‘Subject classification’ terms within the metadata match existing catalogue names
• Able to publish to a ‘Hidden’ catalogue – not visible to end-users
• Able to view the published data directly from the Publisher – ‘Open in Web client’ option

Adding metadata - Publisher templates

• Use metadata templates
• Can use DDI or Dublin Core/e-GMS
• Can add controlled vocabulary lists and default text
• Can rename template fields, i.e. use familiar terms

Advantages:
• Create to suit individual needs of an organisation or a data series
• Use of standard templates ensures consistent use of metadata fields
• Can add help information about each field to assist the data publisher

Manage Server, Publishing and Templates

Demonstration

Practical session 2

The Hierarchical Publisher

Preparing & publishing hierarchical data

[Diagram labels: Data; Metadata; Nesstar Publisher; Hierarchical Publisher; NSDstat file; Server]

Hierarchical Publisher

• Used for datasets that are hierarchically related -

For example:

Household file

- Individual file

• Create NSDstat files using the main Publisher
• Add Study metadata to one of the files
• Within the Hierarchical Publisher identify the key variables (used to link the files together)
• Build the hierarchy of files
• Validate the hierarchy
• Publish
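The idea of linking files by key variables and validating the hierarchy can be illustrated outside the Publisher. The following is a minimal sketch, assuming hypothetical household and individual records and a hypothetical key variable `hh_id`; it is not the Hierarchical Publisher's own validation logic, only an illustration of what a key-variable link implies.

```python
# Hypothetical records; field names and values are illustrative only.
households = [
    {"hh_id": 1, "region": "East Anglia"},
    {"hh_id": 2, "region": "Yorkshire"},
]
individuals = [
    {"hh_id": 1, "person": 1, "age": 34},
    {"hh_id": 1, "person": 2, "age": 36},
    {"hh_id": 2, "person": 1, "age": 51},
    {"hh_id": 3, "person": 1, "age": 22},  # no matching household
]


def validate_hierarchy(parents, children, key):
    """Return child records whose key value has no matching parent record."""
    parent_keys = {p[key] for p in parents}
    return [c for c in children if c[key] not in parent_keys]


def link(parents, children, key):
    """Group child records under the parent record that shares the key value."""
    by_key = {p[key]: {**p, "members": []} for p in parents}
    for c in children:
        if c[key] in by_key:
            by_key[c[key]]["members"].append(c)
    return list(by_key.values())


orphans = validate_hierarchy(households, individuals, "hh_id")
study = link(households, individuals, "hh_id")
print(len(orphans), [len(h["members"]) for h in study])
```

An individual record that points at a household not present in the household file is reported as an orphan, which is the kind of problem a validation step would be expected to surface before publishing.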

Introduction to Nesstar Cubes

Cube agenda

• What is a cube?

• What is not a cube?

• How to use Nesstar Publisher and the Cube Builder to prepare a simple cube

What is a cube?

• A cube (or table) typically consists of aggregated data

• This data is defined by its ‘dimensions’ and ‘measures’

• ‘Dimension’ variables describe the data, e.g. gender, and consist of categories (male, female)

• ‘Measure’ variables represent the data, or values, found in the table cells

What is a cube? (2)

• Each cell in a table must be described by all dimensions

• A dimension can be hierarchically constructed

• Geographical dimensions can be linked to a map

Example 1 - A simple cube

(Population totals)

Area          2002                2003                2004
              Male      Female    Male      Female    Male      Female
East Anglia   335,000   320,000   460,000   415,000   395,000   370,000
Colchester    130,000   100,000   150,000   135,000   120,000   150,000
Chelmsford    155,000   145,000   225,000   200,000   200,000   140,000
Clacton       50,000    75,000    85,000    80,000    75,000    80,000

Example 1 information

• Three dimensions:

– Area (East Anglia, Colchester, Chelmsford, Clacton)

– Gender (Male, Female)

– Year (2002, 2003, 2004)

• The ‘Measure’ is the population figures

Hierarchical dimension (2 levels)

AREA

Regions: East Anglia Yorkshire South West Sussex

Towns: Colchester Clacton Chelmsford Leeds York Sheffield Plymouth Exeter Brighton Hove

Example 2a - Not a cube

Area          Year                Age
              2001  2002  2003    0-18  19-60  60+
Ipswich
Colchester
Chelmsford
Clacton

What is not a cube?

• How many dimensions does this cube have?

• Do all dimensions describe each data point, i.e. each cell in the table?

• What is its measure?

Example 2b - A cube

Year          2001                2002                2003
Age           0-18  19-60  60+    0-18  19-60  60+    0-18  19-60  60+
Area
Ipswich
Colchester
Chelmsford
Clacton
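The difference between Examples 2a and 2b can be checked mechanically with the rule that each cell must be described by all dimensions. The sketch below is only an illustration, assuming each cell is represented as a dictionary of dimension values (a hypothetical representation, not a Nesstar format); no cell values are used because the examples do not show any.

```python
from itertools import product

AREAS = ["Ipswich", "Colchester", "Chelmsford", "Clacton"]
YEARS = [2001, 2002, 2003]
AGES = ["0-18", "19-60", "60+"]
DIMENSIONS = ("Area", "Year", "Age")

# Example 2a: each cell is described by Area plus either Year or Age, never both.
cells_2a = [{"Area": a, "Year": y, "Age": None} for a, y in product(AREAS, YEARS)]
cells_2a += [{"Area": a, "Year": None, "Age": g} for a, g in product(AREAS, AGES)]

# Example 2b: each cell is described by Area, Year and Age together.
cells_2b = [
    {"Area": a, "Year": y, "Age": g} for a, y, g in product(AREAS, YEARS, AGES)
]


def described_by_all(cells, dimensions):
    """True if every cell has a value for every dimension."""
    return all(cell.get(d) is not None for cell in cells for d in dimensions)


print(described_by_all(cells_2a, DIMENSIONS))  # False
print(described_by_all(cells_2b, DIMENSIONS))  # True
```

Example 2a is really two two-dimensional tables placed side by side (Area by Year and Area by Age), which is why it fails the check.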

Preparing and publishing a cube

[Diagram labels: Data; Metadata; Nesstar Exporter; Nesstar Publisher; Nesstar Cube Builder; NSDstat file; XML file; NSDstat cube file; Server]

Creating a cube - Nesstar Publisher

• Create a data file for input into the Nesstar Publisher (‘.csv’/‘.tab’ file)

• Using the Publisher - import the ‘.csv’/‘.tab’ file

• Add any metadata required, e.g. title, description

• Create the hierarchy for any hierarchical dimensions, e.g. Area

• Add a link to a map, if required

Example 3 - Life expectancy (non-additive)

(Age in years)

Area / Year   2002  2003  2004
England       75    77    80
South East    77    79    81
Colchester    76    79    81
Chelmsford    75    78    74
Clacton       82    84    87

Input file for the Publisher

• Input files can be comma separated (.csv) or tab delimited (.tab)

• Each row in the file must describe a cell in the table,

e.g. tab delimited:

England 2002 75

South East 2002 77

Colchester 2002 76

Example 3 - Input file

Area Year Age

England 2002 75

South East 2002 77

Colchester 2002 76

Chelmsford 2002 75

Clacton 2002 82

England 2003 77

South East 2003 79

Colchester 2003 79

Chelmsford 2003 78

Clacton 2003 84

England 2004 80

South East 2004 81

Colchester 2004 81

Chelmsford 2004 74

Clacton 2004 87
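The step from the Example 3 table to its input file is a conversion from one row per area to one row per cell. A minimal sketch of that conversion, using only the Python standard library, is shown here; the function names are hypothetical, and writing a header row follows the layout of the input file shown above rather than any documented Publisher requirement.

```python
import csv
import io

YEARS = [2002, 2003, 2004]
# Life expectancy table from Example 3: one entry per area, one value per year.
WIDE = {
    "England": [75, 77, 80],
    "South East": [77, 79, 81],
    "Colchester": [76, 79, 81],
    "Chelmsford": [75, 78, 74],
    "Clacton": [82, 84, 87],
}


def to_cell_rows(wide, years):
    """Yield one (area, year, value) row per table cell, year by year."""
    for i, year in enumerate(years):
        for area, values in wide.items():
            yield area, year, values[i]


def write_tab_file(rows, header):
    """Return the rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


text = write_tab_file(to_cell_rows(WIDE, YEARS), ["Area", "Year", "Age"])
print(text)
```

Passing `delimiter=","` instead would produce the comma separated variant of the same file.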

Creating a cube - Cube Builder

Use the Cube Builder to:

• Select the cube type, e.g. Non-additive, Stock-additive, Flow-additive

• Define the time and geographical dimensions

• Define the measure

• Create the default view

• Publish the cube to a Nesstar Server

Non-additive cubes

• No aggregation of the measure is possible across dimensions

• Data typically found in this type of cube are percentage figures, rates, life expectancy

Types of additive cube

• For additive cubes, aggregation of the data (measure values) is possible

• Stock: the measure represents a number at a point in time so no aggregation over time is possible. For example: yearly population figures, number of registered businesses

• Flow (fully additive): the data can be aggregated along all dimensions. For example: sales figures, number of reported crimes

Additive data

• For additive data, a higher-level category is automatically created containing the aggregated data from the lower levels

• No higher-level data should be included in the data file as these are calculated automatically

• This new category is called ALL – unless it is created within the Publisher, or was part of the original table

• For example: in the following cube, East Anglia = Colchester + Chelmsford + Clacton

Example 4 - Additive (stock)

(Population totals)

Area          2002                2003                2004
              Male      Female    Male      Female    Male      Female
East Anglia
Colchester    130,000   100,000   150,000   135,000   120,000   150,000
Chelmsford    155,000   145,000   225,000   200,000   200,000   140,000
Clacton       50,000    75,000    85,000    80,000    75,000    80,000
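The aggregation rules for the different cube types can be sketched in a few lines. This is an illustration of the rules as stated on the slides, not of how Nesstar computes aggregates internally; the `aggregate` function and the `cube_type` strings are hypothetical names chosen for the example.

```python
AREA, YEAR, GENDER = 0, 1, 2  # positions within each cell key

# Town-level figures from Example 4, keyed by (area, year, gender).
POPULATION = {
    ("Colchester", 2002, "Male"): 130000, ("Colchester", 2002, "Female"): 100000,
    ("Colchester", 2003, "Male"): 150000, ("Colchester", 2003, "Female"): 135000,
    ("Colchester", 2004, "Male"): 120000, ("Colchester", 2004, "Female"): 150000,
    ("Chelmsford", 2002, "Male"): 155000, ("Chelmsford", 2002, "Female"): 145000,
    ("Chelmsford", 2003, "Male"): 225000, ("Chelmsford", 2003, "Female"): 200000,
    ("Chelmsford", 2004, "Male"): 200000, ("Chelmsford", 2004, "Female"): 140000,
    ("Clacton", 2002, "Male"): 50000, ("Clacton", 2002, "Female"): 75000,
    ("Clacton", 2003, "Male"): 85000, ("Clacton", 2003, "Female"): 80000,
    ("Clacton", 2004, "Male"): 75000, ("Clacton", 2004, "Female"): 80000,
}


def aggregate(cells, keep, cube_type):
    """Sum the measure over every dimension whose position is not in `keep`."""
    if cube_type == "non-additive":
        raise ValueError("non-additive measures cannot be aggregated")
    if cube_type == "stock" and YEAR not in keep:
        raise ValueError("stock measures cannot be aggregated over time")
    totals = {}
    for key, value in cells.items():
        group = tuple(key[i] for i in sorted(keep))
        totals[group] = totals.get(group, 0) + value
    return totals


# Aggregating over Area reproduces the East Anglia row of the table.
east_anglia = aggregate(POPULATION, keep={YEAR, GENDER}, cube_type="stock")
print(east_anglia[(2002, "Male")])  # 335000
```

Asking for a total over the years (for example `keep={AREA}`) raises an error for a stock cube, while the same call with `cube_type="flow"` would be allowed.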

Example 4 - Input file

Area          Year  Gender  Population
East Anglia   2002  Male
Colchester    2002  Male    130000
Chelmsford    2002  Male    155000
Clacton       2002  Male    50000
East Anglia   2002  Female
Colchester    2002  Female  100000
Chelmsford    2002  Female  145000
Clacton       2002  Female  75000
East Anglia   2003  Male
Colchester    2003  Male    150000
Chelmsford    2003  Male    225000
Clacton       2003  Male    85000
East Anglia   2003  Female
Colchester    2003  Female  135000
Chelmsford    2003  Female  200000
Clacton       2003  Female  80000
East Anglia   2004  Male

Multiple measures

• Some cubes may contain a number of measures

• The following cube contains population totals with the relevant percentages. Both measures are non-additive

• Different measures in the same cube can be different types, e.g. one may be non-additive and the other additive.

Example 5 - Multiple measures

Year          2002                        2003
Gender        Male          Female        Male          Female
Area          No.      %    No.      %    No.      %    No.      %
East Anglia   335,000  51   320,000  49   460,000  53   415,000  47
Colchester    130,000  57   100,000  43   150,000  53   135,000  47
Chelmsford    155,000  52   145,000  48   225,000  52   200,000  48
Clacton       50,000   40   75,000   60   85,000   51   80,000   49

Example 5 - Input file

Area          Year  Gender  Number  Percentage
East Anglia   2002  Male    335000  51
Colchester    2002  Male    130000  57
Chelmsford    2002  Male    155000  52
Clacton       2002  Male    50000   40
East Anglia   2002  Female  320000  49
Colchester    2002  Female  100000  43
Chelmsford    2002  Female  145000  48
Clacton       2002  Female  75000   60
East Anglia   2003  Male    460000  53
Colchester    2003  Male    150000  53

Measure types

There are 5 possible measure types used in Nesstar:

• Average: average of underlying values
• Count: number of underlying values
• Minimum: minimum of underlying values
• Maximum: maximum of underlying values
• Sum: total of underlying values
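As a rough illustration of what the five measure types compute, the sketch below applies each one to the same set of underlying values. The mapping from type name to Python function is an assumption made for the example; it says nothing about how Nesstar implements these types.

```python
from statistics import mean

# Hypothetical mapping from measure type name to an aggregation function.
MEASURE_TYPES = {
    "Average": mean,
    "Count": len,
    "Minimum": min,
    "Maximum": max,
    "Sum": sum,
}

# Underlying values: male population of the three towns in 2002 (Example 4).
values = [130000, 155000, 50000]

results = {name: fn(values) for name, fn in MEASURE_TYPES.items()}
for name, value in results.items():
    print(name, value)
```

With these values, Sum gives the East Anglia male total for 2002 (335000), while Count gives the number of towns contributing to it (3).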

Examples of more complex tables

• What if I have several identical tables that differ only in the year they refer to?
– Combine them using YEAR as an additional dimension

• What happens if I have several almost identical tables, but information for one category (e.g. ‘Male’) is missing for one year?
– Combine the tables, and accept that there will be an empty column for ‘Male’ for that year

Related tables

Population by area by year 2002

Area          Male      Female
East Anglia
Colchester    130,000   100,000
Chelmsford    155,000   145,000
Clacton       50,000    75,000

Population by area by year 2003

Area          Male      Female
East Anglia
Colchester    150,000   135,000
Chelmsford    225,000   200,000
Clacton       85,000    80,000

Combining tables

Year  Area/Gender   Male      Female
2002  East Anglia
2002  Colchester    130,000   100,000
2002  Chelmsford    155,000   145,000
2002  Clacton       50,000    75,000
2003  East Anglia
2003  Colchester    150,000   135,000
2003  Chelmsford    225,000   200,000
2003  Clacton       85,000    80,000
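Combining related tables amounts to adding the year as an extra dimension on every cell. The following sketch shows one way to do that for the two tables above; the dictionary layout and the `combine` function are hypothetical, and a category missing from one year's table is kept as an empty (`None`) cell, as the earlier slide suggests.

```python
GENDERS = ["Male", "Female"]

# The two related tables, one per year (town rows only).
TABLE_2002 = {
    "Colchester": {"Male": 130000, "Female": 100000},
    "Chelmsford": {"Male": 155000, "Female": 145000},
    "Clacton": {"Male": 50000, "Female": 75000},
}
TABLE_2003 = {
    "Colchester": {"Male": 150000, "Female": 135000},
    "Chelmsford": {"Male": 225000, "Female": 200000},
    "Clacton": {"Male": 85000, "Female": 80000},
}


def combine(tables_by_year, genders=GENDERS):
    """Return one (year, area, gender, value) row per cell of the combined cube."""
    rows = []
    for year, table in sorted(tables_by_year.items()):
        for area, by_gender in table.items():
            for gender in genders:
                # A category missing from one table becomes an empty cell.
                rows.append((year, area, gender, by_gender.get(gender)))
    return rows


rows = combine({2002: TABLE_2002, 2003: TABLE_2003})
print(len(rows), rows[0])
```

The resulting rows have the same one-row-per-cell shape as the input files shown earlier, so they can be written out as a comma separated or tab delimited file.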

Preparing cubes - Summary

• Once tables are combined they can be prepared in the usual way, e.g. create a comma separated (.csv) or tab delimited (.tab) file
• Import into the Publisher
• Add metadata
• Add any necessary information, e.g. level names, link to a map
• Open the Cube Builder
• Define type of cube, e.g. Non-additive
• Create default view
• Publish to a Nesstar Server

Publishing a simple cube

Demonstration

Practical session 3

Resource publisher

• Used to publish ‘external’ Nesstar resources, e.g. PDF files, ‘Word’ files etc.

• Uses Dublin Core or e-GMS for metadata

• Enables these ‘external’ resources to be viewable on a Nesstar Server alongside survey data and cubes

Resource Publisher

Demonstration
